Linked open data training for EU institutions
TRANSCRIPT
OPEN DATA SUPPORT
Linked Open Data: Principles, Technologies and Examples
PwC firms help organisations and individuals create the value they’re looking for. We’re a network of firms in 158 countries with close to 180,000 people who are committed to delivering quality in assurance, tax and advisory services. Tell us what matters to you and find out more by visiting us at www.pwc.com.
PwC refers to the PwC network and/or one or more of its member firms, each of which is a separate legal entity. Please see www.pwc.com/structure for further details.
Learning objectives
By the end of the course, participants should have a clear understanding of:
• What linked open data is
• What the difference is between linked and open data
• How to publish linked data
• The economic and social aspects of linked data
• How linked data technologies can be applied to improve the availability, understandability and usability of EU data
Slide 2
Content
This training consists of 3 modules:
1. Introduction to linked data
2. Introduction to RDF & SPARQL
3. Workshop on publishing open linked EU data
Slide 3
Learning Module 1
Introduction to Linked Data
Slide 4
Introduction to linked data
This module contains:
• An introduction to the linked data principles
• The expected benefits of linked data
• An introduction to linked data technologies
• An outline of the 5-star scheme for publishing linked data
• An overview of linked data initiatives in Europe
Slide 5
Find more on training.opendatasupport.eu
What is linked data?
Evolution from a document-based Web to a Web of interlinked data
Slide 6
The Web is evolving from a “Web of linked documents” into a “Web of linked data”
• The Web started as a collection of documents published online, accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.
Two types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.
Slide 7
See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/
Defining linked data
Providing data as a service
“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
– EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
Slide 8
See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
The value proposition of linked (open) government data
• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: makes browsing through complex data easier via URIs.
• Increase in data quality:
o The use of URIs leads to improved data management and quality.
o The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
Slide 9
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
o Resolvable URIs
o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
o existing relational/spatial database systems, by applying database-to-RDF conversions; or
o existing XML/file-based data.
Slide 10
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower cost and effort (compared to traditional relational databases).
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: The availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022
Slide 12
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
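Principles 2 and 3 can be illustrated with a minimal Python sketch that prepares an HTTP lookup of the EuroVoc URI and states, through the Accept header, that an RDF serialisation is preferred. The helper name build_lookup_request and the choice of text/turtle are assumptions made for illustration; the request is only built, not sent, and whether a given server honours the header is not verified here.

```python
from urllib.request import Request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET request for a resource URI.

    build_lookup_request is a hypothetical helper; text/turtle is an
    assumed preference, other RDF media types could be used instead.
    """
    # The Accept header tells the server which representation the client prefers.
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_lookup_request("http://eurovoc.europa.eu/1022")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup.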
Slide 13
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things, RDF for describing data and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA’s 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data.
Example triple:
Subject: http://example.org/place/Brussels
Predicate: is the capital of
Object: “Belgium” (a literal) OR http://example.org/place/Belgium (a resource)
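The Brussels example can be written down as a small Python sketch to show the two kinds of object a triple may have. The URI, Literal and Triple types and the predicate URI http://example.org/vocab/isCapitalOf are hypothetical names introduced for illustration; they are not part of any RDF library.

```python
from typing import NamedTuple, Union


class URI(str):
    """Identifier of a resource (hypothetical helper type)."""


class Literal(str):
    """A plain value, such as a name written as text (hypothetical helper type)."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


# Assumed predicate URI: the slide only gives the phrase "is the capital of".
IS_CAPITAL_OF = URI("http://example.org/vocab/isCapitalOf")
BRUSSELS = URI("http://example.org/place/Brussels")

# Same statement twice: once pointing at a literal, once at another resource.
with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, URI("http://example.org/place/Belgium"))


def object_is_resource(triple: Triple) -> bool:
    """True when the object is itself a resource that can be looked up and linked."""
    return isinstance(triple.object, URI)


print(object_is_resource(with_literal), object_is_resource(with_resource))
```

Only the second form lets a client follow the object to find more data about Belgium, which is what principle 4 asks for.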
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
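As a rough illustration of how the scheme is cumulative, the following Python sketch counts stars for a dataset described by five yes/no answers. The function name and parameters are hypothetical; the rule that a star only counts when all earlier criteria are met is the usual reading of the scheme, stated here as an assumption.

```python
def star_rating(open_licence: bool, structured: bool, non_proprietary: bool,
                uses_uris: bool, linked: bool) -> int:
    """Count stars, stopping at the first criterion that is not met."""
    criteria = [open_licence, structured, non_proprietary, uses_uris, linked]
    stars = 0
    for met in criteria:
        if not met:
            break  # later stars build on earlier ones
        stars += 1
    return stars


# An Excel file under an open licence: structured, but in a proprietary format.
print(star_rating(True, True, False, False, False))
# The same table as CSV.
print(star_rating(True, True, True, False, False))
```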
Slide 19
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions’ initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
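Endpoints like those above are typically queried over HTTP. The SPARQL Protocol defines a GET request that carries the query text in a "query" parameter, and the Python sketch below builds such a URL without sending it. The helper name sparql_get_url and the sample query are illustrative assumptions, and it has not been checked how any of the listed endpoints responds.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the URL of a SPARQL Protocol GET request (hypothetical helper)."""
    # urlencode percent-encodes characters such as '?', '{' and '}'.
    return endpoint + "?" + urlencode({"query": query})


# Generic sample query: the first five triples, whatever they are.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
url = sparql_get_url("http://semantic.eea.europa.eu/sparql", QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```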
Slide 27
Initiatives funded by the European Commission
Slide 28
[Figure: logos of funded initiatives; legible labels include ADMS.SW, Core Vocabulary and Public Service]
Member State initiatives – some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions cont’d
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module, you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <!-- The organisation, identified by its URI -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <!-- The address of the site linked from the organisation -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
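To make the example concrete, the triple can be written out as one line of N-Triples, the simplest RDF serialisation, with one triple per line. The Python sketch below is illustrative only: the helper names are hypothetical, only basic escaping is handled, and mapping “has a title” to the Dublin Core property http://purl.org/dc/terms/title is an assumption.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw inside an N-Triples string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject: str, predicate: str, obj: str, obj_is_literal: bool) -> str:
    """Serialise one triple; URIs go in angle brackets, literals in double quotes."""
    obj_term = f'"{escape_literal(obj)}"' if obj_is_literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",  # assumed URI for "has a title"
    "File types Name Authority List",
    obj_is_literal=True,
)
print(line)
```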
RDF syntax: RDF/XML
Slide 46
Definition of prefixes:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
Description of data – triples:
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
In this example the rdf:about URI is the subject, each nested element (e.g. dct:title) is a predicate, and the element content or rdf:resource value is the object. Together the triples form a graph.
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the triples, with callouts marking Subject, Predicate and Object]
RDF syntax: Turtle
Slide 48
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data – triples:
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Each statement consists of a subject, a predicate and an object; together the statements form a graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa – embedding RDF data in HTML
Slide 49
<html>
  <head> … </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Here the resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span is the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
UML class diagram (Healthcare Domain), with classes and their properties:
Class Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location
  geographicIdentifier: URI
  geographicName: string
Class Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s)... (identified by name or description)
• INSERT: Add triples to the RDF graph.
• DELETE: Delete triples from the RDF graph.
• ASK: Are there any X, Y, etc. satisfying the following conditions...?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL query
Slide 56
Definition of prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
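How the triple patterns of such a query are matched against the sample data can be sketched in a few lines of Python. This is a toy matcher written for illustration, not how a SPARQL engine is implemented: terms are plain strings, prefixed names are not expanded, and the full dataset URI used elsewhere in this module stands in for the abbreviated one.

```python
def is_variable(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None."""
    result = dict(bindings)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable keeps its earlier value, or is bound now.
            if result.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return result


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(solution[v] for v in variables) for solution in solutions]


DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"
TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", '"File types Name Authority List"'),
    (DATASET, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", '"Publications Office"'),
]
PATTERNS = [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = select(["?dataset", "?publisher"], PATTERNS, TRIPLES)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first.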
SPARQL Example ndash EU ODP (1)
Slide 59
SPARQL Example ndash EU ODP (2)
Slide 60
SPARQL Example ndash EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data:
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust Domain Model
Slide 68
[Figure: domain model diagram] Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Step 2: Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
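The “further processing” mentioned in the date example might look like the following Python sketch, which turns a free-text date into the typed form. The function name is hypothetical, and the parsing assumes English month names (Python's default "C" locale); other input formats would need their own handling.

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Convert a date such as '21 February 2013' into a typed xsd:date literal."""
    # %B parses the full month name; this relies on English month names.
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of data in the non-standard format has to write and maintain a conversion like this, which is the cost that reusing dcterms:created with xsd:date avoids.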
Slide 71
Advantages of reuse
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
Slide 72
http://joinup.ec.europa.eu/ and http://lov.okfn.org/
Step 2: Reuse existing terms and vocabularies
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
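The second point can be sketched in Python: given a declaration that one property is a sub-property of another, a system can add the more generic statement for every specific one. The prefix bud: and the sample triple are hypothetical; the sketch also assumes at most one super-property per property, which real vocabularies do not guarantee.

```python
# Hypothetical declaration: bud:introduction is a sub-property of dct:description.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}


def super_properties(prop, sub_property_of):
    """Follow sub-property links upwards, guarding against cycles."""
    found = []
    while prop in sub_property_of and sub_property_of[prop] not in found:
        prop = sub_property_of[prop]
        found.append(prop)
    return found


def infer(triples, sub_property_of):
    """Add, for each triple, the same statement using every super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for parent in super_properties(predicate, sub_property_of):
            inferred.add((subject, parent, obj))
    return inferred


data = {("ex:item1", "bud:introduction", "Some introductory text")}
for triple in sorted(infer(data, SUB_PROPERTY_OF)):
    print(triple)
```

A client that only knows dct:description can now read the introduction text without knowing the more specific property.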
Slide 73
Step 3
Creation of sub-classes and sub-properties
Slide 74
Step 3
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
Nomenclature: Type, Heading, Introduction, Remark, Conditions
Creation of sub-classes and sub-properties
Slide 75
Step 3
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
Relationship: has Nomenclature
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
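The purely lexical parts of these conventions (initial letter and camel case) can be checked mechanically, as in the Python sketch below. The function names and the ex: terms are hypothetical, and the check cannot tell whether a name is singular, a verb or a noun; that still needs a human reviewer.

```python
import re

# Letters and digits only: camel case has no underscores, hyphens or spaces.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def follows_class_convention(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def follows_property_convention(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "ex:amount"]:
    print(term, "class name ok:", follows_class_convention(term))
for term in ["foaf:isPrimaryTopicOf", "ex:legal_base"]:
    print(term, "property name ok:", follows_property_convention(term))
```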
Slide 76
Step 4
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Amount: Currency, Figure, Type, Year
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Amount: Currency, Figure, Type, Year
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
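As a rough illustration of domain and range, the Python sketch below uses them as a simple consistency check. All names are hypothetical, including the assumption that hasCeiling points from a political category to an amount. Note that RDFS itself treats domain and range as a basis for inference rather than as validation rules, so this is a simplification.

```python
# Hypothetical schema and instance data, for illustration only.
SCHEMA = {"ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"}}
TYPES = {"ex:cat1": "ex:PoliticalCategory", "ex:amount1": "ex:Amount"}

def check_triple(triple, schema, types):
    """Return a list of domain/range problems for one triple (empty if none)."""
    subject, predicate, obj = triple
    rule = schema.get(predicate)
    if rule is None:
        return []  # no domain or range declared for this property
    problems = []
    if types.get(subject) != rule["domain"]:
        problems.append(f"{subject} is not a {rule['domain']}")
    if types.get(obj) != rule["range"]:
        problems.append(f"{obj} is not a {rule['range']}")
    return problems
```

Swapping subject and object in a hasCeiling statement would be reported twice: once for the domain and once for the range.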
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust domain model, developed following a structured process and methodology.
Research existing terms and their usage, and maximise reuse of those terms.
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent.
Publicise the RDF vocabulary by registering it with relevant services.
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …" - openrefine.org
See also: Open Refine website
http://openrefine.org
DATASUPPORTOPEN
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data: table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data: prepare RDF
Slide 91
3
• Create URI representations for the object values involved
  o via formula
  o via reconciliation
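The "via formula" route amounts to deriving a URI from a cell value with a fixed rule. Inside Open Refine this would be an expression entered in the tool. The Python below is only a stand-alone analogue, and the base URI and the lower-case, hyphen-separated pattern are assumptions.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/country/"  # hypothetical base URI

def cell_to_uri(value, base=BASE_URI):
    """Turn a cell value such as ' United Kingdom ' into a URI."""
    # Trim, lower-case and join the words with hyphens.
    slug = "-".join(value.strip().lower().split())
    # Percent-encode anything that is not safe in a URI path segment.
    return base + quote(slug, safe="-")
```

Applying the same rule to every row keeps identical values mapped to identical URIs, which is what later makes the triples link up.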
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
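What an RDF skeleton does can be approximated outside the tool: each spreadsheet row is turned into a fixed pattern of triples under a base URI. The sketch below is not the extension's own mechanism. The base URI, column names, property URIs and figures are all made up for illustration; only the RDF Data Cube namespace and its Observation class are real.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"       # hypothetical base URI
QB = "http://purl.org/linked-data/cube#"      # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

SAMPLE = "country,year,value\nBE,2014,0.82\nLU,2014,0.91\n"  # made-up figures

def rows_to_ntriples(text):
    """Apply one fixed triple pattern (the 'skeleton') to every CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        country, year, value = row["country"], row["year"], row["value"]
        obs = f"<{BASE}obs/{country}-{year}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{BASE}prop/country> <{BASE}country/{country}> .")
        lines.append(f'{obs} <{BASE}prop/year> "{year}" .')
        lines.append(f'{obs} <{BASE}prop/value> "{value}" .')
    return lines

triples = rows_to_ntriples(SAMPLE)
```

Each row yields four N-Triples lines. Changing `BASE` changes every generated URI at once, which is the role of the base URI setting.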
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Learning Module 1
Introduction to Linked Data
Slide 4
DATASUPPORTOPEN
Introduction to linked data
This module contains
bull An introduction to the linked data principles
bull The expected benefits of linked data
bull An introduction to linked data technologies
bull An outline of the 5-star scheme for publishing linked data
bull An overview of linked data initiatives in Europe
Slide 5
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
What is linked data
Evolution from a document-based Web to a Web of interlinked data
Slide 6
DATASUPPORTOPEN
The Web is evolving from a "Web of linked documents" into a "Web of linked data"
• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), so that people and machines can collect the data and combine it for any purpose the licence permits.
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.
Two types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.
Slide 7
See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/
DATASUPPORTOPEN
Defining linked data: providing data as a service
"Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens."
EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
Slide 8
See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Flexible data integration: LOGD facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration (the network effect): the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: URIs make browsing through complex data easier.
• Increase in data quality:
  o The use of URIs leads to improved data management and quality.
  o The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
9
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
  o Resolvable URIs
  o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  o existing relational/spatial database systems, by applying database-to-RDF conversions; or
  o existing XML/file-based data.
10
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower cost and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.
11
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF amp SPARQL
The Resource Description Framework (RDF) is a data model for representing data and resources on the Web.
Slide 17
RDF breaks every piece of information down into triples:
• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data.
Example (Subject, Predicate, Object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
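The two variants of the Brussels example differ only in the object: a literal in one case, a resource in the other. A minimal Python sketch of that difference, written out as N-Triples lines, follows. The predicate URI is hypothetical, since the example only says "is the capital of".

```python
def to_ntriple(subject, predicate, obj, obj_is_uri):
    """Serialise one triple as an N-Triples line."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."

BRUSSELS = "http://example.org/place/Brussels"
CAPITAL_OF = "http://example.org/prop/isCapitalOf"  # hypothetical predicate URI

as_literal = to_ntriple(BRUSSELS, CAPITAL_OF, "Belgium", obj_is_uri=False)
as_resource = to_ntriple(BRUSSELS, CAPITAL_OF, "http://example.org/place/Belgium", obj_is_uri=True)
```

Only the resource variant lets further statements be attached to Belgium itself, which is why linking to URIs is preferred where a suitable identifier exists.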
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
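The five levels are usually read as cumulative: each star presupposes the ones before it. Under that assumption, a rating can be sketched as follows. The function and parameter names are illustrative only.

```python
def star_rating(on_web_open_licence, structured, non_proprietary, uses_uris, linked):
    """Count stars cumulatively: a level only counts if all earlier levels are met."""
    stars = 0
    for criterion in (on_web_open_licence, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars
```

For instance, an openly licensed CSV file that uses no URIs would score three stars under this reading, even if it were somehow linked from elsewhere.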
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Trends risks and
vulnerabilities in
securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See alsohttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
25
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives ndash some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI) http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
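The granular identifier pattern on this slide can be expressed as a small URI template. The sketch below assumes that the separators lost in the transcript are slashes between the components and that a representation is obtained by appending data plus a format extension. It illustrates the pattern only and does not describe the service's actual routing.

```python
ID_BASE = "http://www.legislation.gov.uk/id"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}  # formats listed on the slide

def section_identifier(doc_type, year, number, section):
    """Build the identifier URI for one section of one piece of legislation."""
    return f"{ID_BASE}/{doc_type}/{year}/{number}/section/{section}"

def representation(identifier, fmt):
    """Build the URI of one representation of an identified section."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"

uri = representation(section_identifier("ukpga", 2010, 32, 124), "rdf")
```

Keeping the identifier separate from its representations means the same section can be named once and served in several formats.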
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Linked Data in the oil and gas industry
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF amp SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL: how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related
Example: name of a dataset (Subject, Predicate, Object)
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Name Authority List"
DATASUPPORTOPEN
RDF SyntaxRDFXML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Slide annotations: the xmlns attributes are the definition of prefixes; the elements below them are the description of data (triples), with subject, predicate and object marked; together they form the graph]
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
[Slide annotations: the @prefix lines are the definition of prefixes; the statements below them are the description of data (triples) and form the graph]
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
RDFa: embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
[UML class diagram: the Core Vocabularies, with classes, their properties and the relationships between classes]
Class Identifier (Core Vocabularies)
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies)
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string
Class Location (Core Vocabularies)
- geographicIdentifier: URI
- geographicName: string
Class Address (Core Vocabularies)
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string
Class Geometry (Core Vocabularies)
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (an update operation, defined in SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, defined in SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
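The difference between the query forms can be sketched without a triple store. The Python below is illustrative only: it evaluates a single triple pattern over an in-memory list of triples. The `ex:` identifiers, the helper names (`match`, `select`, `ask`, `construct`) and the use of plain strings for prefixed names are assumptions made for this example; they are not part of SPARQL or of any library, and a real SPARQL engine does considerably more.

```python
# Toy graph: a list of (subject, predicate, object) tuples.
# The "ex:" identifiers are hypothetical placeholders.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:countries", "rdf:type", "dcat:Dataset"),
    ("ex:countries", "dct:title", "Countries Named Authority List"),
]


def match(pattern, triples):
    """Return one variable binding (a dict) per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):  # a variable such as "?title"
                if binding.get(p_term, t_term) != t_term:
                    break  # same variable, conflicting values
                binding[p_term] = t_term
            elif p_term != t_term:  # a constant that does not match
                break
        else:
            results.append(binding)
    return results


def select(variables, pattern, triples):
    """SELECT: a table with one row per solution."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]


def ask(pattern, triples):
    """ASK: is there at least one solution?"""
    return bool(match(pattern, triples))


def construct(template, pattern, triples):
    """CONSTRUCT: substitute each solution into a template triple."""
    return [tuple(b.get(t, t) for t in template) for b in match(pattern, triples)]


print(select(["?d", "?t"], ("?d", "dct:title", "?t"), DATA))
print(ask(("?d", "rdf:type", "dcat:Dataset"), DATA))
print(construct(("?d", "rdfs:label", "?t"), ("?d", "dct:title", "?t"), DATA))
```

In this sketch SELECT yields rows, ASK yields a boolean and CONSTRUCT yields new triples, which mirrors the distinction drawn in the list above.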
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
    ?dataset rdf:type dcat:Dataset .
    ?dataset dct:title ?title .
}
Annotations on the query:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
    <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
    <http://.../authority/file-type> dct:publisher ?publisherURI .
    <http://.../authority/file-type> dct:title ?dataset .
    ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
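How several triple patterns combine into one result row can be sketched as a join over shared variables. The Python below is a minimal illustration, not how a SPARQL engine is implemented: the `ex:` identifiers stand in for the abbreviated URIs of the sample data, and `unify` and `solve` are hypothetical helper names introduced only for this example.

```python
# Sample data from the slide, with hypothetical "ex:" names in place of the URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # setdefault keeps an earlier value, so a conflict shows up here
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def solve(patterns, triples):
    """Evaluate the patterns as a conjunction: every pattern must hold at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := unify(pattern, triple, binding)) is not None
        ]
    return bindings


QUERY = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(b["?dataset"], b["?publisher"]) for b in solve(QUERY, DATA)]
print(rows)
```

The variable `?publisherURI` is what links the dataset to the publisher's title: the third pattern only matches triples whose subject equals the value bound by the first pattern.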
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data:
    - How to reuse existing vocabularies to model your data
    - How to create new classes and properties in RDF
    - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust Domain Model
Slide 68
[Diagram: example domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attribute labels in the diagram: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Step 2: Reuse existing terms and vocabularies
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
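Terms such as rdfs:label or foaf:name are prefixed names: the prefix abbreviates the namespace URI of the vocabulary being reused. As a minimal sketch of that expansion, the Python below maps a few prefixes to their namespaces. The `PREFIXES` table and the `expand` helper are names chosen for this illustration only; the namespace URIs are the published ones for these vocabularies.

```python
# Prefix-to-namespace table for a few of the vocabularies listed above.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}


def expand(term):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {term!r}")
    return PREFIXES[prefix] + local


for term in ("rdfs:label", "foaf:name", "skos:prefLabel"):
    print(term, "->", expand(term))
```

Because the full URI is what identifies a term, two datasets that both use foaf:name are using literally the same property, whichever prefix label each of them picks.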
Well-known vocabularies
Step 2: Reuse existing terms and vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Step 2: Reuse existing terms and vocabularies
Slide 71
• Reuse greatly aids interoperability of your data.
The value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
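The interoperability point about dates can be made concrete. In the sketch below, the lexical form of an xsd:date (without timezone) is parsed directly by the Python standard library, while the free-text form needs a format string that the consumer has to know in advance. The two function names are hypothetical, and the second function assumes English month names and exactly the "day month year" layout of the example.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse an xsd:date lexical form without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    """Parse one specific free-text layout, e.g. '21 February 2013'.

    Assumption: English month names and this exact layout. Any other
    layout ('Feb 21, 2013', '21/02/2013', ...) would need its own format.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard == custom)
```

Both calls yield the same date, but only the first works for every publisher that follows the shared datatype; the second is the "further processing" that a non-standard format imposes on each consumer.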
You can find reusable RDF vocabularies on
Step 2: Reuse existing terms and vocabularies
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Step 3: Creation of sub-classes and sub-properties
Slide 73
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the "has Nomenclature" relationship]
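Why a system that only knows the super-property can still use the data can be sketched as a small inference step. The Python below is an illustration under stated assumptions: the `bud:` prefix and the local names `introduction` and `hasNomenclature` are hypothetical spellings of the two EU Budget properties mentioned above, and `super_properties` and `infer` are helper names invented for this example.

```python
# rdfs:subPropertyOf statements, as described for the EU Budget vocabulary.
# The "bud:" names are hypothetical spellings used only in this sketch.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop):
    """Yield the property itself, then its ancestors along rdfs:subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUB_PROPERTY_OF.get(prop)


def infer(triples):
    """Return the input triples plus those entailed by the sub-property links."""
    entailed = set()
    for s, p, o in triples:
        for q in super_properties(p):
            entailed.add((s, q, o))
    return entailed


data = [("ex:chapter1", "bud:introduction", "Introductory text")]
for triple in sorted(infer(data)):
    print(triple)
```

A consumer that has never heard of the budget-specific property can still find the text by asking for dct:description, which is the benefit the slide describes.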
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 76
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
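The purely lexical parts of these conventions (initial capital for classes, initial lower case for properties, camel case rather than separators) can be checked mechanically. The sketch below is one possible check, with hypothetical function names; it cannot tell whether a name is singular, a verb or a noun, so those conventions still need human review.

```python
import re

# Letters and digits only, so underscores, hyphens and spaces are rejected.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def local_name(term):
    """Drop the prefix of a prefixed name: 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    return CLASS_NAME.fullmatch(local_name(term)) is not None


def looks_like_property(term):
    return PROPERTY_NAME.fullmatch(local_name(term)) is not None


for term in ("skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:amount_type"):
    print(term, looks_like_class(term), looks_like_property(term))
```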
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 77
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 78
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
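The example definitions themselves are not preserved in this transcript. As a hedged sketch of what an RDFS definition of a class and a property amounts to, the Python below writes the corresponding triples in N-Triples syntax. Everything under `BUD` is an assumption: the namespace is derived from the http://data.europa.eu/bud example with a trailing slash added, and the local names `Amount` and `amountType` are illustrative, not taken from the published EU Budget vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BUD = "http://data.europa.eu/bud/"  # assumed namespace form, for illustration


def uri(value):
    """Format a URI in N-Triples syntax."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Format a language-tagged string literal in N-Triples syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


triples = [
    # A class is declared by typing it as rdfs:Class.
    (uri(BUD + "Amount"), uri(RDF + "type"), uri(RDFS + "Class")),
    (uri(BUD + "Amount"), uri(RDFS + "label"), literal("Amount")),
    # A property is declared by typing it as rdf:Property.
    (uri(BUD + "amountType"), uri(RDF + "type"), uri(RDF + "Property")),
    (uri(BUD + "amountType"), uri(RDFS + "label"), literal("amount type")),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

The class name follows the capital-letter convention and the property name the lower-case camel-case convention; labels and comments would normally be added for every term so that the vocabulary is self-documenting.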
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 79
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Diagram: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
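In RDFS, domain and range work as inference rules rather than as validation constraints: from a statement that uses the property, a reasoner concludes the types of its subject and object. The sketch below illustrates this under explicit assumptions: the `bud:` names are hypothetical, and the direction of hasCeiling (from a political category to an amount) is a guess based on the diagram labels, not something the transcript states.

```python
# Assumed declarations, for illustration only:
#   bud:hasCeiling rdfs:domain bud:PoliticalCategory .
#   bud:hasCeiling rdfs:range  bud:Amount .
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Apply the RDFS domain and range rules; return the inferred type triples."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))  # the object is in the range
    return inferred


data = [("ex:heading1", "bud:hasCeiling", "ex:ceiling2015")]
for triple in sorted(infer_types(data)):
    print(triple)
```

One consequence is worth keeping in mind when choosing a domain or range: a declaration that is too narrow does not reject unexpected data, it silently assigns the declared class to whatever resource uses the property.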
Step 5: Publish within a highly stable environment designed to be persistent
Slide 80
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Slide 81
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology.
Research existing terms and their usage and maximise reuse of those terms.
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent.
Publicise the RDF vocabulary by registering it with relevant services.
[Diagram labels grouping the steps: Analyse, Model, Publish]
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
What is Open Refine RDF extension
Slide 85
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (xls and xlsx)
- JSON
- XML and
- RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: Getting started
Slide 86
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
Slide 89
Upload the spreadsheet
Select relevant tabs
Create the project
Step 3: Clean up the data – table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Step 3: Clean up the data – prepare RDF
Slide 91
• Create URI representation for the involved object values
    • via formula
    • via reconciliation
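Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this would be written as an expression on the column; the Python below only sketches the idea and is not Open Refine syntax. The base URI `http://example.org/id/` and the helper names `slug` and `mint_uri` are hypothetical.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI for the dataset


def slug(value):
    """Reduce a cell value to lower-case ASCII words joined by hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI of the form <base>/<kind>/<slug of the cell value>."""
    return f"{BASE}{kind}/{slug(value)}"


print(mint_uri("country", "United Kingdom"))
print(mint_uri("indicator", " Households with broadband access (%) "))
```

The same input always produces the same URI, which keeps identifiers stable across exports. Reconciliation is the alternative route: instead of minting a new URI, the value is matched against an existing authority list and that list's URI is reused.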
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 94
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
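What the export step produces can be approximated by hand for a tiny table. The sketch below turns CSV rows into RDF Data Cube observations written as N-Triples, a line-based syntax that is also valid Turtle. It is not what the RDF extension generates: the sample rows are made up for illustration, the base URI and the `value` property are hypothetical, and only `qb:Observation` and `qb:dataSet` are terms of the RDF Data Cube Vocabulary.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up sample rows, not real Digital Agenda Scoreboard figures.
SAMPLE_CSV = """country,year,value
BE,2013,0.79
NL,2013,0.87
"""


def rows_to_ntriples(csv_text):
    """Emit three triples per row: type, dataset membership and value."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        obs = f"<{BASE}observation/{row['country']}-{row['year']}>"
        value = row["value"]
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> <{BASE}dataset> .")
        lines.append(f'{obs} <{BASE}value> "{value}"^^<{XSD_DECIMAL}> .')
    return "\n".join(lines)


print(rows_to_ntriples(SAMPLE_CSV))
```

A complete Data Cube export would also describe the dataset structure and attach each dimension (here country and year) to the observation; the sketch leaves that out to stay short.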
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Thank you for your attention
and now YOUR questions
Slide 97
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 100
EUCLID - Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 101
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Be part of our team
Slide 102
Find us on, join us on, follow us and contact us:
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
@OpenDataSupport
contact@opendatasupport.eu
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Content
This training consists of 3 modules
1 Introduction to linked data
2 Introduction to RDF amp SPARQL
3 Workshop on publishing open linked EU data
Slide 3
DATASUPPORTOPEN
Learning Module 1
Introduction to Linked Data
Slide 4
DATASUPPORTOPEN
Introduction to linked data
This module contains
bull An introduction to the linked data principles
bull The expected benefits of linked data
bull An introduction to linked data technologies
bull An outline of the 5-star scheme for publishing linked data
bull An overview of linked data initiatives in Europe
Slide 5
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
What is linked data
Evolution from a document-based Web to a Web of interlinked data
Slide 6
DATASUPPORTOPEN
The Web is evolving from a ldquoWeb of linked documentsrdquo into a ldquoWeb of linked datardquo
bull The Web started as a collection of documents published online ndash accessible at a Web location identified by a URL
bull These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines
bull The Web of Data is about enabling the access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs) thus enabling people and machines to collect the data and put it together to do all kinds of things with it (permitted by the licence)
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer
2 types of machine-readable
data exist
bull human-readable data that is marked up so that it can also be understood by computers eg microformats RDFa
bull data formats intended principally for computers eg RDF XML and JSON
Slide 7
See alsohttpwwwtedcomtalkstim_berners_lee_on_the_next_webhtml
httplinkeddatabookcomeditions10
DATASUPPORTOPEN
Defining linked dataProviding data as a service
ldquoLinked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations business and citizensrdquo
EC ISA Case Study How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners Lee)
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that they can discover more things
Slide 8
See alsohttpwwwyoutubecomwatchv=4x_xzT5eF5Q
httpwwww3orgDesignIssuesLinkedDatahtml
httpwwwyoutubecomwatchv=uju4wT9uBIA
DATASUPPORTOPEN
The value proposition of linked (open) government data
bull Flexible data integration facilitates data integration and enables the interconnection of previously disparate government datasets
bull Efficiency gains in data integrationndash the network effect the addition of each new dataset increases the value of those datasets that are already published
bull Ease of navigation makes browsing through complex data easier via URIs
bull Increase in data quality
The use of URIs leads to improved data management and quality
The increased (re)use triggers a growing demand to improve data quality Through crowd-sourcing and self-service mechanisms errors are progressively corrected
9
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-
business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
bull Increase in data usability by providing data as a service
Resolvable URIs
Data is available in different formats not limited to RDF eg XML CSV text JSONhellip
bull Compatible with existing standards and technologies a linked data infrastructure can provide access to homogenised linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML RDF+XML) on top of either
Existing relationalspatial database systems by applying database-to-RDF conversions or
Existing XMLfile-based data
10
DATASUPPORTOPEN
The value proposition of linked (open) government data
bull Ease of model updates RDF data models and vocabularies can be extended adapted and updated more easily Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases)
bull Cost reduction The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration data use reuse and exchange
bull New services The availability of LOGD gives rise to new integrated services offered by the public andor private sector
11
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-
business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
Eg for an organisation UNICEF in EuroVoc
- httpeurovoceuropaeu1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that peoplemachines can discover more things
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and bepublicly available under an openlicence without linking to otherdata sources
Linked data
Data can be linked to URIs from other data sources using open standards such as RDF without being publicly available under an open licence
Slide 14
ldquoOpen data is data that can be freely used reused and redistributed by anyone ndash subject only at most to the requirement to attribute and share-alikerdquo- OpenDefinitionorg
See alsoCobden et al A research agenda for Linked Closed Data
httpceur-wsorgVol-782CobdenEtAl_COLD2011pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things RDF for describing data and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
ldquoA Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resourcerdquo
ndash ISArsquos 10 Rules for Persistent URIs
A country eg Belgium
- httppublicationseuropaeuresourceauthoritycountryBEL
An organisation eg the Publications Office
- httppublicationseuropaeuresourceauthoritycorporate-bodyPUBL
A dataset eg Countries Named Authority List
- httppublicationseuropaeuresourceauthoritycountry
Slide 16
BE
See alsohttpwwwslidesharenetOpenDataSupportdesign
-and-manage-persitent-uris
DATASUPPORTOPEN
RDF amp SPARQL
The Resource Description Framework (RDF ) is a syntax for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down in triples
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
httpexampleorgplaceBrussels is the capital of ldquoBelgiumrdquoOR
httpexampleorgplaceBrussels is the capital of httpexampleorgplaceBelgium
Subject Predicate Object
See alsohttpwwwslidesharenetOpenDataSupportintroduction-to-rdf-sparql
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5 star-schema of Linked Open Data
Make your stuff available on the Web (whatever format) under an open license
Make it available as structured data (eg Excel instead of image scan of a table)
Use non-proprietary formats (eg CSV instead of Excel)
Use URIs to denote things so that people can point at your stuff
Link your data to other data to provide context
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Trends risks and
vulnerabilities in
securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
bull Proprietary Excel Word PDF
bull Non-proprietary XML CSV RDF JSON ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See alsohttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
Food Additives - httpopen-dataeuropaeuendatadataset1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - httpopen-dataeuropaeuendatadatasetcorporate-body
DATASUPPORTOPEN
LOGD roadblocks
bull Necessary investments
bull Lack of necessary competencies
bull Perceived lack of tools
bull Lack of service level guarantees
bull Missing restrictive or incompatible licences
bull Surfeit of standard vocabularies
bull The inertia of the status quo ndash change is accomplished slowly
25
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-business-
models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives ndash some examples
bull European Environment Agency SPARQL endpointTool allowing searching for linked data published by the the European Environment Agency httpsemanticeeaeuropaeusparql
bull EU Open Data Portal SPARQL endpointAllows searching for linked metadata of datasets published on the EU Open Data Portal httpsopen-dataeuropaeuenlinked-data
bull Publications Office of the EU - CELLAR SPARQL Endpoint Allows searching for linked data published by the Publications Office of the EU such as legislation data publications data etc httppublicationseuropaeuenlinked-data
bull DG SANTE SPARQL endpointTool for querying linked open data on European Community Health Indicators the EU Register of Health Claims etc httpeceuropaeusemantic_webgate
bull Europeana SPARQL endpointTool allowing querying a multi-lingual online collection of millions of digitized items from European museums libraries archives and multi-media collectionshttplabseuropeanaeuapilinked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE ndash Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria Berlin and Brandenburg
IT ndash Agenzia per lrsquoItalia digitiale
Three datasets published as linked data the Index of Public Administration the SPC contracts for web services and conduction systems and the Classifications for the data in Public Administration
NL ndash Building and address register
The Dutch Address and Buildings base register published as linked data
UK ndash Ordnance Survey
Three OS Open Data products published as linked data the 150 000 Scale Gazetteer Code-Point Open and the administrative geography taken from Boundary Line
UK ndash Companies House
Publishing basic company details as linked data using a simple URI for each company in their database
Slide 29
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-business-
models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as databull Typographic layoutbull Versioning changes over timebull Semantics
Semantic representation using RDF and Linked Databull URIs for things amp RDF data model
Requires granular URIs to name thingsbull Identifier httpwwwlegislationgovukidtypeyearnumbersectionnumberbull Representation data[xml | xht | pdf | rdf | feed]
Source httpswwwnationalarchivesgovukdocumentsinformation-managementopen-and-linked-data-johnsheridanppt
See alsoThe European Legal Identifier (ELI)httpeur-lexeuropaeulegal-contentENTXTuri=URISERVjl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
httpwwwlegislationgovukidukpga201032section124datardf
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
  o Enables easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Enables creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
<http://publications.europa.eu/resource/authority/file-type> (subject) has a title (predicate) “File types Name Authority List” (object)
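The triple from this slide can be written down as a small data structure. The following Python sketch is only an illustration and is not part of the original training material: the `Triple` type and the `is_uri` helper are hypothetical names, and the predicate "has a title" is assumed to be the Dublin Core Terms `title` property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # URI of another resource, or a literal value


# Assumption: "has a title" stands for the Dublin Core Terms title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)


def is_uri(term: str) -> bool:
    """Rough check used here to tell HTTP URIs from plain literals."""
    return term.startswith(("http://", "https://"))


# Subject and predicate are URIs; the object in this example is a literal.
print(is_uri(triple.subject), is_uri(triple.predicate), is_uri(triple.obj))
```

Real RDF libraries distinguish URIs, literals and blank nodes with dedicated types; the prefix check above is a simplification for this sketch.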
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Slide annotations: the xmlns attributes are the definition of prefixes; the elements are the description of the data (triples), each with a subject, predicate and object; together they form the graph]
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the example, with subject, predicate and object labelled]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
[Slide annotations: the @prefix lines are the definition of prefixes; the statements are the description of the data (triples), each with a subject, predicate and object; together they form the graph]
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
[Slide annotations: the resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects]
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[Figure: UML class diagram (diagram label “class Healthcare Domain”). Classes and their properties:]
Core Vocabularies: Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Core Vocabularies: Person
  alternativeName, birthName, familyName, fullName, givenName, patronymicName: string
  dateOfBirth, dateOfDeath: dateTime
  gender: code
Core Vocabularies: Location
  geographicIdentifier: URI
  geographicName: string
Core Vocabularies: Address
  addressArea, addressID, adminUnitL1, adminUnitL2, fullAddress, locatorDesignator, locatorName, poBox, postCode, postName, thoroughfare: string
Core Vocabularies: Geometry
  lat, long, wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
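Structure of a SPARQL query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
[Slide annotations: the PREFIX lines are the definition of prefixes; SELECT is the type of query; ?title is the variable, i.e. what to search for; the WHERE block holds the RDF triple patterns, i.e. the conditions that have to be met]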
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
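To make the evaluation of such a query more concrete, here is a minimal Python sketch of how triple patterns can be matched against sample data and joined on a shared variable. It is not how a SPARQL engine is actually implemented, and it is not part of the original slides: the `match` and `select` functions are hypothetical, and the `example.org` URIs are placeholders standing in for the abbreviated URIs of the sample data.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Placeholder URIs (assumptions) standing in for the abbreviated sample URIs.
FILE_TYPE = "http://example.org/authority/file-type"
PUBL = "http://example.org/publisher/publ"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"

DATA: List[Triple] = [
    (FILE_TYPE, DCT_TITLE, "File types Name Authority List"),
    (FILE_TYPE, DCT_PUBLISHER, PUBL),
    (PUBL, DCT_TITLE, "Publications Office"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the data.
            return None
    return result


def select(variables: List[str], patterns: List[Triple], data: List[Triple]):
    """Evaluate all patterns in order, joining them on shared variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT_PUBLISHER, "?publisherURI"),
        (FILE_TYPE, DCT_TITLE, "?dataset"),
        ("?publisherURI", DCT_TITLE, "?publisher"),
    ],
    DATA,
)
print(rows)
```

The third pattern only matches triples whose subject equals the value already bound to `?publisherURI`, which is what links the dataset to the title of its publisher.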
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model. Classes with their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
DATASUPPORTOPEN
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
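The interoperability point about date formats can be illustrated with a short Python sketch. It is an illustration only and not part of the original slides: the `MONTHS` table and the `parse_free_text` helper are hypothetical, and they show the kind of extra, format-specific processing a consumer needs when a publisher does not use the standard xsd:date form.

```python
from datetime import date

iso_value = "2013-02-21"        # lexical form of an xsd:date value
free_text = "21 February 2013"  # the same day in a free-text format

# The ISO form is parsed by the standard library without format knowledge.
parsed_iso = date.fromisoformat(iso_value)

# The free-text form needs custom code; here an English month lookup table.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}


def parse_free_text(value: str) -> date:
    """Parse 'DD Month YYYY' (English month names only)."""
    day, month_name, year = value.split()
    return date(int(year), MONTHS[month_name], int(day))


parsed_text = parse_free_text(free_text)
print(parsed_iso == parsed_text)
```

Every additional format or language a publisher chooses means another helper like `parse_free_text` on the consumer side, which is the further processing the slide refers to.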
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu/
http://lov.okfn.org/
Slide 72
Reuse existing terms and vocabularies (step 2)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark and Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Figure: the Amount class (Currency, Figure, Type, Year) linked by “has Nomenclature” to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
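A small Python sketch can show why sub-property declarations help consumers that only know the generic term. It is a simplification under stated assumptions: the `bud:` and `ex:` prefixed names, the sample statement and the `with_inferred` function are all hypothetical and are not taken from the EU Budget vocabulary itself; only the two sub-property relationships come from the slides.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Sub-property declarations described on the slides; the "bud:" prefixed
# names are hypothetical placeholders for the actual property URIs.
SUB_PROPERTY_OF: Dict[str, str] = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred(triples: Set[Triple]) -> Set[Triple]:
    """Add a statement with the super-property for each sub-property use."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        parent = SUB_PROPERTY_OF.get(predicate)
        while parent is not None:
            inferred.add((subject, parent, obj))
            parent = SUB_PROPERTY_OF.get(parent)
    return inferred


# Invented sample statement, for illustration only.
data = {("ex:nomenclature1", "bud:introduction", "Sample introduction text")}
expanded = with_inferred(data)
print(sorted(expanded))
```

A system that has never heard of the specific property can still find the text through `dct:description`, because the inferred statement uses the generic term.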
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
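The purely lexical parts of these conventions can be checked automatically. The Python sketch below is a hypothetical helper, not an established tool: it only tests the first letter and the absence of separators such as underscores, and it cannot tell whether a class name is singular or whether a property name is a verb or a noun.

```python
import re

# Camel-case names: letters and digits only, no underscores or hyphens.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```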
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type and Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type and Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
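In RDFS, domain and range declarations are typically used to infer the types of the resources involved, rather than to reject data. The Python sketch below illustrates that reading. It rests on assumptions that are not confirmed by the slides: that hasCeiling points from a political category to an amount, and the `bud:` and `ex:` prefixed names, which are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed declaration: hasCeiling links a political category to an amount.
# The direction and the prefixed names are assumptions for illustration.
PROPERTY_SCHEMA: Dict[str, Dict[str, str]] = {
    "bud:hasCeiling": {
        "domain": "bud:PoliticalCategory",
        "range": "bud:Amount",
    },
}


def infer_types(triples: List[Triple]) -> Set[Triple]:
    """Derive rdf:type statements from domain and range declarations."""
    inferred: Set[Triple] = set()
    for subject, predicate, obj in triples:
        rule = PROPERTY_SCHEMA.get(predicate)
        if rule is None:
            continue  # no domain/range known for this property
        inferred.add((subject, "rdf:type", rule["domain"]))
        inferred.add((obj, "rdf:type", rule["range"]))
    return inferred


# Invented sample statement, for illustration only.
types = infer_types([("ex:category1", "bud:hasCeiling", "ex:amount1")])
print(sorted(types))
```

Because of this inference behaviour, a domain or range that is too narrow can lead to unintended type conclusions, which is one reason to define them with care.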
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Figure labels: Analyse, Model, Publish]
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” – openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
First:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie/
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation (step 3)
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF (step 3)
Slide 91
• Create URI representation for the involved object values
  o via formula
  o via reconciliation
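Creating a URI "via formula" means deriving it from a cell value with a fixed rule. In Open Refine this is done with an expression inside the tool; the Python sketch below only illustrates the idea and is not Open Refine syntax. The base URI and the `mint_uri` function are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI for the identifiers being created.
BASE = "http://example.org/id/country/"


def mint_uri(base: str, value: str) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, escape."""
    slug = "-".join(value.strip().lower().split())
    # Percent-encode anything that is not safe in a URI path segment.
    return base + quote(slug, safe="-")


print(mint_uri(BASE, " Czech Republic "))
```

Applying the same rule to every row gives stable, predictable URIs; reconciliation, by contrast, looks values up in an existing dataset and reuses the URIs found there.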
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
Slide 94
[Figure captions: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle]
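What the export step produces can be sketched by hand for a tiny table. The following Python example is an assumption-laden illustration, not the output of Open Refine: the base URI, the column names and the sample row are invented, every value is written as a plain string literal, and the output uses the simple N-Triples line format rather than RDF/XML or Turtle.

```python
import csv
import io
from typing import Dict, List

BASE = "http://example.org/"  # hypothetical base URI

# Invented sample table, for illustration only.
CSV_TEXT = "id,country,year\nobs1,BE,2013\n"


def literal(value: str) -> str:
    """Write a value as a quoted string literal, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(row: Dict[str, str], base: str, key: str = "id") -> List[str]:
    """Turn one table row into one triple per non-key column."""
    subject = f"<{base}{row[key]}>"
    return [
        f"{subject} <{base}prop/{column}> {literal(value)} ."
        for column, value in row.items()
        if column != key
    ]


lines: List[str] = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    lines.extend(row_to_ntriples(row, BASE))
print("\n".join(lines))
```

In practice the RDF skeleton defined in the previous step decides which vocabulary terms and datatypes are used; the per-column properties here are placeholders for that mapping.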
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along the axes flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID – Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID – Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us, join us, follow us and contact us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-053096500-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Learning Module 1
Introduction to Linked Data
Slide 4
DATASUPPORTOPEN
Introduction to linked data
This module contains
bull An introduction to the linked data principles
bull The expected benefits of linked data
bull An introduction to linked data technologies
bull An outline of the 5-star scheme for publishing linked data
bull An overview of linked data initiatives in Europe
Slide 5
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
What is linked data
Evolution from a document-based Web to a Web of interlinked data
Slide 6
DATASUPPORTOPEN
The Web is evolving from a ldquoWeb of linked documentsrdquo into a ldquoWeb of linked datardquo
bull The Web started as a collection of documents published online ndash accessible at a Web location identified by a URL
bull These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines
bull The Web of Data is about enabling the access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs) thus enabling people and machines to collect the data and put it together to do all kinds of things with it (permitted by the licence)
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer
2 types of machine-readable
data exist
bull human-readable data that is marked up so that it can also be understood by computers eg microformats RDFa
bull data formats intended principally for computers eg RDF XML and JSON
Slide 7
See alsohttpwwwtedcomtalkstim_berners_lee_on_the_next_webhtml
httplinkeddatabookcomeditions10
DATASUPPORTOPEN
Defining linked dataProviding data as a service
ldquoLinked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations business and citizensrdquo
EC ISA Case Study How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners Lee)
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that they can discover more things
Slide 8
See alsohttpwwwyoutubecomwatchv=4x_xzT5eF5Q
httpwwww3orgDesignIssuesLinkedDatahtml
httpwwwyoutubecomwatchv=uju4wT9uBIA
DATASUPPORTOPEN
The value proposition of linked (open) government data
bull Flexible data integration facilitates data integration and enables the interconnection of previously disparate government datasets
bull Efficiency gains in data integrationndash the network effect the addition of each new dataset increases the value of those datasets that are already published
bull Ease of navigation makes browsing through complex data easier via URIs
bull Increase in data quality
The use of URIs leads to improved data management and quality
The increased (re)use triggers a growing demand to improve data quality Through crowd-sourcing and self-service mechanisms errors are progressively corrected
9
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-
business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
o Resolvable URIs
o Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON...
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
o Existing relational/spatial database systems, by applying database-to-RDF conversions, or
o Existing XML/file-based data
Slide 10
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
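Principles 2 and 3 can be sketched with HTTP content negotiation: a client looks up a URI and asks for an RDF representation through the Accept header. The sketch below only builds the request and does not send it. The helper name `build_lookup_request` is hypothetical, and whether the EuroVoc server answers this header with RDF/XML is an assumption, not something stated in the slides.

```python
from urllib.request import Request

# URI of the UNICEF example given for principles 1 and 2.
UNICEF_URI = "http://eurovoc.europa.eu/1022"


def build_lookup_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF.

    Hypothetical helper: the Accept header requests a machine-readable
    representation of the resource instead of an HTML page.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_lookup_request(UNICEF_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.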
Slide 13
Linked data vs open data
Open data:
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data:
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things, RDF for describing data and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.
Slide 17
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data.
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
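The difference between the two Brussels triples (a literal object versus a resource object) can be sketched in a few lines of Python. This is only an illustration of the data model, not an RDF library; the class names and the predicate URI `http://example.org/vocab/isCapitalOf` are hypothetical, since the slide only says "is the capital of".

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource identified by a URI."""


class Literal(str):
    """A plain value, such as a name written as text."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    object: Union[URIRef, Literal]


# Hypothetical predicate URI standing in for "is the capital of".
IS_CAPITAL_OF = URIRef("http://example.org/vocab/isCapitalOf")
BRUSSELS = URIRef("http://example.org/place/Brussels")

with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, URIRef("http://example.org/place/Belgium"))


def object_is_resource(triple: Triple) -> bool:
    """True when the object is itself a URI that can be looked up and linked further."""
    return isinstance(triple.object, URIRef)


print(object_is_resource(with_literal))
print(object_is_resource(with_resource))
```

Only the second form lets a client follow the object to discover more data, which is the point of the fourth linked data principle.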
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
1 star: Make your stuff available on the Web (whatever format) under an open licence
2 stars: Make it available as structured data (e.g. Excel instead of image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context
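The scheme is cumulative: each level builds on the previous ones. A minimal sketch of that reading, with a hypothetical `Publication` record and `star_rating` function (neither comes from the slides):

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical checklist for one published dataset."""
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(p: Publication) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    levels = [
        p.on_web_open_licence,
        p.structured,
        p.non_proprietary_format,
        p.uses_uris,
        p.linked_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


scanned_pdf = Publication(True, False, False, False, False)
csv_file = Publication(True, True, True, False, False)
print(star_rating(scanned_pdf), star_rating(csv_file))
```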
Slide 19
Make your stuff available on the Web under an open licence
Slide 20
[Screenshot: Trends, risks and vulnerabilities in securities markets]
Make it available as structured data
Slide 21
[Screenshot: Waterbase - Emissions to water]
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Screenshot: DG Enlargement - Regional programmes]
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
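Endpoints like these are typically queried over HTTP by sending the query text as the `query` parameter, as defined by the SPARQL Protocol. The sketch below only builds such a URL and does not contact any server; the helper name `build_query_url` is hypothetical, and whether the listed endpoint is still reachable at this address has not been checked.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint address as listed above for the European Environment Agency.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# A generic query that asks for any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```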
Slide 27
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission, including ADMS, ADMS.SW, the Core Vocabularies and the Core Public Service Vocabulary]
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States Initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States Initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF:
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
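To show how such a description breaks down into triples, the sketch below reads a shortened copy of the organisation example with Python's standard XML parser. It is not a general RDF/XML parser: it only handles the simple "typed node with property children" shape used here, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ORG = "http://www.w3.org/ns/org#"
PUBL = "http://publications.europa.eu/resource/authority/corporate-body/PUBL"

DOCUMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
</rdf:RDF>"""


def full_name(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' notation into a plain URI."""
    return tag.replace("{", "").replace("}", "")


root = ET.fromstring(DOCUMENT)
triples = []
for node in root:  # each child element describes one subject
    subject = node.get(f"{{{RDF}}}about")
    triples.append((subject, RDF + "type", full_name(node.tag)))
    for prop in node:  # each grandchild is one predicate/object pair
        value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
        triples.append((subject, full_name(prop.tag), value))

for triple in triples:
    print(triple)
```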
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Named Authority List"
(subject: the dataset URI; predicate: has a title; object: the title text)
RDF syntax: RDF/XML
Slide 46
Definition of prefixes:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
Description of data (triples of subject, predicate and object, which together form a graph):
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
RDF syntax: Turtle
Slide 48
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data (triples of subject, predicate and object, which together form a graph):
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
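The prefix definitions are only shorthand: every prefixed name stands for a full URI. A minimal sketch of that expansion follows; the function name `expand` is hypothetical, and the namespace URIs used are the ones published for DCAT and Dublin Core Terms.

```python
# Prefix table corresponding to the "definition of prefixes" part of a Turtle file.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix}")
    return prefixes[prefix] + local


print(expand("dct:title"))
print(expand("dcat:Dataset"))
```

An undeclared prefix raises an error here, which mirrors how Turtle parsers reject prefixed names whose prefix was never defined.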
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
Classes and their properties (name: type [cardinality]):
• Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
• Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
• Location: geographicIdentifier: URI; geographicName: string
• Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
• Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s)... (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions...?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL query
Slide 56
Definition of prefixes:
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
    SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }
Result:
    dataset
    "File types Named Authority List"
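What the query engine does for this single triple pattern can be imitated in plain Python. This is a sketch of the matching idea only, not a SPARQL implementation. The full URIs are taken from the earlier syntax slides (the sample data above abbreviates them), and the prefixed names are kept as plain strings for brevity.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The five sample triples as (subject, predicate, object) tuples.
SAMPLE_DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Named Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def select_objects(data, subject, predicate):
    """Return every object of the triples that match a fixed subject and predicate."""
    return [o for s, p, o in data if s == subject and p == predicate]


result = select_objects(SAMPLE_DATA, FILE_TYPE, "dct:title")
print(result)
```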
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Named Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }
Result:
    dataset                              publisher
    "File types Named Authority List"    "Publications Office"
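The second query joins three triple patterns through the shared variable ?publisherURI. The sketch below imitates that join in plain Python: each pattern extends the set of variable bindings found so far. It is an illustration of the idea under simplifying assumptions (terms are plain strings, variables start with "?"), not a SPARQL engine, and the function names are hypothetical.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Named Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on a conflict."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(data, patterns, variables):
    """Evaluate the triple patterns in order and project the requested variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in data
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return [tuple(solution[v] for v in variables) for solution in solutions]


PATTERNS = [
    (FILE_TYPE, "dct:publisher", "?publisherURI"),
    (FILE_TYPE, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = select(SAMPLE_DATA, PATTERNS, ["?dataset", "?publisher"])
print(rows)
```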
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Start with a robust Domain Model
Slide 68
Step 1
[Figure: domain model of the EU Budget. Classes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
Reuse existing terms and vocabularies
Step 2
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: reuse existing terms and vocabularies
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
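The "further processing" mentioned for the date example can be sketched as a small conversion from a free-text date to an xsd:date typed literal. The function name is hypothetical, and the sketch assumes English month names with Python's default locale.

```python
from datetime import datetime


def to_xsd_date_literal(text: str, fmt: str = "%d %B %Y") -> str:
    """Convert a free-text date into an xsd:date typed literal (Turtle notation)."""
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date_literal("21 February 2013"))
```

Every consumer of non-standard data would need a step like this, with its own guess at the format, which is the cost that reusing dcterms:created with typed dates avoids.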
Slide 71
Advantages of reuse
Step 2: reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 2: reuse existing terms and vocabularies
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
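How a system that only knows the super-property can still interpret the data may be sketched as follows: for every statement made with a sub-property, the same statement is derived with its super-property. The two mappings mirror the EU Budget examples in this module, but the prefixed names `bud:introduction` and `bud:hasNomenclature` and the function name are hypothetical, and the mapping is assumed to contain no cycles.

```python
# sub-property -> super-property (hypothetical prefixed names)
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Return the triples plus the statements implied by sub-property links."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the hierarchy (assumed acyclic) and restate the triple at each level.
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = [("ex:item1", "bud:introduction", "Some introductory text")]
for triple in sorted(infer_super_properties(data)):
    print(triple)
```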
Slide 73
Step 3
Creation of sub-classes and sub-properties
Slide 74
Step 3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark and Conditions]
Creation of sub-classes and sub-properties
Slide 75
Step 3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Where new terms are required create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
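The capitalisation and camel-case conventions above can be sketched as a small naming helper. The function name `to_camel_case` is hypothetical; note that this simple version lower-cases acronyms after their first letter, which may not always be desired.

```python
import re


def to_camel_case(words: str, is_class: bool) -> str:
    """Join words in camel case: classes start upper case, properties lower case."""
    parts = re.findall(r"[A-Za-z0-9]+", words)
    if not parts:
        raise ValueError("no words given")
    head, *tail = parts
    first = head.capitalize() if is_class else head.lower()
    return first + "".join(part.capitalize() for part in tail)


print(to_camel_case("is primary topic of", is_class=False))
print(to_camel_case("political category", is_class=True))
```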
Slide 76
Step 4
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type and Year]
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type and Year]
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
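A minimal sketch of how a declared domain and range can be checked against data follows. The names are hypothetical, and the direction of hasCeiling (from Political category to Amount) is an assumption read from the domain model figure. Strictly speaking, RDFS uses domain and range to infer types rather than to reject data; the validation reading below follows the wording of the slide.

```python
# Hypothetical schema fragment based on the budget example in this module.
PROPERTIES = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

# Known types of some example resources (illustrative identifiers).
TYPES = {
    "ex:heading1": "bud:PoliticalCategory",
    "ex:amount1": "bud:Amount",
}


def conforms(subject, predicate, obj, properties=PROPERTIES, types=TYPES):
    """Check that the subject is in the property's domain and the object in its range."""
    rule = properties[predicate]
    return types.get(subject) == rule["domain"] and types.get(obj) == rule["range"]


print(conforms("ex:heading1", "bud:hasCeiling", "ex:amount1"))
print(conforms("ex:amount1", "bud:hasCeiling", "ex:heading1"))
```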
Slide 79
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Step 6
Conclusions
Slide 82
Analyse:
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
Model:
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish:
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another." - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML and
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 1
Create a project and upload it in Open Refine
Slide 89
Step 2
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Clean up the data - table harmonisation
Slide 90
Step 3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data - prepare RDF
Slide 91
Step 3
• Create a URI representation for the involved object values:
o via formula
o via reconciliation
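Creating URIs "via formula" amounts to deriving an identifier from a cell value in a repeatable way. The sketch below shows the idea in Python rather than in Open Refine's own expression language; the base URI `http://example.org/id/` and the function name `mint_uri` are hypothetical.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, value: str, base: str = BASE_URI) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = quote(value.strip().lower().replace(" ", "-"), safe="-")
    return f"{base}{kind}/{slug}"


print(mint_uri("country", " Belgium "))
print(mint_uri("indicator", "Fixed broadband"))
```

Reconciliation takes the other route: instead of minting a new URI, the cell value is matched against an existing authority list and the URI already published there is reused.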
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Step 4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Step 4
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
Step 4
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Export your data to RDF/XML or Turtle
Slide 95
Step 5
[Screenshot: export of the data in Turtle]
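What the mapping and export steps do can be approximated in a few lines: each spreadsheet row becomes one subject, and each column becomes one predicate. The sketch below is not the Open Refine RDF extension and does not use the RDF Data Cube Vocabulary; the table values, base URI, namespace and property names are all invented for illustration.

```python
import csv
import io

# Invented sample table; the values are illustrative only.
TABLE = """country,year,value
BE,2013,0.82
LU,2013,0.94
"""

BASE = "http://example.org/observation/"  # hypothetical base URI
EX = "http://example.org/def/"            # hypothetical vocabulary namespace


def rows_to_turtle(text: str) -> str:
    """Turn each CSV row into one Turtle subject with three properties."""
    lines = [f"@prefix ex: <{EX}> .", ""]
    for row in csv.DictReader(io.StringIO(text)):
        country, year, value = row["country"], row["year"], row["value"]
        lines.append(f"<{BASE}{country}-{year}>")
        lines.append(f'    ex:country "{country}" ;')
        lines.append(f'    ex:year "{year}" ;')
        lines.append(f"    ex:value {value} .")
    return "\n".join(lines)


print(rows_to_turtle(TABLE))
```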
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• OpenRefine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie/
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
DATASUPPORTOPEN
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Introduction to linked data
This module contains
• An introduction to the linked data principles
• The expected benefits of linked data
• An introduction to linked data technologies
• An outline of the 5-star scheme for publishing linked data
• An overview of linked data initiatives in Europe
Slide 5
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
What is linked data
Evolution from a document-based Web to a Web of interlinked data
Slide 6
DATASUPPORTOPEN
The Web is evolving from a "Web of linked documents" into a "Web of linked data"
• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL
• These documents often contain data about real-world resources, which is mainly human-readable and cannot be understood by machines
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), so that people and machines can collect the data and combine it for any purpose the licence permits
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer
Two types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa
• data formats intended principally for computers, e.g. RDF, XML and JSON
Slide 7
See also:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/
DATASUPPORTOPEN
Defining linked data
Providing data as a service
"Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens."
- EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs so that they can discover more things
Slide 8
See also:
http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Flexible data integration: linked data facilitates data integration and enables the interconnection of previously disparate government datasets
• Efficiency gains in data integration - the network effect: the addition of each new dataset increases the value of the datasets that are already published
• Ease of navigation: URIs make browsing through complex data easier
• Increase in data quality:
  - The use of URIs leads to improved data management and quality
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected
Slide 9
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
  - Resolvable URIs
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF/XML) on top of either:
  - existing relational/spatial database systems, by applying database-to-RDF conversions, or
  - existing XML/file-based data
Slide 10
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower cost and effort (compared to traditional relational databases)
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that people/machines can discover more things
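Principles 2 and 3 together imply that a client can ask for the RDF description of a thing simply by requesting its HTTP URI with a suitable `Accept` header. The sketch below only prepares such a request and does not send it; whether a given server honours the header is an assumption to verify for each data source, and the function name is illustrative.

```python
from urllib.request import Request

# Media types for two common RDF syntaxes, in order of preference.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF, not HTML."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# The EuroVoc concept URI used as the example for principles 1 and 2.
request = build_rdf_request("http://eurovoc.europa.eu/1022")
```

Passing `request` to `urllib.request.urlopen` would perform the lookup. Following the URIs found in the returned description is what principle 4 makes possible.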
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things RDF for describing data and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
The Resource Description Framework (RDF) is a framework for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example triple (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
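The two variants of the Brussels example differ only in the object: a literal in the first case, a resource identified by a URI in the second. A minimal Python sketch of that distinction follows, serialising both triples as N-Triples lines. The predicate URI is hypothetical, since the slide gives the relationship only in words.

```python
from typing import NamedTuple, Union


class URI(str):
    """Marks a string as a URI reference rather than a literal value."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # a resource (URI) or a plain literal (str)


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    def term(value: Union[URI, str]) -> str:
        if isinstance(value, URI):
            return f"<{value}>"
        # Literals are quoted; backslashes and quotes must be escaped.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


BRUSSELS = URI("http://example.org/place/Brussels")
IS_CAPITAL_OF = URI("http://example.org/ontology/isCapitalOf")  # hypothetical

with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, "Belgium")
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, URI("http://example.org/place/Belgium"))
```

Only the second form lets a client follow the object to find out more about Belgium, which is why linked data favours URIs over literals wherever the object is itself a thing.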
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
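The scheme is cumulative: each star presupposes the ones before it. The sketch below encodes that reading as a small rating function; the field names are invented for illustration, and treating the criteria as strictly ordered is an interpretation of the scheme rather than something the slide spells out.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # One flag per star, in the order of the scheme.
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    links_to_other_data: bool


def star_rating(dataset: Dataset) -> int:
    """Count the stars earned, stopping at the first unmet criterion."""
    criteria = [
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed Excel sheet: structured, but in a proprietary format.
excel_sheet = Dataset(True, True, False, False, False)
# The same data as CSV earns the third star.
csv_file = Dataset(True, True, True, False, False)
```

Under this reading, data that is richly linked but not openly licensed earns no stars at all, which matches the distinction the deck draws between linked data and open data.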
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency, http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal, https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data and publications data, http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc., http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections, http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission; legible labels: ADMS, SW, Core Vocabulary, Public Service]
DATASUPPORTOPEN
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
• IT - Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
• NL - Building and address register: the Dutch Address and Buildings base register published as linked data
• UK - Ordnance Survey: three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line
• UK - Companies House: publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
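Granular, predictable URIs of this kind can be assembled mechanically from the parts of a citation. The sketch below follows the identifier pattern given for legislation.gov.uk on the previous slide and reproduces the example URI above. The path separators are reconstructed from the flattened slide text, so the exact layout should be checked against the live service before relying on it.

```python
from typing import Optional

BASE = "http://www.legislation.gov.uk"
# Representation suffixes listed on the slide.
REPRESENTATIONS = {"xml", "xht", "pdf", "rdf", "feed"}


def legislation_uri(doc_type: str, year: int, number: int,
                    section: Optional[int] = None,
                    representation: Optional[str] = None) -> str:
    """Build an identifier URI, optionally for one section and one format."""
    parts = [BASE, "id", doc_type, str(year), str(number)]
    if section is not None:
        parts += ["section", str(section)]
    uri = "/".join(parts)
    if representation is not None:
        if representation not in REPRESENTATIONS:
            raise ValueError(f"unknown representation: {representation!r}")
        uri += f"/data.{representation}"
    return uri


example = legislation_uri("ukpga", 2010, 32, section=124, representation="rdf")
```

Because every part of the URI carries meaning, a client can move from a whole act to a single section, or from one format to another, without consulting an index.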
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."
• Need to add semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  - Data integration with small impact on legacy systems
  - Semantic interoperability
  - Easier browsing through complex data
  - Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  - Easy updates, adaptations and extensions of data models
  - Cost reduction from the reuse of LOGD in e-Government applications
  - Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">
      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>
      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>
    </rdf:RDF>
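To see that the RDF/XML description above really is a set of triples, it can be read with a generic XML parser and flattened. The sketch below uses only the Python standard library and handles just the simple shape used in this example (typed nodes with `rdf:about` and one level of properties); it is not a general RDF/XML parser. The embedded document is a well-formed reconstruction of the slide's example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
""".strip()


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples(xml_text: str) -> list:
    """Flatten typed node elements into (subject, predicate, object) tuples."""
    result = []
    for node in ET.fromstring(xml_text):
        subject = node.attrib[f"{{{RDF}}}about"]
        # The element name of a typed node states its class.
        result.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            resource = prop.attrib.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            result.append((subject, expand(prop.tag), value))
    return result


organisation_triples = triples(DOC)
```

The five resulting tuples are the two type statements, the label, the link from the organisation to its site, and the address literal.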
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset (subject, predicate, object)
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Named Authority List"
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">
      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>
      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>
    </rdf:RDF>
Annotations: the xmlns attributes define the prefixes; the nested elements describe the data as triples (subject, predicate, object), which together form a graph
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the example, with nodes and arcs labelled as subject, predicate and object]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Named Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .
Annotations: the @prefix lines define the prefixes; the remaining statements describe the data as triples (subject, predicate, object), which together form a graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
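Prefixes are pure shorthand: a prefixed name such as `dct:title` stands for the namespace URI followed by the local name. The sketch below shows the expansion and its inverse in Python. It uses `http://www.w3.org/ns/dcat#`, the DCAT namespace published by W3C; the function names are illustrative.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, separator, local = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact_uri(uri: str, prefixes: dict = PREFIXES) -> str:
    """Shorten a URI using the longest matching namespace, if there is one."""
    matches = [item for item in prefixes.items() if uri.startswith(item[1])]
    if not matches:
        return f"<{uri}>"  # no known prefix: keep the full form
    prefix, namespace = max(matches, key=lambda item: len(item[1]))
    return f"{prefix}:{uri[len(namespace):]}"
```

Two documents that choose different prefix labels for the same namespace therefore describe exactly the same triples.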
DATASUPPORTOPEN
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
    <html>
      <head></head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>
Annotations: the resource attribute gives the subject, each property attribute a predicate, and the element content the object
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom"
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
[UML class diagram; classes are listed with their properties and data types]
• Class Identifier: dateOfIssue (dateTime) [0..1], identifier (string) [1..1], identifierType (string) [0..1], issuingAuthority (string) [0..1], issuingAuthorityUri (URI) [0..1]
• Class Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)
• Class Location: geographicIdentifier (URI), geographicName (string)
• Class Address: addressArea (string), addressID (string), adminUnitL1 (string), adminUnitL2 (string), fullAddress (string), locatorDesignator (string), locatorName (string), poBox (string), postCode (string), postName (string), thoroughfare (string)
• Class Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)
• Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, which makes the meaning of the data model easier to understand
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
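Structure of a SPARQL Query
Slide 56
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>

    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }
Annotations:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the contents of the WHERE block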
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
    <http://…/authority/file-type> rdf:type dcat:Dataset .
    <http://…/authority/file-type> dct:title "File types Named Authority List" .
    <http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://…/publisher/publ> rdf:type dct:Agent .
    <http://…/publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset
    WHERE {
      <http://…/authority/file-type> dct:title ?dataset .
    }
Result:
    dataset
    "File types Named Authority List"
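What the SELECT query does can be mimicked in a few lines of Python over an in-memory list of triples: keep the triples whose subject and predicate match the fixed terms, and return the objects as values of the variable. This is only a sketch of the matching idea, not of how a SPARQL engine is implemented; the abbreviated URIs of the sample data are written out in full here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data, as (subject, predicate, object) tuples.
TRIPLES = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def select_objects(triples: list, subject: str, predicate: str) -> list:
    """Return every object whose triple matches the fixed subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Equivalent of: SELECT ?dataset WHERE { <file-type> dct:title ?dataset }
dataset_titles = select_objects(TRIPLES, FILE_TYPE, DCT + "title")
```

The publisher's title is not returned, because its subject differs from the one fixed in the pattern.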
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
    <http://…/authority/file-type> rdf:type dcat:Dataset .
    <http://…/authority/file-type> dct:title "File types Named Authority List" .
    <http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://…/publisher/publ> rdf:type dct:Agent .
    <http://…/publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT ?dataset ?publisher
    WHERE {
      <http://…/authority/file-type> dct:publisher ?publisherURI .
      <http://…/authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }
Result:
    dataset                              publisher
    "File types Named Authority List"    "Publications Office"
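The second query joins three triple patterns through the shared variable `?publisherURI`. The sketch below implements that join naively in Python: each pattern extends the variable bindings produced by the previous ones. It treats any pattern term starting with `?` as a variable and ignores everything else SPARQL offers (filters, optional parts, ordering), so it illustrates the principle only.

```python
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match_pattern(pattern: tuple, triple: tuple, binding: dict):
    """Try to extend a binding so that the pattern equals the triple."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def select(variables: list, patterns: list, triples: list) -> list:
    """Evaluate the patterns in order and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    TRIPLES,
)
```

The third pattern rejects the dataset's own title, because `?publisherURI` is already bound to the publisher by the first pattern.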
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Step 1: Start with a robust Domain Model
Slide 68
[Figure: example domain model. Classes: Amount, Nomenclature, EU Programme, Political category, Corporate body. Attributes shown: Currency, Figure, Type, Year, Heading, Code, Description, Location, Introduction, Remark, Conditions, Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
• Reuse greatly aids interoperability of your data
Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
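The interoperability point can be made concrete with a small Python sketch. It assumes the two date forms quoted on the slide; the helper names are hypothetical and not part of any vocabulary or tool. The typed xsd:date value parses with a standard library call, while the free-text form needs a custom parser that only covers this one layout.

```python
from datetime import date

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without a timezone, e.g. 2013-02-21."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse an ad hoc date such as '21 February 2013'.

    Every consumer would need a custom step like this one, and it only
    handles this exact layout with English month names.
    """
    day, month_name, year = text.split()
    return date(int(year), MONTHS.index(month_name.lower()) + 1, int(day))


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard.isoformat(), custom.isoformat())
```

Both calls yield the same date, but only the first relies on a format that generic tools already understand.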
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
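A minimal Python sketch of why sub-properties help: a consumer that only knows dct:description can still read values published with a more specific property, provided the schema declares the sub-property relationship. The bud: and ex: names and the sample text are illustrative assumptions, and the lookup is a simplified stand-in for RDFS reasoning, not a full implementation.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Schema triples: a hypothetical bud:introduction declared as a
# specialisation of dct:description.
schema = [
    ("bud:introduction", RDFS_SUBPROPERTY_OF, "dct:description"),
]

# Data triples published with the specific property only (made-up sample).
data = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the heading"),
]


def super_properties(prop, schema):
    """Return prop plus every property it specialises, directly or transitively."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == RDFS_SUBPROPERTY_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def values_of(subject, wanted_prop, data, schema):
    """Values visible to a consumer that asks for wanted_prop only."""
    return [
        o for s, p, o in data
        if s == subject and wanted_prop in super_properties(p, schema)
    ]


descriptions = values_of("ex:nomenclature1", "dct:description", data, schema)
print(descriptions)
```

Without the schema triple the same query returns nothing, which is the situation a generic consumer faces when a publisher invents an unrelated term.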
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
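The capitalisation and camel case rules above can be checked mechanically. The following Python sketch is one possible check; the function names are hypothetical, and it does not attempt to verify the "singular", "verb" or "noun" rules, which need human judgement.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def local_name(prefixed_name: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return prefixed_name.split(":", 1)[1]


def is_valid_class_name(prefixed_name: str) -> bool:
    return bool(UPPER_CAMEL.match(local_name(prefixed_name)))


def is_valid_property_name(prefixed_name: str) -> bool:
    return bool(LOWER_CAMEL.match(local_name(prefixed_name)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:primary_topic"]:
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```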
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
4
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
4
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
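In RDFS, domain and range declarations are typically used to infer the types of the resources involved rather than to reject data. The Python sketch below illustrates that inference on plain tuples. The bud: and ex: names are hypothetical, and the direction of hasCeiling (from Political category to Amount) is an assumption, since the diagram arrows are not preserved in this transcript.

```python
RDF_TYPE = "rdf:type"

# Hypothetical declarations for one property.
domains = {"bud:hasCeiling": "bud:PoliticalCategory"}
ranges = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domains, ranges):
    """Derive rdf:type statements implied by domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            # The value of the property is an instance of the range class.
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
inferred = infer_types(data, domains, ranges)
for triple in sorted(inferred):
    print(triple)
```

A practical consequence is that an overly narrow domain or range leads to wrong type inferences, which is one reason to declare them with care.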
Slide 79
4
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another …” - openrefine.org
See also Open Refine website
http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
With the Open Refine RDF extension you can easily import data in different formats such as
CSV
Excel (xls and xlsx)
JSON
XML and
RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also LOD2 Webinar – Open Refine http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: Getting started
1 Install Open Refine from https://github.com/OpenRefine
2 Install the RDF extension http://refine.deri.ie
And then
1 Describe your data in a spreadsheet
2 Create a project and upload it in Open Refine
3 Clean up the data
4 Map your data to appropriate RDF classes & properties
5 Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
3
• Create URI representation for the involved object values
• via formula
• via reconciliation
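The "via formula" option amounts to deriving a URI from a cell value with a deterministic rule. The Python sketch below shows one such rule outside Open Refine; the base URI, the function name and the slug convention are assumptions for illustration and do not reproduce the expression language used inside the tool.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, label: str) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE_URI}{quote(kind)}/{quote(slug)}"


print(mint_uri("country", "  Czech Republic "))
```

Reconciliation, the second option, instead looks the value up in an existing reference dataset and reuses the URI found there, which is preferable when an authoritative URI already exists.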
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
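What the skeleton and export steps do can be approximated in a few lines of Python: each spreadsheet row becomes one observation described by a handful of triples. This is a sketch under stated assumptions. The base URI, the column names, the dimension and measure properties and the sample figures are all made up, and a complete RDF Data Cube would also need a dataset and a data structure definition, which are omitted here. Only the qb:Observation class and the standard rdf: and xsd: namespaces come from published vocabularies. The output is N-Triples, a line-based syntax that is also valid Turtle.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up sample rows, not real scoreboard figures.
SAMPLE_CSV = """country,year,value
BE,2013,0.78
LU,2013,0.91
"""


def row_to_ntriples(row):
    """Describe one spreadsheet row as a qb:Observation."""
    obs = f"<{BASE}obs/{row['country']}-{row['year']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{BASE}def/country> <{BASE}country/{row['country']}> .",
        f'{obs} <{BASE}def/year> "{row["year"]}"^^<{XSD}gYear> .',
        f'{obs} <{BASE}def/value> "{row["value"]}"^^<{XSD}decimal> .',
    ]


def csv_to_ntriples(text):
    """Convert every CSV row into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines


ntriples = csv_to_ntriples(SAMPLE_CSV)
print("\n".join(ntriples))
```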
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the working ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2 This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
What is linked data
Evolution from a document-based Web to a Web of interlinked data
Slide 6
DATASUPPORTOPEN
The Web is evolving from a “Web of linked documents” into a “Web of linked data”
• The Web started as a collection of documents published online – accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.
2 types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa
• data formats intended principally for computers, e.g. RDF, XML and JSON
Slide 7
See also http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/
DATASUPPORTOPEN
Defining linked data
Providing data as a service
“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4 Include links to other URIs so that they can discover more things
Slide 8
See also http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets
• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of those datasets that are already published
• Ease of navigation: makes browsing through complex data easier via URIs
• Increase in data quality:
The use of URIs leads to improved data management and quality
The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected
Slide 9
See also ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
Resolvable URIs
Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
Existing relational/spatial database systems, by applying database-to-RDF conversions, or
Existing XML/file-based data
Slide 10
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases)
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange
• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector
Slide 11
See also ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that people/machines can discover more things
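Principles 2 and 3 are usually put into practice through HTTP content negotiation: the client asks for an RDF representation of the URI by sending an Accept header. The Python sketch below only builds such a request with the standard library and sends nothing over the network. Whether a given server honours the header, and which media types it supports, is an assumption to verify for each publisher; the function name is hypothetical.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare a GET request that asks for an RDF serialisation of the resource."""
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request("http://eurovoc.europa.eu/1022")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup.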
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence
Slide 14
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org
See also Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things, RDF for describing data and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA’s 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
The Resource Description Framework (RDF) is a standard model, with several syntaxes, for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
<http://example.org/place/Brussels> is the capital of “Belgium” OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
Subject Predicate Object
See also http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
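The difference between the two Brussels statements can be shown with a small Python sketch that models a triple as a typed tuple. The class names and the predicate URI are hypothetical; only the two example.org place URIs come from the slide. When the object is a URI, a client can follow it to find more data; when it is a literal, the trail ends there.

```python
from typing import NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


brussels = URI("http://example.org/place/Brussels")
# Hypothetical predicate URI standing in for "is the capital of".
capital_of = URI("http://example.org/property/isCapitalOf")

as_literal = Triple(brussels, capital_of, Literal("Belgium"))
as_resource = Triple(brussels, capital_of, URI("http://example.org/place/Belgium"))


def can_be_followed(triple: Triple) -> bool:
    """True when the object is a resource that can itself be looked up."""
    return isinstance(triple.object, URI)


print(can_be_followed(as_literal), can_be_followed(as_resource))
```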
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
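The scheme is cumulative: each star presupposes the ones before it. A minimal Python sketch of that rule follows; the class and field names are hypothetical and chosen only to mirror the five criteria.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(p: Publication) -> int:
    """Count stars in order and stop at the first criterion that is not met."""
    stars = 0
    for met in (
        p.on_web_open_licence,
        p.structured,
        p.non_proprietary_format,
        p.uses_uris,
        p.linked_to_other_data,
    ):
        if not met:
            break
        stars += 1
    return stars


csv_on_portal = Publication(True, True, True, False, False)
print(star_rating(csv_on_portal))
```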
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
[Example: Trends, risks and vulnerabilities in securities markets]
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives ndash some examples
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching for linked metadata of datasets published on the EU Open Data Portal https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: Tool for querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register published as linked data
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK – Companies House
Publishing basic company details as linked data using a simple URI for each company in their database
Slide 29
See also ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data
• URIs for things & RDF data model
Requires granular URIs to name things
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also The European Legal Identifier (ELI) http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
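The identifier and representation patterns above lend themselves to a small Python sketch that assembles a URI for one section of an act and for one of its representations. The separators in the pattern are reconstructed, because the transcript lost its punctuation, and the function names are hypothetical, so treat this as an illustration of the URI design rather than as a description of the live service.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """Identifier URI naming one section of one piece of legislation."""
    return f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"


def representation(identifier: str, fmt: str) -> str:
    """URI of a concrete representation of the identified section."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


uri = section_identifier("ukpga", 2010, 32, 124)
print(representation(uri, "rdf"))
```

Keeping the identifier separate from its representations lets the same section be named once and served in several formats.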
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open amp linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about
Slide 32
Further reading
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN Slide 33
Open amp linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
1 Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2 Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Linked Data in the oil and gas industry
Slide 35
Further reading http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for Linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions cont’d
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
<http://publications.europa.eu/resource/authority/file-type> has a title “File types Named Authority List”
Subject Predicate Object
Example: name of a dataset
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Figure labels: Subject, Predicate, Object, Graph. The xmlns attributes are the definition of prefixes; the element content is the description of data (triples)]
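To see how the RDF/XML elements map to subject, predicate and object, the Python sketch below reads a cut-down version of the example with the standard library XML parser and lists the triples it contains. It is not a general RDF/XML parser: it assumes the flat shape used on this slide (typed nodes with rdf:about, and child properties holding either text or rdf:resource), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag[1:].replace("}", "", 1)


def extract_triples(xml_text: str):
    """List (subject, predicate, object) tuples for flat, typed node elements."""
    triples = []
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name of the node states its class.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), obj))
    return triples


triples = extract_triples(RDF_XML)
for triple in triples:
    print(triple)
```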
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
[Figure labels: Subject, Predicate, Object, Graph. The @prefix lines are the definition of prefixes; the remaining lines are the description of data (triples)]
See also http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p><span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span></p>
</div>
</body>
</html>
[Figure labels: Subject, Predicate, Object]
See also http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
[UML class diagram "Healthcare Domain"; each box is a class listing its properties, and the labelled lines between boxes are relationships]
Class Identifier (Core Vocabularies):
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies):
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location (Core Vocabularies):
  geographicIdentifier: URI
  geographicName: string
Class Address (Core Vocabularies):
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry (Core Vocabularies):
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for "SPARQL Protocol and RDF Query Language"
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).
• INSERT: add triples to the RDF graph (an operation of SPARQL 1.1 Update rather than a query form).
• DELETE: delete triples from the RDF graph (likewise part of SPARQL 1.1 Update).
• ASK: are there any X, Y, etc. satisfying the following conditions ...?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
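To make the difference between a SELECT and an ASK result concrete, here is a minimal sketch that matches a single triple pattern against an in-memory list of triples. It is not a SPARQL engine: match, select and ask are hypothetical helper names, and the "ex:" identifiers are placeholders standing in for full URIs.

```python
# Each triple is a (subject, predicate, object) tuple; prefixed names are kept as plain strings.
TRIPLES = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must take the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def select(variables, pattern, triples):
    """SELECT-like: a table (list of rows) with one column per requested variable."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]


def ask(pattern, triples):
    """ASK-like: True if at least one triple satisfies the pattern."""
    return bool(match(pattern, triples))


datasets = select(["?s"], ("?s", "rdf:type", "dcat:Dataset"), TRIPLES)
has_agent = ask(("?s", "rdf:type", "dct:Agent"), TRIPLES)
has_catalog = ask(("?s", "rdf:type", "dcat:Catalog"), TRIPLES)
```

SELECT hands back the matching values as rows, whereas ASK only reports whether any match exists, which is why it is the cheaper choice for simple existence checks.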
Structure of a SPARQL Query
Definition of prefixes:
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX dct: <http://purl.org/dc/terms/>
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
  SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
  WHERE {
    ?dataset rdf:type dcat:Dataset .
    ?dataset dct:title ?title .
  }
Slide 56
SELECT - return the name of a dataset with a particular URI
Sample data:
  <http://.../authority/file-type> rdf:type dcat:Dataset .
  <http://.../authority/file-type> dct:title "File types Name Authority List" .
  <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
  <http://.../publisher/publ> rdf:type dct:Agent .
  <http://.../publisher/publ> dct:title "Publications Office" .
Query:
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?dataset
  WHERE {
    <http://.../authority/file-type> dct:title ?dataset .
  }
Result:
  dataset
  "File types Name Authority List"
Slide 57
SELECT - return the name and publisher of a dataset
Sample data:
  <http://.../authority/file-type> rdf:type dcat:Dataset .
  <http://.../authority/file-type> dct:title "File types Name Authority List" .
  <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
  <http://.../publisher/publ> rdf:type dct:Agent .
  <http://.../publisher/publ> dct:title "Publications Office" .
Query:
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?dataset ?publisher
  WHERE {
    <http://.../authority/file-type> dct:publisher ?publisherURI .
    <http://.../authority/file-type> dct:title ?dataset .
    ?publisherURI dct:title ?publisher .
  }
Result:
  dataset                            publisher
  "File types Name Authority List"   "Publications Office"
Slide 58
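The query on this slide works because the variable publisherURI appears in two triple patterns, which joins the dataset's publisher triple to the publisher's own title. A minimal sketch of that join in plain Python follows; it is an illustration rather than how a SPARQL engine is implemented, extend and solve are hypothetical names, and the "ex:" identifiers are placeholders that assume the publisher URI is written identically in both places.

```python
# Sample data from the slide, with placeholder identifiers instead of full URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def extend(binding, pattern, triple):
    """Try to extend a binding so that pattern matches triple; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples):
    """Evaluate the patterns left to right, keeping only bindings consistent with all of them."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := extend(binding, pattern, triple)) is not None
        ]
    return bindings


patterns = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]
rows = [(b["?dataset"], b["?publisher"]) for b in solve(patterns, DATA)]
```

Without the shared variable the third pattern would also match the dataset's own title, so the join is what restricts the result to the publisher's name.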
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
1. Start with a robust Domain Model
[Domain model diagram. Classes and attributes as far as they can be recovered from the slide:
  Amount: Currency, Figure, Type, Year
  Nomenclature: Type, Heading, Introduction, Remark, Conditions
  EU Programme: Code, Type
  Political category: Code, Description
  Corporate body: Code, Type, Location
  Further attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status
  Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
Slide 68
2. Reuse existing terms and vocabularies
• General-purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
Slide 70
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
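The interoperability point about typed dates can be illustrated with a short sketch. It assumes only the Python standard library; the variable names are illustrative, and the free-text parse additionally assumes an English locale for the month name.

```python
from datetime import date, datetime

typed_value = "2013-02-21"      # lexical form of an xsd:date literal
free_text = "21 February 2013"  # ad hoc format from the counter-example

# An ISO 8601 date needs no format hints from the publisher.
created = date.fromisoformat(typed_value)

# The free-text form only parses if the consumer already knows the layout
# (and, for month names, the language) that the publisher chose.
same_day = datetime.strptime(free_text, "%d %B %Y").date()
```

Both values end up describing the same day, but only the first can be read without out-of-band knowledge, which is the extra processing the slide warns about.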
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu/
• http://lov.okfn.org/
Slide 72
3. Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
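A small sketch of why the sub-property link helps consumers: a system that only knows dct:description can still find the text once the rdfs:subPropertyOf relationship is applied. The "bud:" prefix, the resource name and the helper function are placeholders chosen for illustration, and the sketch assumes each property has at most one super-property (RDFS itself allows several).

```python
# Assumed declaration: bud:introduction rdfs:subPropertyOf dct:description.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}


def with_super_properties(triples, sub_property_of):
    """Add, for every triple, the triples implied by rdfs:subPropertyOf (followed transitively)."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against accidental cycles in the declarations
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = {("bud:nomenclature-1", "bud:introduction", "Introductory text")}
expanded = with_super_properties(data, SUB_PROPERTY_OF)

# A consumer that has never heard of bud:introduction can still query dct:description.
descriptions = sorted(o for s, p, o in expanded if p == "dct:description")
```

The original, more specific triple is kept alongside the inferred one, so nothing is lost for consumers that do understand the specialised term.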
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
4. Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
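The purely syntactic conventions above (initial capital for classes, initial lower-case letter for properties, camel case without separators) can be checked mechanically. The sketch below is a hypothetical helper, not part of any vocabulary tooling, and it deliberately leaves out the rules that need human judgement (singular nouns, verbs versus nouns).

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # starts with a capital letter
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # starts with a lower-case letter


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    """True if the local name follows the class naming convention."""
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    """True if the local name follows the property naming convention."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For example, looks_like_class("skos:Concept") and looks_like_property("foaf:isPrimaryTopicOf") both hold, while a term such as "ex:has_site" fails the property check because of the underscore.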
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
4. Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: hasCeiling relationship between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
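In RDFS, domain and range act as inference rules rather than validation constraints: using a property tells a reasoner what the subject and the object are. The sketch below shows that effect for hasCeiling. The "bud:" names are placeholders, and the direction of the relationship (a political category has a ceiling, which is an amount) is an assumption read from the diagram.

```python
RDF_TYPE = "rdf:type"

# Assumed declarations:
#   bud:hasCeiling rdfs:domain bud:PoliticalCategory ;
#                  rdfs:range  bud:Amount .
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # the subject belongs to the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # the object belongs to the range class
    return inferred


types = infer_types([("bud:category-1", "bud:hasCeiling", "bud:amount-1")])
```

One consequence worth keeping in mind when choosing a domain: applying the property to the "wrong" kind of resource does not raise an error, it simply makes that resource an instance of the domain class.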
5. Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
6. Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
The slide groups these steps under three phases: Analyse, Model, Publish.
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ..." - openrefine.org
Slide 84
See also: Open Refine website, http://openrefine.org/
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie/
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
1. Describe your data in a spreadsheet
Download the tabular data
Slide 88
2. Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
3. Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
3. Clean up the data - prepare RDF
• Create a URI representation for the involved object values
  • via formula
  • via reconciliation
Slide 91
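The "via formula" option amounts to deriving a URI from each cell value by a fixed rule. In Open Refine this would be written as a column transformation; the Python sketch below only illustrates the idea outside the tool. The base URI and the mint_uri helper are hypothetical, and the normalisation rule (trim, lower-case, hyphenate, percent-encode) is one possible choice among many.

```python
from urllib.parse import quote

BASE = "http://example.org/id/country/"  # placeholder base URI


def mint_uri(cell_value: str, base: str = BASE) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate spaces, percent-encode the rest."""
    slug = "-".join(cell_value.strip().lower().split())
    return base + quote(slug, safe="-")


uris = [mint_uri(v) for v in ["Belgium", " Czech Republic ", "Côte d'Ivoire"]]
```

Formula-based URIs are quick to produce but only as stable as the cell values; reconciliation against an existing authority list, where one exists, links the data to URIs that others already use.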
4. Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
4. Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
4. Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
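Conceptually, an RDF skeleton is a template applied to every row: one subject URI per row, one class, and one triple per mapped column. The following sketch mimics that outside Open Refine, using only the standard library. The table contents, the base URI and the per-column property URIs are invented for illustration; only the Data Cube and RDF namespaces are real, and a faithful Data Cube export would use proper dimension and measure properties with typed literals.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"   # placeholder base URI
QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical cleaned-up table: one observation per row.
TABLE = """country,year,value
BE,2013,80.2
FR,2013,78.1
"""


def row_to_ntriples(row):
    """Apply a fixed skeleton to one row: one subject URI, one type, one triple per column."""
    subject = f"<{BASE}obs/{row['country']}-{row['year']}>"
    yield f"{subject} <{RDF_TYPE}> <{QB}Observation> ."
    for column in ("country", "year", "value"):
        # Simplification: every cell is emitted as a plain string literal.
        yield f'{subject} <{BASE}def/{column}> "{row[column]}" .'


lines = [t for row in csv.DictReader(io.StringIO(TABLE)) for t in row_to_ntriples(row)]
```

Each row yields four N-Triples lines here; changing the skeleton (for example, swapping a literal for a reconciled URI) changes every row's output consistently, which is the main benefit of defining the mapping once.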
5. Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5-star Open Data. http://5stardata.info/
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie/
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com/
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org/
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
The Web is evolving from a "Web of linked documents" into a "Web of linked data"
• The Web started as a collection of documents published online - accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data and put it together to do all kinds of things with it (as permitted by the licence).
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. Two types of machine-readable data exist:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.
Slide 7
See also: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
http://linkeddatabook.com/editions/1.0/
Defining linked data: providing data as a service
"Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens."
- EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
Slide 8
See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
The value proposition of linked (open) government data
• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration - the network effect: the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: makes browsing through complex data easier via URIs.
• Increase in data quality:
  - The use of URIs leads to improved data management and quality.
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
Slide 9
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
  - Resolvable URIs
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON...
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  - existing relational/spatial database systems, by applying database-to-RDF conversions, or
  - existing XML/file-based data.
Slide 10
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration and data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new, integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
Slide 13
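Principles 2 and 3 are commonly realised through HTTP content negotiation: the same URI is requested by a browser for HTML and by a program for RDF. The sketch below only constructs such a request with the Python standard library and does not send it. Whether a given server, including the EuroVoc URI used as the example, actually answers with RDF/XML for this header is an assumption, not something verified here.

```python
from urllib.request import Request

uri = "http://eurovoc.europa.eu/1022"  # the UNICEF example URI from the training

# Build (but do not send) a GET request that asks for RDF/XML rather than HTML.
request = Request(uri, headers={"Accept": "application/rdf+xml"})

accept = request.get_header("Accept")
method = request.get_method()
```

Passing the request to urllib.request.urlopen would perform the lookup; a linked data server following principle 3 would then return a machine-readable description of the resource, ideally containing links to further URIs as required by principle 4.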
Linked data vs open data
Open data: data can be published and be publicly available under an open licence without linking to other data sources.
Linked data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
Slide 14
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a syntax for representing data and resources on the Web.
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
SPARQL is a standardised language for querying RDF data.
Slide 17
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
★ Make your stuff available on the Web under an open licence
[Screenshot of an example: "Trends, risks and vulnerabilities in securities markets"]
Slide 20
★★ Make it available as structured data
[Screenshot of an example dataset: "Waterbase - Emissions to water", with a CountryCode column]
Slide 21
★★★ Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Screenshot of an example dataset: "DG Enlargement - Regional programmes"]
Slide 22
★★★★ Use URIs to denote things
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
★★★★★ Link your data to other data to provide context
Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
Slide 24
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint
  Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint
  Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint
  Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint
  Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint
  Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
Initiatives funded by the European Commission
[Logos of the funded initiatives; legible text includes: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
Slide 28
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern
  Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
• IT - Agenzia per l'Italia Digitale
  Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
• NL - Building and address register
  The Dutch Address and Buildings base register published as linked data.
• UK - Ordnance Survey
  Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line.
• UK - Companies House
  Publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member State initiatives: UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things and the RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member State initiatives: UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
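The granular URI scheme can be sketched in a few lines of code. This is a minimal sketch that assumes the path layout /id/{type}/{year}/{number}/section/{section}, read off the example URI above; the function name and its parameters are hypothetical and are not part of any official legislation.gov.uk client.

```python
def legislation_uri(doc_type: str, year: int, number: int, section: int, fmt: str = "") -> str:
    """Build a section-level identifier URI (layout assumed from the slide's example)."""
    base = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}/section/{section}"
    # A concrete representation is addressed by appending /data.<format> to the identifier.
    return f"{base}/data.{fmt}" if fmt else base


print(legislation_uri("ukpga", 2010, 32, 124, "rdf"))
```

Keeping the identifier (no format suffix) separate from its representations (with suffix) lets the same "thing" be served as XML, PDF or RDF without changing its name.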
Open & linked data at the BBC
• BBC Things, the BBC's open data website, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Slide 33
Open & linked data at the BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need to add semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers further advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module, you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject: a resource, identified by a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related
Example: the name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
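As a data structure, a triple is just three values. The sketch below is illustrative only: the class name Triple is hypothetical, and the informal predicate "has a title" is assumed to map to the Dublin Core property dct:title, which is the property the later slides use for the same example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # URI of another resource, or a literal value


DCT = "http://purl.org/dc/terms/"

title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT + "title",
    obj="File types Name Authority List",
)

# An RDF graph is simply a set of such triples.
graph = {title_triple}
print(title_triple.predicate)
```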
RDF syntax: RDF/XML
Slide 46
<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Description of data: triples -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Callouts on the slide label the subject, predicate and object of a triple, and the graph formed by all triples.
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]
RDF syntax: Turtle
Slide 48
# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# Description of data: triples
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Callouts on the slide label the subject, predicate and object of a triple, and the graph formed by all triples.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
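To make the link between triples and a text syntax concrete, here is a minimal sketch that writes triples out as N-Triples, the line-based subset of Turtle (one triple per line, full URIs, no prefixes). The helper names term and to_ntriples are hypothetical, and the escaping covers only backslashes and double quotes; a real serialiser would also handle newlines, language tags and datatypes.

```python
def term(value: str, is_uri: bool) -> str:
    """Format one RDF term: <uri> for resources, "text" for plain literals."""
    if is_uri:
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Each input item is (subject_uri, predicate_uri, object, object_is_uri)."""
    lines = [
        f"{term(s, True)} {term(p, True)} {term(o, o_is_uri)} ."
        for s, p, o, o_is_uri in triples
    ]
    return "\n".join(lines)


DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"
triples = [
    (DATASET, "http://purl.org/dc/terms/title", "File types Name Authority List", False),
    (DATASET, "http://purl.org/dc/terms/publisher", PUBLISHER, True),
]
print(to_ntriples(triples))
```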
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Callouts on the slide label the subject, predicate and object of a triple.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties whose values are literals are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram; classes and their properties, listed as name type [cardinality]]
Core Vocabularies::Identifier: dateOfIssue dateTime [0..1]; identifier string [1..1]; identifierType string [0..1]; issuingAuthority string [0..1]; issuingAuthorityUri URI [0..1]
Core Vocabularies::Person: alternativeName string; birthName string; dateOfBirth dateTime; dateOfDeath dateTime; familyName string; fullName string; gender code; givenName string; patronymicName string
Core Vocabularies::Location: geographicIdentifier URI; geographicName string
Core Vocabularies::Address: addressArea string; addressID string; adminUnitL1 string; adminUnitL2 string; fullAddress string; locatorDesignator string; locatorName string; poBox string; postCode string; postName string; thoroughfare string
Core Vocabularies::Geometry: lat string; long string; wkt string; xmlGeometry XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: the Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, which makes the meaning of the data model easier to understand.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language for querying graph data represented as RDF triples.
• The name stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL query
Slide 56
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
# Type of query, followed by the variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
SELECT: return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT: return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
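How a SELECT query finds its answer can be illustrated with a toy matcher: each triple pattern is compared against every triple, variables are bound on a match, and patterns that share a variable are joined. This is a simplified sketch under stated assumptions, not how a real SPARQL engine is implemented; the functions match and select are hypothetical, full URIs replace the prefixes, and the rdf:type triples of the sample data are left out for brevity.

```python
DCT = "http://purl.org/dc/terms/"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# Sample data from the slide, as (subject, predicate, object) tuples.
data = [
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, bindings):
    """Yield extended bindings for every triple that fits the pattern.
    Pattern terms starting with '?' are variables."""
    for triple in data:
        new = dict(bindings)
        for p_term, value in zip(pattern, triple):
            if p_term.startswith("?"):
                if new.setdefault(p_term, value) != value:
                    break  # variable is already bound to a different value
            elif p_term != value:
                break  # constant term does not match
        else:
            yield new


def select(variables, patterns):
    """Evaluate the patterns left to right, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
)
print(rows)
```

The shared variable ?publisherURI is what links the dataset to the publisher's title, which is the same join the query on the slide performs.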
SPARQL Example: EU ODP (1)
Slide 59
SPARQL Example: EU ODP (2)
Slide 60
SPARQL Example: EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust domain model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust domain model
Slide 68
[Figure: domain model diagram. Classes with their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
Step 2: Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: for describing the structure of organisations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
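The interoperability argument can be shown with the two date values from the slide. This is a small illustrative sketch using only the Python standard library: the ISO 8601 form used by xsd:date parses directly, while the free-text form needs a hand-picked format string (and relies on English month names).

```python
from datetime import date, datetime

# A typed xsd:date literal uses ISO 8601, which parses without guesswork.
standard = date.fromisoformat("2013-02-21")

# A free-text date needs a format chosen by hand and an English locale for
# the month name: this is the "further processing" the slide warns about.
custom = datetime.strptime("21 February 2013", "%d %B %Y").date()

print(standard == custom)
```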
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Diagram: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
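The benefit of declaring sub-properties can be sketched as a small inference step: a consumer that only knows dct:description or dct:subject can still use the data once the super-property statements are derived. The sketch is illustrative; the namespace form http://data.europa.eu/bud# and the local names introduction and hasNomenclature are assumptions, as is the example resource.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://data.europa.eu/bud#"  # assumed namespace form, for illustration only

# sub-property -> super-property (rdfs:subPropertyOf)
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
    BUD + "hasNomenclature": DCT + "subject",
}


def with_inferred(triples):
    """Add, for each triple that uses a sub-property, the same statement
    expressed with its super-property (walking up the hierarchy)."""
    result = set(triples)
    for s, p, o in triples:
        while p in SUB_PROPERTY_OF:
            p = SUB_PROPERTY_OF[p]
            result.add((s, p, o))
    return result


# Hypothetical instance data.
data = [("http://example.org/nomenclature/1", BUD + "introduction", "Opening remarks")]
inferred = with_inferred(data)
print((data[0][0], DCT + "description", "Opening remarks") in inferred)
```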
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
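The casing conventions above are easy to check mechanically. The following is a minimal sketch with a hypothetical helper, check_term; it only tests capitalisation and camel case, not whether a class name is singular or whether a property is a verb or a noun.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # classes, e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # properties, e.g. hasSite


def check_term(local_name: str, kind: str) -> bool:
    """Return True if local_name follows the casing convention for its kind
    ('class' or 'property')."""
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local_name))


print(check_term("Concept", "class"))
print(check_term("isPrimaryTopicOf", "property"))
print(check_term("legal_name", "property"))
```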
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount (Currency, Figure, Type, Year)]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Step 4: Where new terms are required, create them following commonly agreed best practices
Example: defining the "amount type" property
[Diagram: Amount (Currency, Figure, Type, Year)]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
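In RDFS, domain and range act as inference rules rather than validation constraints: using a property lets a consumer conclude the classes of its subject and object. The sketch below illustrates this with the hasCeiling relationship; the example.org namespace, the instance names, and the direction of the relationship (Political category as domain, Amount as range) are assumptions made for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/budget#"  # hypothetical namespace

# property -> (domain class, range class); the direction is an assumption
SCHEMA = {EX + "hasCeiling": (EX + "PoliticalCategory", EX + "Amount")}


def infer_types(triples):
    """Apply the RDFS domain and range rules: the subject of a property is an
    instance of its domain, the object an instance of its range."""
    inferred = set()
    for s, p, o in triples:
        if p in SCHEMA:
            domain, range_ = SCHEMA[p]
            inferred.add((s, RDF_TYPE, domain))
            inferred.add((o, RDF_TYPE, range_))
    return inferred


# Hypothetical instance data.
data = [(EX + "category1", EX + "hasCeiling", EX + "amount42")]
types = infer_types(data)
print(sorted(types))
```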
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
The six steps fall into three phases: Analyse, Model, Publish.
1. Start with a robust domain model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." (openrefine.org)
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar on Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
Preparation:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data (table harmonisation)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data (prepare RDF)
Slide 91
• Create a URI representation for the object values involved
o via formula
o via reconciliation
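Creating URIs "via formula" means deriving a stable identifier from a cell value. In Open Refine this is done with a cell transformation expression; the Python sketch below only illustrates the logic. The helper mint_uri and the base URI http://example.org/id/country/ are hypothetical.

```python
import re
import unicodedata


def mint_uri(base: str, label: str) -> str:
    """Turn a cell value into a URI by appending a normalised slug to base."""
    # Strip accents, lower-case, and collapse every other character run to a hyphen.
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    return base + slug


BASE = "http://example.org/id/country/"  # hypothetical base URI
print(mint_uri(BASE, "Czech Republic"))
print(mint_uri(BASE, "  Österreich "))
```

A formula of this kind only gives consistent URIs within one dataset; reconciliation, the other option on the slide, instead looks the value up in an existing authority so that the URI is shared with other datasets.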
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one
You can set the base URI for the data
Slide 94
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
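What the skeleton and the export do together can be sketched as a row-to-triples mapping: each row becomes one resource, and each column becomes one property of that resource. The table contents, the example.org URIs and the property names below are all hypothetical; a real mapping for this example would use terms from the RDF Data Cube Vocabulary instead.

```python
import csv
import io

# Hypothetical extract of a cleaned table: one observation per row.
TABLE = """country,year,value
BE,2014,0.81
LU,2014,0.93
"""

EX = "http://example.org/"  # hypothetical base URI
XSD = "http://www.w3.org/2001/XMLSchema#"


def rows_to_ntriples(text: str) -> list:
    """Map each row to one resource with one triple per column."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{EX}obs/{row['country']}-{row['year']}>"
        out.append(f"{subject} <{EX}def/country> <{EX}id/country/{row['country']}> .")
        out.append(f'{subject} <{EX}def/year> "{row["year"]}"^^<{XSD}gYear> .')
        out.append(f'{subject} <{EX}def/value> "{row["value"]}"^^<{XSD}decimal> .')
    return out


lines = rows_to_ntriples(TABLE)
print("\n".join(lines))
```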
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: chart with axes "flexibility" and "volume" positioning the tools OpenRefine, UnifiedViews and Cellar]
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C. http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data: An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework, W3C. http://www.w3.org/RDF
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID, Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID, Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: http://www.opendatasupport.eu and Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Defining linked data: providing data as a service
"Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens."
EC ISA Case Study: How Linked Data is transforming eGovernment
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs so that they can discover more things
Slide 8
See also: http://www.youtube.com/watch?v=4x_xzT5eF5Q
http://www.w3.org/DesignIssues/LinkedData.html
http://www.youtube.com/watch?v=uju4wT9uBIA
The value proposition of linked (open) government data
• Flexible data integration: facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration (the network effect): the addition of each new dataset increases the value of those datasets that are already published.
• Ease of navigation: makes browsing through complex data easier via URIs.
• Increase in data quality:
  - The use of URIs leads to improved data management and quality.
  - The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
Slide 9
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
  - Resolvable URIs.
  - Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON...
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML) on top of either:
  - Existing relational/spatial database systems, by applying database-to-RDF conversions; or
  - Existing XML/file-based data.
Slide 10
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower cost and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration and data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that people/machines can discover more things.
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data:
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data:
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike."
- OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
SPARQL is a standardised language for querying RDF data.
Slide 17
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
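The two Brussels statements above can be sketched as plain data. This is only an illustration in Python, not an RDF library: the predicate URI `http://example.org/ontology/isCapitalOf` is a hypothetical name (the slide only says "is the capital of"), and the `is_uri` check is a crude assumption for telling resources from literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of another resource, or a literal value


# Hypothetical predicate URI, invented for this sketch.
IS_CAPITAL_OF = "http://example.org/ontology/isCapitalOf"


def is_uri(value: str) -> bool:
    """Crude check used here to tell resources from literals."""
    return value.startswith(("http://", "https://"))


brussels = "http://example.org/place/Brussels"
literal_version = Triple(brussels, IS_CAPITAL_OF, "Belgium")
linked_version = Triple(brussels, IS_CAPITAL_OF, "http://example.org/place/Belgium")

for t in (literal_version, linked_version):
    kind = "resource" if is_uri(t.obj) else "literal"
    print(f"{t.subject} -- {t.predicate} --> {t.obj} ({kind})")
```

Only the second form lets a client follow the object to find more data about Belgium, which is the point of the fourth linked data principle.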
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
[Screenshot: Trends, risks and vulnerabilities in securities markets]
DATASUPPORTOPEN
Make it available as structured data
Slide 21
[Screenshot: Waterbase - Emissions to water, a table with a CountryCode column]
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
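A SPARQL endpoint such as those listed above is queried over HTTP. The sketch below, in Python, only builds the request URL, following the SPARQL Protocol convention of passing the query text in a `query` parameter; it does not contact any server. The endpoint address is reconstructed from the EEA entry and should be treated as an assumption, as should the sample query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: endpoint URL reconstructed from the EEA entry in the list above.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# Illustrative query: any five triples held by the endpoint.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header asking for a results format such as `application/sparql-results+json`; whether a given endpoint is still online is not something this sketch checks.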
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Logos of funded initiatives, including ADMS.SW and the Core Public Service Vocabulary]
DATASUPPORTOPEN
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
NL - Building and address register
The Dutch Address and Buildings base register, published as linked data.
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line.
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
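The granular URI scheme can be sketched as a small helper that assembles an identifier from its parts. This is an illustration in Python only: the function name and parameters are hypothetical, and the placement of slashes and of the `data.` suffix is reconstructed from the example address above rather than taken from legislation.gov.uk documentation.

```python
from typing import Optional

BASE = "http://www.legislation.gov.uk/id"


def legislation_uri(doc_type: str, year: int, number: int,
                    section: Optional[int] = None,
                    representation: Optional[str] = None) -> str:
    """Assemble an identifier; optionally narrow it to a section and a format."""
    parts = [BASE, doc_type, str(year), str(number)]
    if section is not None:
        parts += ["section", str(section)]
    uri = "/".join(parts)
    if representation is not None:
        # Assumed layout: the format is requested as a 'data.<ext>' suffix.
        uri += f"/data.{representation}"
    return uri


print(legislation_uri("ukpga", 2010, 32, section=124, representation="rdf"))
```

Because every level of the hierarchy has its own name, a whole act and a single section can be referenced separately.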
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."
• Need for adding semantics in order to allow machine reasoning. For example:
  - Kristin is a field
  - Åsgard is an oil platform
  - Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data.
• An introduction to SPARQL, showing how you can query and manipulate data in RDF.
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
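To see which triples the RDF/XML description above encodes, the sketch below reads a cleaned-up copy of it with Python's standard XML parser. It is not a general RDF/XML parser: it assumes the simple layout of this example (typed node elements carrying `rdf:about`, each containing flat property elements), and the namespace URIs, including the added `rdf` namespace, are the standard ones assumed for the prefixes.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def qname_to_uri(tag: str) -> str:
    """ElementTree writes tags as '{namespace}local'; join them into one URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def extract_triples(rdf_xml: str):
    """Return (subject, predicate, object) tuples for the simple layout above."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for node in root:  # typed node elements, e.g. org:Organization
        subject = node.get(f"{{{RDF_NS}}}about")
        triples.append((subject, RDF_NS + "type", qname_to_uri(node.tag)))
        for prop in node:  # property elements, e.g. rdfs:label
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, qname_to_uri(prop.tag), value))
    return triples


triples = extract_triples(RDF_XML)
for t in triples:
    print(t)
```

The element name of each node becomes an `rdf:type` statement, which is why two elements with three properties yield five triples.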
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset (subject, predicate, object)
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Named Authority List"
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Description of data: triples -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide annotations: the rdf:about value is the subject, each child element name is a predicate, and its content or rdf:resource value is the object; together the triples form a graph.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with subject, predicate and object labelled]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# Description of data: triples (subject, then predicate-object pairs)
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain"; classes from the Core Vocabularies with their properties]
• Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
• Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
• Location: geographicIdentifier: URI; geographicName: string
• Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
• Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph (an update operation in SPARQL 1.1).
• DELETE: delete triples from the RDF graph (an update operation in SPARQL 1.1).
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL query
Slide 56
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
# Type of query (SELECT) and variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Named Authority List"
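What the SELECT query does can be imitated in a few lines: scan the sample triples and keep the object wherever subject and predicate match. The sketch below is plain Python, not a SPARQL engine; the abbreviated URIs of the slide are replaced by the full addresses used elsewhere in this module, which is an assumption about what the elided parts stand for.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed full forms of the abbreviated URIs in the sample data.
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def select_objects(data, subject, predicate):
    """Return every object ?x for which (subject, predicate, ?x) is in the data."""
    return [o for (s, p, o) in data if s == subject and p == predicate]


result = select_objects(SAMPLE_DATA, FILE_TYPE, DCT + "title")
print(result)
```

The title of the publisher is not returned, even though it uses the same predicate, because its subject does not match the URI fixed in the pattern.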
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Named Authority List" | "Publications Office"
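The query above combines three triple patterns that share the variable ?publisherURI. A rough sketch of how such a join can be evaluated follows, in plain Python rather than a real SPARQL engine. The helper names are hypothetical, terms starting with "?" are treated as variables, and the full URIs stand in for the abbreviated ones on the slide (an assumption).

```python
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None on mismatch."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable keeps its earlier value; a clash means no match.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, data):
    """Join the patterns one by one, keeping only consistent variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in data
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


patterns = [
    (FILE_TYPE, DCT + "publisher", "?publisherURI"),
    (FILE_TYPE, DCT + "title", "?dataset"),
    ("?publisherURI", DCT + "title", "?publisher"),
]
rows = [(s["?dataset"], s["?publisher"]) for s in evaluate(patterns, SAMPLE_DATA)]
print(rows)
```

The third pattern only matches the publisher's title because ?publisherURI was already bound by the first pattern; that shared variable is what links the dataset to its publisher's name.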
DATASUPPORTOPEN
SPARQL example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL example - EU ODP (3)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Step 1: Start with a robust Domain Model
Slide 68
[Diagram: domain model; classes with their attributes]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
DATASUPPORTOPEN
Well-known vocabularies
Step 2: Reuse existing terms and vocabularies
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
Step 2: Reuse existing terms and vocabularies
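The date example can be made concrete. In the sketch below (Python, for illustration), a value in the standard xsd:date lexical form is read by a built-in ISO 8601 parser, while the free-text form needs a format string that every consumer would have to know in advance. The helper names are hypothetical, and the month-name parsing assumes an English locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """xsd:date values without a timezone use the ISO 8601 form YYYY-MM-DD."""
    return date.fromisoformat(lexical)


def parse_custom_date(text: str) -> date:
    """Extra step needed for a free-text format such as '21 February 2013'."""
    # Assumes English month names (the default locale behaviour in Python).
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard, custom, standard == custom)
```

Both calls yield the same date, but only the first works without schema-specific knowledge, which is the interoperability gain that reuse of dcterms:created brings.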
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
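How a system that only knows the super-property can still use more specific data is sketched below in Python. The specialised property `http://example.org/vocab#introduction` and the subject URI are hypothetical names invented for the illustration; the inference shown is the usual reading of a sub-property statement (every statement made with the sub-property also holds for its super-property).

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"

# Hypothetical specialised property, declared as a sub-property of dct:description.
EX_INTRODUCTION = "http://example.org/vocab#introduction"
SUB_PROPERTY_OF = {EX_INTRODUCTION: DCT_DESCRIPTION}


def generalise(triples, sub_property_of):
    """Add, for each triple, the same statement with every super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against accidental cycles in the hierarchy
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred


data = [("http://example.org/item/1", EX_INTRODUCTION, "Opening remarks")]
for triple in sorted(generalise(data, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that has never heard of the specialised term can now find the text through dct:description.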
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
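Part of these conventions can be checked mechanically. The sketch below (Python, with hypothetical helper names) tests only the capitalisation and camel-case rules on the local part of a prefixed name; whether a term is singular, a verb or a noun still needs human judgement.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # classes, e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # properties, e.g. hasSite


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term: str, kind: str) -> bool:
    """kind is 'class' or 'property'; separators such as '_' are rejected."""
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local_name(term)))


checks = [
    ("skos:Concept", "class"),
    ("rdfs:label", "property"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:has_site", "property"),  # hypothetical term that breaks the rule
]
for term, kind in checks:
    print(term, kind, follows_convention(term, kind))
```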
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
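Domain and range declarations let software infer the classes of the things a property connects. The Python sketch below shows the idea for a single property. The prefixed names `ex:hasCeiling`, `ex:PoliticalCategory` and `ex:Amount` are hypothetical stand-ins, and the direction of the relationship (a political category has a ceiling amount) is an assumption, since the diagram residue does not show it.

```python
RDF_TYPE = "rdf:type"

# Assumed declaration: domain = class of the subject, range = class of the object.
PROPERTIES = {
    "ex:hasCeiling": {"domain": "ex:PoliticalCategory", "range": "ex:Amount"},
}


def infer_types(triples, properties):
    """Derive rdf:type statements from the domain and range of each property used."""
    inferred = set()
    for s, p, o in triples:
        spec = properties.get(p)
        if spec is None:
            continue  # no declaration, nothing to infer
        inferred.add((s, RDF_TYPE, spec["domain"]))
        inferred.add((o, RDF_TYPE, spec["range"]))
    return inferred


data = [("ex:category1", "ex:hasCeiling", "ex:amount42")]
for triple in sorted(infer_types(data, PROPERTIES)):
    print(triple)
```

Note that domain and range are not validation constraints in RDFS: using a property on an unexpected resource does not raise an error, it simply leads a reasoner to conclude that the resource belongs to the declared class.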
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Step 6
DATASUPPORTOPEN
Conclusions
Slide 82
Analyse:
• Start with a robust Domain Model, developed following a structured process and methodology.
• Research existing terms and their usage, and maximise reuse of those terms.
Model:
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish:
• Publish within a highly stable environment designed to be persistent.
• Publicise the RDF vocabulary by registering it with relevant services.
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..."
- openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
DATASUPPORTOPEN
Step 2: Create a project and upload it in Open Refine
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Step 3: Clean up the data - prepare RDF
Slide 91
• Create a URI representation for the involved object values:
  - via formula
  - via reconciliation
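Creating URIs "via formula" means deriving each URI from a cell value with a fixed rule. In Open Refine this would be written in its own expression language; the Python sketch below only illustrates the idea. The base URI and the slug rule (trim, lower-case, spaces to hyphens, percent-encode the rest) are assumptions for the example, not a convention prescribed by the tool.

```python
import re
from urllib.parse import quote

# Hypothetical namespace for the minted identifiers.
BASE_URI = "http://example.org/id/country/"


def mint_uri(base: str, cell_value: str) -> str:
    """Turn a spreadsheet cell into a URI: trim, lower-case, hyphenate, encode."""
    slug = re.sub(r"\s+", "-", cell_value.strip().lower())
    return base + quote(slug, safe="-")


for cell in ["Belgium", " Czech Republic "]:
    print(repr(cell), "->", mint_uri(BASE_URI, cell))
```

Reconciliation, the second option on the slide, takes the opposite approach: instead of minting a new URI it looks the value up in an existing dataset and reuses the URI found there, which links the data to other sources.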
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
Slide 94
[Screenshot: graphical interface to copy/paste an existing RDF skeleton]
[Screenshot: graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
Step 5
[Screenshot: export of the data in Turtle]
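What the skeleton and export steps achieve can be sketched outside Open Refine as well. The following is a minimal Python illustration, not the tool's own mechanism: it turns spreadsheet rows into RDF Data Cube observations and writes them as N-Triples, which is also valid Turtle. The sample rows, the base URI and the `ex` property namespace are made up for illustration; only the `qb:Observation` class and `qb:dataSet` property come from the RDF Data Cube Vocabulary.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"           # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/scoreboard/"            # hypothetical base URI
EX = "http://example.org/def/"                     # hypothetical property namespace

# Made-up sample rows standing in for the cleaned spreadsheet.
SAMPLE = "country,year,value\nBE,2014,0.83\nLU,2014,0.93\n"


def row_to_ntriples(row):
    """Apply a fixed 'skeleton' to one spreadsheet row."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{BASE}obs/{country}-{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{BASE}dataset> .",
        f"{obs} <{EX}country> <{BASE}country/{country}> .",
        f'{obs} <{EX}year> "{year}" .',
        f'{obs} <{EX}value> "{value}"^^<{XSD_DECIMAL}> .',
    ]


def convert(csv_text):
    """Convert CSV text to an N-Triples document, one row at a time."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_ntriples(row))
    return "\n".join(lines) + "\n"


ntriples = convert(SAMPLE)
```

A real Data Cube would also declare its dimensions and measures in a data structure definition; that part is left out here to keep the row-to-triples mapping visible.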
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
Slide 98
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
Slide 100
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
DATASUPPORTOPEN
Further reading
Slide 101
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Flexible data integration: linked data facilitates data integration and enables the interconnection of previously disparate government datasets.
• Efficiency gains in data integration – the network effect: the addition of each new dataset increases the value of the datasets that are already published.
• Ease of navigation: URIs make browsing through complex data easier.
• Increase in data quality:
  – The use of URIs leads to improved data management and quality.
  – The increased (re)use triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
Slide 9
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service:
  – Resolvable URIs.
  – Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML), on top of either:
  – existing relational/spatial database systems, by applying database-to-RDF conversions; or
  – existing XML/file-based data.
Slide 10
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data at lower cost and effort (compared to traditional relational databases).
• Cost reduction: the reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: the availability of LOGD gives rise to new integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that people/machines can discover more things.
Slide 13
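Principles 2 and 3 rely on ordinary HTTP: a client looks up the URI and states which representation it would like. The sketch below only builds such a request with Python's standard library and does not send it; the URI is a hypothetical example, and whether a given server actually returns RDF for it depends on how that server is configured.

```python
from urllib.request import Request


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET for a URI, asking for an RDF serialisation."""
    # The Accept header is how the client asks for Turtle instead of HTML.
    return Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI, used for illustration only.
lookup = build_lookup("http://example.org/id/organisation/1234")
```

Passing `lookup` to `urllib.request.urlopen` would perform the actual lookup. Requesting `application/rdf+xml` instead of `text/turtle` asks for the same description in RDF/XML.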
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike.” - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA’s 10 Rules for Persistent URIs
A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
Slide 17
The Resource Description Framework (RDF) is a data model for representing data and resources on the Web.
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example (Subject – Predicate – Object):
<http://example.org/place/Brussels> is the capital of “Belgium”
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
SPARQL is a standardised language for querying RDF data.
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
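The two variants of the Brussels example differ only in the object: a literal in the first, a resource in the second. A minimal Python sketch of that distinction follows; the `Triple` type and the `isCapitalOf` property URI are illustrative assumptions, and escaping of special characters inside literals is deliberately left out.

```python
from typing import NamedTuple

EX = "http://example.org/"  # example namespace, as on the slide


class Triple(NamedTuple):
    """One RDF statement; the object is either a URI or a plain literal."""
    subject: str
    predicate: str
    obj: str
    object_is_literal: bool = False

    def to_ntriples(self) -> str:
        # Literals are written in quotes, resources in angle brackets.
        obj = f'"{self.obj}"' if self.object_is_literal else f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


# Object as a literal: the string "Belgium".
literal_version = Triple(EX + "place/Brussels", EX + "def/isCapitalOf", "Belgium", True)
# Object as a resource: a URI that can itself be described and linked to.
resource_version = Triple(EX + "place/Brussels", EX + "def/isCapitalOf", EX + "place/Belgium")
```

The resource variant is the one that makes the data linkable, because further statements can be made about `http://example.org/place/Belgium`.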
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
The 5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions’ initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching the linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching the linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission, including ADMS.SW and the Core Public Service Vocabulary]
DATASUPPORTOPEN
Member State initiatives – some examples
Slide 29
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line.
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States’ initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/type/year/number/section/number
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
Slide 32
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset (Subject – Predicate – Object)
<http://publications.europa.eu/resource/authority/file-type> has a title “File types Name Authority List”
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
The xmlns attributes are the definition of prefixes. The elements inside rdf:RDF are the description of the data (the triples), which together form the graph. In each triple, the resource named by rdf:about is the subject, the property element (e.g. dct:title) is the predicate, and its value is the object.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the dataset and its publisher, with subject, predicate and object labelled]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
The @prefix lines are the definition of prefixes. The two blocks that follow are the description of the data (the triples), which together form the graph. Each block starts with the subject, followed by predicate and object pairs separated by semicolons.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type"
       typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
The resource attribute gives the subject, each property attribute gives a predicate, and the text inside the span is the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram: Healthcare Domain]
Class Core Vocabularies::Identifier, with properties:
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person, with properties:
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location, with properties:
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address, with properties:
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry, with properties:
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions…
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions… and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s)… (identified by name or description).
• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update).
• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update).
• ASK: are there any X, Y, etc. satisfying the following conditions…?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
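Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Parts of the query:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the statements inside WHERE { }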
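The "Protocol" in SPARQL's name covers how such a query reaches an endpoint: in the simplest case it is sent as the `query` parameter of an HTTP GET. The sketch below builds that URL with Python's standard library without sending anything. The endpoint address is a hypothetical placeholder; a real endpoint, such as those listed in Module 1, would take its place.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
"""


def query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
```

Opening `url` with an HTTP client, together with an `Accept` header such as `application/sparql-results+json`, would return the result table in a machine-readable form.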
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
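How a SELECT query arrives at its result can be illustrated with a toy matcher: each triple pattern is compared with every triple in the data, and variables are bound to the values found. The Python sketch below is a simplified illustration of that idea, not how a real SPARQL engine is implemented. It uses the full URIs from the earlier Turtle example, and treats any term starting with `?` as a variable.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slide, as (subject, predicate, object) tuples.
DATA = [
    (FILE_TYPE, RDF_TYPE, DCAT + "Dataset"),
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, bindings):
    """Return the bindings extended by this triple, or None if it does not fit."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, data):
    """Evaluate the triple patterns in order and project the chosen variables."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in data:
                candidate = match(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return [tuple(s[v] for v in variables) for s in solutions]


# The query from the slide: the title of the dataset with a particular URI.
titles = select(["?dataset"], [(FILE_TYPE, DCT + "title", "?dataset")], DATA)
```

Adding the patterns `(FILE_TYPE, DCT + "publisher", "?publisherURI")` and `("?publisherURI", DCT + "title", "?publisher")` joins the dataset with its publisher through the shared variable, which is what a query asking for both the dataset name and the publisher name has to do.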
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  – How to reuse existing vocabularies to model your data
  – How to create new classes and properties in RDF
  – How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
Step 1
[Figure: domain model of the EU budget]
Classes and attributes labelled in the figure:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships labelled in the figure: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Slide 69
Step 2
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Slide 71
Step 2: Reuse existing terms and vocabularies
• Reuse greatly aids interoperability of your data.
Take dcterms:created, for example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else’s.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
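The "further processing" mentioned for the date example can be made concrete. The sketch below converts a free-text date such as "21 February 2013" into the typed literal expected for dcterms:created. The helper name is an illustrative assumption, and parsing the month name with `%B` assumes an English locale.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date(text: str, fmt: str = "%d %B %Y") -> str:
    """Turn a free-text date into an xsd:date typed literal (N-Triples form)."""
    # %B matches the full month name; this assumes an English locale.
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'


created = to_xsd_date("21 February 2013")
```

Every consumer of data published with a non-standard format would need a conversion like this one, which is the cost that reusing dcterms:created with xsd:date avoids.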
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Reuse existing terms and vocabularies
2
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, you allow systems that understand the super-property or super-class to interpret the data even if the more specific terms are unknown to them
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
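A minimal Python sketch of why sub-properties help, assuming a hypothetical namespace http://example.org/budget# with an introduction property declared as a sub-property of dct:description. A consumer that only knows dct:description can still read the data once the sub-property relationship is applied.

```python
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/budget#"  # hypothetical namespace, for illustration only

# rdfs:subPropertyOf statements: specific property -> more general property
SUB_PROPERTY_OF = {EX + "introduction": DCT + "description"}

def generalise(triples, sub_property_of):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy until no super-property is left.
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred

data = {(EX + "heading1", EX + "introduction", "Sustainable growth")}
known = generalise(data, SUB_PROPERTY_OF)
```

After the call, the set also contains the same statement expressed with dct:description, which generic tools can process.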
Slide 73
3
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Diagram: the Amount class (Currency, Figure, Type, Year) linked by "has nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
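The capitalisation and camel-case rules above can be checked mechanically. The following Python sketch is one possible check, with hypothetical function names; it does not attempt to verify the singular, verb or noun rules, which need human judgement.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # classes, e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # properties, e.g. hasSite

def local_name(term: str) -> str:
    """Return the part after the prefix in a term such as skos:Concept."""
    return term.split(":", 1)[-1]

def follows_convention(term: str, kind: str) -> bool:
    """Check a term's local name; kind is 'class' or 'property'."""
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local_name(term)))
```

For instance, `follows_convention("foaf:isPrimaryTopicOf", "property")` holds, while a name with an underscore such as the made-up `ex:amount_type` does not.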
Slide 76
4
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the hasCeiling property relates the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
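In RDFS, domain and range statements let a consumer infer the types of the resources a property connects. The Python sketch below shows that inference under stated assumptions: the namespace is hypothetical, and the direction of hasCeiling (from a political category to an amount) is assumed for illustration because the diagram does not survive in this transcript.

```python
EX = "http://example.org/budget#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed direction: a political category has an amount as its ceiling.
SCHEMA = {
    EX + "hasCeiling": {"domain": EX + "PoliticalCategory", "range": EX + "Amount"},
}

def infer_types(triples, schema):
    """Apply RDFS domain and range entailment to a set of triples."""
    inferred = set()
    for s, p, o in triples:
        rule = schema.get(p)
        if rule:
            inferred.add((s, RDF_TYPE, rule["domain"]))  # subject is in the domain
            inferred.add((o, RDF_TYPE, rule["range"]))   # object is in the range
    return inferred

data = {(EX + "category1", EX + "hasCeiling", EX + "amount42")}
types = infer_types(data, SCHEMA)
```

Because these statements drive inference rather than validation, an overly narrow domain or range can lead to unintended type assertions, which is one reason to define them with care.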
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
[Diagram: Analyse, Model, Publish]
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (.xls and .xlsx)
JSON
XML, and
RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1 Install Open Refine from https://github.com/OpenRefine
2 Install the RDF extension from http://refine.deri.ie
And then:
1 Describe your data in a spreadsheet
2 Create a project and upload it in Open Refine
3 Clean up the data
4 Map your data to appropriate RDF classes & properties
5 Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
Clean up the data - table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data - prepare RDF
Slide 91
3
• Create URI representations for the object values involved
• via formula
• via reconciliation
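Creating URIs "via formula" means deriving each URI from a cell value by a fixed rule. The Python sketch below shows one such rule outside Open Refine; the base URI and the function name are assumptions made for illustration, not part of the tool.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI

def mint_uri(kind: str, label: str) -> str:
    """Build a URI from a cell value: lower-case, hyphenated, percent-encoded."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"

uri = mint_uri("country", " Czech Republic ")
```

The same cell value always yields the same URI, which is what allows rows from different spreadsheets to end up describing the same resource. Reconciliation, by contrast, looks values up against an existing set of URIs instead of minting new ones.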
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
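Conceptually, an RDF skeleton is a template that says which predicate each spreadsheet column maps to and how row subjects are built from the base URI. The Python sketch below imitates that idea outside Open Refine. The base URI, the column names and the property namespace are all hypothetical; only the qb:Observation class comes from the RDF Data Cube Vocabulary.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"   # hypothetical base URI
QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
EX = BASE + "def#"                        # hypothetical property namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The skeleton: which predicate each (hypothetical) column maps to.
SKELETON = {"country": EX + "refArea", "year": EX + "refPeriod", "value": EX + "obsValue"}

def rows_to_triples(table: str):
    """Turn each CSV row into one qb:Observation with one triple per column."""
    triples = []
    for number, row in enumerate(csv.DictReader(io.StringIO(table)), start=1):
        subject = f"{BASE}observation/{number}"
        triples.append((subject, RDF_TYPE, QB + "Observation"))
        for column, predicate in SKELETON.items():
            triples.append((subject, predicate, row[column]))
    return triples

triples = rows_to_triples("country,year,value\nBE,2013,0.78\nLU,2013,0.91\n")
```

Each row yields four triples: one type statement and one statement per mapped column. A full Data Cube publication would also describe the dataset and its structure, which this sketch leaves out.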
4
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Thank you for your attention
and now YOUR questions
Slide 97
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2 This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
The value proposition of linked (open) government data
• Increase in data usability by providing data as a service
Resolvable URIs
Data is available in different formats, not limited to RDF, e.g. XML, CSV, text, JSON…
• Compatible with existing standards and technologies: a linked data infrastructure can provide access to homogenised, linked and enriched data using standard Web-based interfaces (such as HTTP and SPARQL) and Web-based languages (such as XHTML, RDF+XML), on top of either
Existing relational/spatial database systems, by applying database-to-RDF conversions, or
Existing XML/file-based data
Slide 10
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected in the data with lower cost and effort (compared to traditional relational databases)
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration and data use, reuse and exchange
• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
The four principles of linked data in practice
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
The four principles in practice
3 When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)
4 Include links to other URIs, so that people and machines can discover more things
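Looking up an HTTP URI is an ordinary HTTP request, and a client can use the Accept header to ask for an RDF representation. The Python sketch below only prepares such a request and does not send it. The function name is hypothetical, and whether a given server actually returns Turtle for this URI is an assumption that would need checking against the live service.

```python
from urllib.request import Request

def lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})

request = lookup_request("http://eurovoc.europa.eu/1022")
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup; a server that follows principle 3 would answer with a description of the resource in the requested format.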
Slide 13
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence
Slide 14
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike” - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things, RDF for describing data and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource”
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a standard data model for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example triples (subject, predicate, object):
http://example.org/place/Brussels, is the capital of, “Belgium”
OR
http://example.org/place/Brussels, is the capital of, http://example.org/place/Belgium
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
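The two example triples differ only in their object: a literal in one case, a URI in the other. A minimal Python sketch of that structure follows; the class names and the predicate URI are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Union

class URI(str):
    """A resource identified by a URI."""

class Literal(str):
    """A plain value such as a name or a number."""

class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]

CAPITAL_OF = URI("http://example.org/ontology/isCapitalOf")  # hypothetical predicate
BRUSSELS = URI("http://example.org/place/Brussels")

with_literal = Triple(BRUSSELS, CAPITAL_OF, Literal("Belgium"))
with_link = Triple(BRUSSELS, CAPITAL_OF, URI("http://example.org/place/Belgium"))
```

Only the second form lets a consumer follow the object to find more statements about Belgium, which is what makes the data linked.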
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
Make your stuff available on the Web under an open licence
Slide 20
[Example: Trends, risks and vulnerabilities in securities markets]
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
bull Necessary investments
bull Lack of necessary competencies
bull Perceived lack of tools
bull Lack of service level guarantees
• Missing, restrictive or incompatible licences
bull Surfeit of standard vocabularies
bull The inertia of the status quo ndash change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register, published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States initiatives - UK National Archives
Slide 30
Three considerations for legislation as data
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data
• URIs for things & RDF data model
Requires granular URIs to name things
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
1 Link databases
“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2 Add meaning
“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes”
• Need for adding semantics in order to allow machine reasoning
For example
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Linked Data in the oil and gas industry
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
bull Linked data is a set of design principles for sharing machine-readable data on the Web
bull URIs RDF and SPARQL form the foundational layer for Linked data
bull Linked data offers a number of advantages such as
o Data integration with small impact on legacy systems
o Enables semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions cont'd
• Linked data offers a number of advantages, such as
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
bull How to writeread RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
bull Published as a W3C recommendation in 1999
bull RDF was originally introduced as a data model for metadata
bull RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
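To see that such a description really is a set of triples, the Python sketch below reads a shortened copy of the example with the standard library's XML parser. It is an illustration only: it handles just the simple pattern used here (typed nodes with rdf:about, and properties that are either text or rdf:resource) and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
</rdf:RDF>"""

def triples(xml_text):
    """Yield (subject, predicate, object) for simple typed-node RDF/XML only."""
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        # ElementTree names look like "{namespace}local"; join them into a URI.
        yield subject, RDF + "type", node.tag[1:].replace("}", "")
        for prop in node:
            predicate = prop.tag[1:].replace("}", "")
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj

result = list(triples(DOC))
```

The organisation element produces three triples: its type, its label and its link to the site.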
RDF structure
Triples graphs and syntaxes
Slide 44
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset (subject, predicate, object)
http://publications.europa.eu/resource/authority/file-type, has a title, “File types Named Authority List”
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Callouts: definition of prefixes; description of data (triples); subject, predicate and object; graph]
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
[Callouts: definition of prefixes; description of data (triples); subject, predicate and object; graph]
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
[Callouts: subject, predicate and object]
How to represent data in RDF
Classes properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties
Slide 51
Examples of classes relationships and properties The Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain", built on the ISA Core Vocabularies]
Class Identifier: dateOfIssue (dateTime [0..1]), identifier (string [1..1]), identifierType (string [0..1]), issuingAuthority (string [0..1]), issuingAuthorityUri (URI [0..1])
Class Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)
Class Location: geographicIdentifier (URI), geographicName (string)
Class Address: addressArea, addressID, adminUnitL1, adminUnitL2, fullAddress, locatorDesignator, locatorName, poBox, postCode, postName, thoroughfare (all string)
Class Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus making the meaning of the data model easier to understand
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: Are there any X, Y, etc. satisfying the following conditions?
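The difference between these forms lies mainly in what they return. The Python sketch below mimics four of them over a small in-memory set of triples. It is an analogy rather than SPARQL: the prefixed names are plain strings, the ex: identifiers are made up, and `match` is a hypothetical helper.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

graph = {
    ("ex:fileType", "rdf:type", "dcat:Dataset"),
    ("ex:fileType", "dct:title", "File types Named Authority List"),
}

# SELECT: a table of variable bindings (here: every dataset)
select_rows = [(s,) for s, _, _ in match(graph, p="rdf:type", o="dcat:Dataset")]
# ASK: a yes/no answer (is any publisher recorded?)
ask = bool(match(graph, p="dct:publisher"))
# CONSTRUCT: new triples built from a template
construct = {(s, "rdfs:label", o) for s, _, o in match(graph, p="dct:title")}
# INSERT and DELETE: changes to the graph itself
graph.add(("ex:fileType", "dct:publisher", "ex:publ"))
graph.discard(("ex:fileType", "rdf:type", "dcat:Dataset"))
```

SELECT, ASK and CONSTRUCT leave the graph untouched and differ only in the shape of the answer, whereas INSERT and DELETE modify the stored data.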
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
[Callouts: PREFIX lines are the definition of prefixes; SELECT is the type of query; ?title is a variable, i.e. what to search for; the WHERE block holds the RDF triple patterns, i.e. the conditions that have to be met]
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Named Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
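To make the idea of triple patterns more concrete, the following is a minimal sketch in plain Python of how a SELECT query could be evaluated against the sample data of the two SELECT slides. It is not how a real SPARQL engine is implemented. The helper names (`match_pattern`, `select`) are invented for illustration, prefixed names are kept as plain strings, and the two full URIs are taken from elsewhere in this deck on the assumption that they are what the abbreviated URIs on the slides stand for.

```python
# Sample data as (subject, predicate, object) tuples.
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Return an extended copy of `binding` if `pattern` matches `triple`, else None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match.
            return None
    return extended


def select(variables, patterns, triples):
    """Evaluate the patterns one after another and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


# The "name and publisher of a dataset" query.
rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset's triples to the publisher's triples, which is the same role it plays in the SPARQL query.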
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
Step 1
[Figure: domain model diagram. Classes shown: Amount (Currency, Figure, Type, Year), Nomenclature (Type, Heading, Introduction, Remark, Conditions), EU Programme, Political category (Code, Description), Corporate body. Further attributes appearing in the diagram: Code, Type, Location, Acronym, Legal base period, Legal base type, Legal base status. Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organisations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
• Reuse greatly aids interoperability of your data.
  For example, a value of dcterms:created, which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
Slide 71
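The interoperability point about typed dates can be illustrated with a small Python sketch. The function names are invented for this example. The first function handles the lexical form of an xsd:date without a timezone; the second handles only the one free-text layout mentioned on the slide, and assumes English month names.

```python
from datetime import date, datetime


def parse_typed_date(lexical_form):
    """Parse an xsd:date lexical form without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical_form)


def parse_free_text_date(text):
    """Parse one specific free-text layout, e.g. '21 February 2013'."""
    return datetime.strptime(text, "%d %B %Y").date()


# The standard form needs no knowledge about the publisher's conventions ...
print(parse_typed_date("2013-02-21"))
# ... while the custom form needs a parser written for that publisher's layout.
print(parse_free_text_date("21 February 2013"))
```

Every additional free-text layout would need its own parsing rule, which is the "further processing" that reuse of dcterms:created with xsd:date avoids.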
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
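The benefit of sub-properties can be sketched in a few lines of Python: a system that only knows dct:description or dct:subject can still use data expressed with the more specific terms, once the sub-property statements are applied. The `bud:` and `ex:` prefixed names below are hypothetical stand-ins, not the actual identifiers of the EU Budget vocabulary.

```python
# rdfs:subPropertyOf statements, as described on the two slides
# (property names are illustrative assumptions).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop):
    """Return every super-property reachable from `prop`."""
    found = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        found.append(prop)
    return found


def infer(triples):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for super_prop in super_properties(predicate):
            inferred.add((subject, super_prop, obj))
    return inferred


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
for triple in sorted(infer(data)):
    print(triple)
```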
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
4
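Some of these naming conventions can be checked mechanically. The sketch below is an illustration only, with invented function names: it tests the capitalisation and camel-case rules on the local part of a prefixed name. Whether a class name is singular, or whether a property is a verb or a noun, still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```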
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
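In RDFS, domain and range are not validation constraints; they allow a system to infer the class of the resources that a property connects. The following Python sketch shows that inference for the hasCeiling relationship from the diagram. The `ex:` names and the direction of the relationship (a political category has a ceiling amount) are assumptions made for this illustration.

```python
# rdfs:domain and rdfs:range statements (illustrative assumptions).
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Infer rdf:type statements from the domain and range of the properties used."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            types.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            types.add((obj, "rdf:type", RANGE[predicate]))
    return types


data = [("ex:category1", "ex:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(data)):
    print(triple)
```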
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another…" - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (xls and xlsx)
- JSON
- XML
- RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
• Create URI representations for the object values involved
  - via formula
  - via reconciliation
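Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In OpenRefine this is done with an expression on the column; the Python sketch below only illustrates the idea. The base URI and the function name are hypothetical.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/country/"  # hypothetical base URI


def mint_uri(cell_value, base=BASE_URI):
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(cell_value.strip().lower().split())
    return base + quote(slug, safe="-")


print(mint_uri("Belgium"))
print(mint_uri("  Czech Republic "))
```

The same value always yields the same URI, which is what allows rows that mention the same country to point at the same resource.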
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
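Conceptually, an RDF skeleton is a mapping from spreadsheet columns to properties that is applied to every row. The following Python sketch imitates that idea outside OpenRefine and writes the result as N-Triples lines. The base URI, the property URIs and the column names are hypothetical; a real mapping for this example would use terms from the RDF Data Cube Vocabulary, and numeric values would normally get a datatype rather than being written as plain literals.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# The "skeleton": which column is mapped to which property (all hypothetical).
SKELETON = {
    "country": BASE + "def/country",
    "year": BASE + "def/year",
    "value": BASE + "def/value",
}


def rows_to_ntriples(csv_text):
    """Apply the skeleton to each row and return N-Triples lines with plain literals."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        subject = "<" + BASE + "observation/" + str(number) + ">"
        for column, prop in SKELETON.items():
            # Escape backslashes and double quotes inside the literal.
            literal = row[column].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(subject + " <" + prop + '> "' + literal + '" .')
    return lines


sample = "country,year,value\nBE,2014,0.83\nLU,2014,0.93\n"
for line in rows_to_ntriples(sample):
    print(line)
```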
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
The value proposition of linked (open) government data
• Ease of model updates: RDF data models and vocabularies can be extended, adapted and updated more easily. Changes can be reflected on the data with lower costs and effort (compared to traditional relational databases).
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions when it comes to service integration, data use, reuse and exchange.
• New services: The availability of LOGD gives rise to new integrated services offered by the public and/or private sector.
Slide 11
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
The four principles of linked data in practice
1. Use Uniform Resource Identifiers (URIs) as names for things
2. Use HTTP URIs so that people can look up those names
E.g. for an organisation, UNICEF in EuroVoc:
- http://eurovoc.europa.eu/1022
Slide 12
DATASUPPORTOPEN
The four principles in practice
3 When someone looks up a URI provide useful information using the standards (RDF SPARQL)
4 Include links to other URIs so that peoplemachines can discover more things
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things RDF for describing data and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
– ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF amp SPARQL
The Resource Description Framework (RDF) is a data model, with several syntaxes, for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example (Subject, Predicate, Object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5 star-schema of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
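The 5-star scheme is cumulative: each star presupposes the previous ones. As a rough sketch of that logic, the Python function below rates a dataset from a few facts about it. The function name, the parameters and the format sets are assumptions made for this illustration (the non-proprietary list follows the formats named later in this module); it is not an official assessment tool.

```python
# Formats listed as non-proprietary in this module.
NON_PROPRIETARY = {"xml", "csv", "rdf", "json", "odf"}
# Assumption: Excel files count as structured but proprietary.
STRUCTURED = NON_PROPRIETARY | {"xls", "xlsx"}


def star_rating(open_licence, file_format, uses_uris=False, links_to_other_data=False):
    """Return the number of stars (0 to 5); each star requires all previous ones."""
    fmt = file_format.lower()
    if not open_licence:
        return 0
    if fmt not in STRUCTURED:
        return 1
    if fmt not in NON_PROPRIETARY:
        return 2
    if not uses_uris:
        return 3
    if not links_to_other_data:
        return 4
    return 5


print(star_rating(True, "pdf"))   # scanned report under an open licence
print(star_rating(True, "xlsx"))  # Excel table
print(star_rating(True, "csv"))   # CSV table
print(star_rating(True, "rdf", uses_uris=True, links_to_other_data=True))
```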
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Trends risks and
vulnerabilities in
securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See alsohttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
Food Additives - httpopen-dataeuropaeuendatadataset1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - httpopen-dataeuropaeuendatadatasetcorporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives ndash some examples
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU, CELLAR SPARQL endpoint: Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: Tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
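SPARQL endpoints such as these are queried over HTTP. Under the SPARQL Protocol, a query can be sent as the `query` parameter of a GET request. The Python sketch below only builds such a request and does not send it; the helper name is invented, and whether a given endpoint is still online or supports the requested result format is not something this deck confirms.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    "http://semantic.eea.europa.eu/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query against the endpoint.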
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register published as linked data
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: …/data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
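The granular URI pattern of the UK National Archives example can be expressed as a small function. This Python sketch only assembles the identifier and representation URIs from their parts, following the pattern and the example shown on these two slides; the function and parameter names are illustrative.

```python
BASE = "http://www.legislation.gov.uk/id"


def legislation_uri(doc_type, year, number, section, representation=None):
    """Build the identifier URI of a section, or of one of its representations."""
    uri = f"{BASE}/{doc_type}/{year}/{number}/section/{section}"
    if representation is not None:
        # Representations listed on the slide: xml, xht, pdf, rdf, feed.
        uri += f"/data.{representation}"
    return uri


print(legislation_uri("ukpga", 2010, 32, 124))
print(legislation_uri("ukpga", 2010, 32, 124, "rdf"))
```

Keeping the identifier separate from its representations lets the same section be named once and served in several formats.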
DATASUPPORTOPEN
Open amp linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open amp linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source Soumlren Auer SEMIC Conference 2015 Rigahttpsjoinupeceuropaeusitesdefaultfilesisa_field_pathpresentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_datapdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading httpwwwtopquadrantcom
resourcessolutionsdocsSe
mantic-data-oil-and-gaspdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions contrsquod
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF amp SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
bull How to writeread RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">
      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>
      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>
    </rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset (Subject, Predicate, Object)
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Name Authority List"
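As a minimal illustration of the subject, predicate, object structure, the Python sketch below models a triple and writes it out as one N-Triples line. The `Triple` type and the `to_ntriples` helper are invented for this example; the object is written either as a URI in angle brackets or as a quoted literal, and language tags and datatypes are left out for brevity.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False


def to_ntriples(triple):
    """Serialise one triple as an N-Triples line."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    else:
        obj = "<" + triple.obj + ">"
    return "<" + triple.subject + "> <" + triple.predicate + "> " + obj + " ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```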
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
Definition of prefixes
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">
Description of data – triples (graph)
      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Named Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>
      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>
    </rdf:RDF>
Subject: the resource named in rdf:about. Predicate: the nested property element (e.g. dct:title). Object: the element content or the resource named in rdf:resource.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data – triples (each statement reads subject, predicate, object; together the triples form a graph):
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
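The same five triples can be written in yet another syntax. The sketch below serialises them as N-Triples, a line-based W3C syntax closely related to Turtle in which every URI is written out in full. The function name `to_ntriples` and the rule used to tell URIs from literals are assumptions made for this illustration only.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (FILE_TYPE, RDF_TYPE, DCAT_DATASET),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines.

    Assumption for this sketch: an object starting with "http://" or
    "https://" is a URI; anything else is a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(TRIPLES))
```

Whatever the syntax, the graph stays the same; only the notation changes.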
RDF Syntax: RDFa
Slide 49
RDFa embeds RDF data in HTML:
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p><span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span></p>
</div>
</body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain"; each box is a class, each line inside a box is a property, each arrow between boxes is a relationship]
• Class Identifier (Core Vocabularies): dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
• Class Person (Core Vocabularies): alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
• Class Location (Core Vocabularies): geographicIdentifier: URI; geographicName: string
• Class Address (Core Vocabularies): addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
• Class Geometry (Core Vocabularies): lat: string; long: string; wkt: string; xmlGeometry: XML
• Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
Definition of prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Named Authority List"
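To make the idea of a triple pattern concrete, here is a minimal sketch of what this query asks for, written over an in-memory list of triples instead of a SPARQL engine. The helper `select_objects` is a made-up name, and the abbreviated URIs of the slide are assumed to stand for the full URIs used earlier in this module.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (FILE_TYPE, RDF_TYPE, "http://www.w3.org/ns/dcat#Dataset"),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def select_objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the data.

    This mirrors a single triple pattern with a fixed subject and predicate
    and one variable in the object position.
    """
    return [o for s, p, o in triples if s == subject and p == predicate]


result = select_objects(SAMPLE_DATA, FILE_TYPE, DCT + "title")
print(result)
```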
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Named Authority List" | "Publications Office"
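This query joins three triple patterns through the shared variable ?publisherURI. The sketch below shows one simple way such a join could be evaluated over a list of triples; it is an illustration of the principle, not how any particular SPARQL engine works. The function names `match` and `query` are hypothetical, and variables are written as strings starting with "?".

```python
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on failure."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must match the data exactly.
            return None
    return new


def query(triples, patterns):
    """Evaluate a list of triple patterns as a conjunction (a join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = query(SAMPLE_DATA, [
    (FILE_TYPE, DCT + "publisher", "?publisherURI"),
    (FILE_TYPE, DCT + "title", "?dataset"),
    ("?publisherURI", DCT + "title", "?publisher"),
])
for row in rows:
    print(row["?dataset"], "|", row["?publisher"])
```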
SPARQL Example ndash EU ODP (1)
Slide 59
SPARQL Example ndash EU ODP (2)
Slide 60
SPARQL Example ndash EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
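Start with a robust Domain Model (step 1)
Slide 68
[Diagram: domain model of the EU Budget]
Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
Further attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme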
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
Slide 71
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
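The date example can be made tangible with a small sketch. A value in the xsd:date lexical form (without a timezone) can be read by a standard ISO parser, whereas the free-text form needs a format string that every consumer has to know about in advance. The function names are hypothetical, and the second parser assumes English month names under Python's default locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse an xsd:date lexical form without timezone, e.g. 2013-02-21."""
    return date.fromisoformat(lexical)


def parse_custom_date(text: str) -> date:
    """Parse a free-text date such as '21 February 2013'.

    The format string is extra knowledge that must be agreed out of band;
    this is the "further processing" a non-standard format requires.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard == custom)
```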
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Reuse existing terms and vocabularies (step 2)
Slide 72
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
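How a system that only knows the super-property can still use the data may be sketched as follows. The function `infer_super_properties`, the `ex:` URIs and the sample triple are hypothetical and exist only for this illustration; the sub-property mapping is assumed to contain no cycles beyond what the guard handles.

```python
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
EX_INTRODUCTION = "http://example.org/vocab#introduction"  # hypothetical term

# Hypothetical schema statement: ex:introduction rdfs:subPropertyOf dct:description
SUB_PROPERTY_OF = {EX_INTRODUCTION: DCT_DESCRIPTION}

DATA = [
    ("http://example.org/item/1", EX_INTRODUCTION, "Introductory text"),
]


def infer_super_properties(triples, sub_property_of):
    """Return the triples plus those implied by sub-property relationships."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:
            # Every statement made with the sub-property also holds
            # for the more generic super-property.
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


expanded = infer_super_properties(DATA, SUB_PROPERTY_OF)
print(len(expanded))
```

A consumer that only understands dct:description can now find the text, even though the publisher used the more specific term.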
Slide 73
3
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Where new terms are required create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
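The purely lexical part of these conventions (capitalisation and camel case) can be checked mechanically; whether a term is singular, a verb or a noun still needs human judgement. The sketch below is one possible check, with hypothetical function names and regular expressions chosen for this example.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part after the prefix in a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    print(term, looks_like_class(term), looks_like_property(term))
```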
Slide 76
4
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the Amount class with attributes Currency, Figure, Type, Year]
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the Amount class with attributes Currency, Figure, Type, Year]
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
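In RDF Schema, domain and range statements let a system infer the class of the resources that a property connects. The sketch below illustrates this under loud assumptions: the `ex:` namespace, the direction of hasCeiling (from a political category to an amount) and the function name `infer_types` are all hypothetical and are not taken from the EU Budget vocabulary itself.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/budget#"  # hypothetical namespace

# Assumed schema: ex:hasCeiling has domain ex:PoliticalCategory and range ex:Amount
DOMAIN = {EX + "hasCeiling": EX + "PoliticalCategory"}
RANGE = {EX + "hasCeiling": EX + "Amount"}

DATA = [
    (EX + "category1", EX + "hasCeiling", EX + "amount1"),
]


def infer_types(triples, domain, range_):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, RDF_TYPE, domain[p]))
        if p in range_:
            # The value of the property is an instance of the range class.
            inferred.add((o, RDF_TYPE, range_[p]))
    return inferred


types = infer_types(DATA, DOMAIN, RANGE)
print(sorted(types))
```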
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" – openrefine.org
See also the OpenRefine website: http://openrefine.org
What is Open Refine RDF extension
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using OpenRefine to model and publish open data: getting started
1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in OpenRefine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
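The core of these steps, turning rows of a table into triples, can be sketched in a few lines outside OpenRefine. This is not how the RDF extension works internally; it only illustrates the mapping idea. The function `rows_to_triples`, the sample table and the `http://example.org/country/` base URI are assumptions, and identifier values are assumed to be safe to append to a URI as they are.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical tabular data, as it might be exported from a spreadsheet.
TABLE = "code,label\nBEL,Belgium\nLUX,Luxembourg\n"


def rows_to_triples(csv_text, base_uri, id_column, column_to_property):
    """Map each row to triples: one subject per row, one triple per mapped column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base_uri + row[id_column]
        for column, prop in column_to_property.items():
            value = row[column]
            if value:  # skip empty cells
                triples.append((subject, prop, value))
    return triples


triples = rows_to_triples(
    TABLE,
    base_uri="http://example.org/country/",
    id_column="code",
    column_to_property={"label": RDFS_LABEL},
)
for t in triples:
    print(t)
```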
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
Clean up the data ndash table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data ndash prepare RDF
Slide 91
3
• Create a URI representation for the involved object values
o via formula
o via reconciliation
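Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In OpenRefine this is done with its own expression language; the sketch below shows the same idea in Python instead. The function `mint_uri`, the slug rule and the `http://example.org/id/country/` base URI are assumptions for illustration.

```python
import re
from urllib.parse import quote


def mint_uri(base_uri: str, value: str) -> str:
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    return base_uri + quote(slug, safe="-")


uri = mint_uri("http://example.org/id/country/", "  United Kingdom ")
print(uri)
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what allows different rows (and different datasets) to refer to the same thing.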
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C. http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework, W3C. http://www.w3.org/RDF
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query
Slide 98
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that (s)he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
The four principles of linked data in practice
1 Use Uniform Resource Identifiers (URIs) as names for things
2 Use HTTP URIs so that people can look up those names
E.g. for an organisation: UNICEF in EuroVoc
- http://eurovoc.europa.eu/1022
Slide 12
The four principles in practice
3 When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)
4 Include links to other URIs, so that people/machines can discover more things
Slide 13
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources
Linked data
Data can be linked to URIs from other data sources using open standards such as RDF without being publicly available under an open licence
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike." – OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things RDF for describing data and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
– ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF amp SPARQL
The Resource Description Framework (RDF ) is a syntax for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down in triples
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
Make your stuff available on the Web under an open licence
Slide 20
Trends risks and
vulnerabilities in
securities markets
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives ndash some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
Initiatives funded by the European Commission
Slide 28
[Logos: ADMS.SW, Core Vocabulary, Public Service]
Member State initiatives ndash some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register published as linked data
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
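The identifier pattern above can be read as a simple construction rule. The sketch below assembles such a URI from its parts; the function `legislation_uri` and its parameter names are made up for this example, and the rule for appending the representation suffix is an assumption based on the pattern and on the versioning example in this module.

```python
def legislation_uri(doc_type, year, number, section=None, representation=None):
    """Assemble a granular identifier following the pattern shown on the slide."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    if representation is not None:
        # Assumption: a representation is requested via a "/data.<format>" suffix.
        uri += f"/data.{representation}"
    return uri


print(legislation_uri("ukpga", 2010, 32, section=124, representation="rdf"))
```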
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes."
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Named Authority List"
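As a minimal sketch (not part of the original slides), the example triple can be modelled in Python as a three-element tuple. The `Triple` type is hypothetical, and mapping "has a title" to the Dublin Core Terms `title` property is an assumption made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str


# Assumption: "has a title" is expressed with the Dublin Core Terms title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Named Authority List",
)

# A set of triples forms an RDF graph; here the graph holds a single statement.
graph = {triple}

for s, p, o in graph:
    print(f'<{s}> <{p}> "{o}" .')
```

Keeping statements as plain tuples makes it easy to see that a graph is nothing more than a set of subject, predicate, object statements; the syntaxes on the following slides are different ways of writing the same set down.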
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide annotations: definition of prefixes (the xmlns declarations); description of data as triples (subject, predicate, object), which together form the graph
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph whose nodes and arcs are labelled subject, predicate and object]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Slide annotations: definition of prefixes (the @prefix lines); description of data as triples (subject, predicate, object), which together form the graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Slide annotations: the resource attribute gives the subject, each property attribute a predicate, and the element content the object
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram: Healthcare Domain]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Diagram callouts: class, class properties, relationships
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation)
• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation)
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
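Structure of a SPARQL query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide annotations: definition of prefixes (the PREFIX lines); type of query (SELECT); variables, i.e. what to search for (?title); RDF triple patterns, i.e. the conditions that have to be met (the WHERE block)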
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Named Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Named Authority List" | "Publications Office"
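To make the evaluation of the WHERE block more concrete, here is a rough Python sketch of how triple patterns can be matched against the sample data and joined on shared variables. It is an illustration only, not how a SPARQL engine is actually implemented; the functions `match` and `select` are hypothetical names, and the sample data omits the rdf:type statements, which this query does not use.

```python
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# Sample data from the slide as (subject, predicate, object) tuples.
DATA = [
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; return None on failure."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against the value bound earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match the data exactly.
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the triple patterns one after another, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The variable `?publisherURI` is what joins the first and third patterns: it is bound by the dataset's publisher statement and then reused as the subject whose title is looked up.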
DATASUPPORTOPEN
SPARQL example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
Step 1
[Figure: domain model of the EU budget. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Slide 69
Step 2
General-purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Reuse existing terms and vocabularies: well-known vocabularies
Slide 70
Step 2
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organisations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Reuse existing terms and vocabularies: advantages of reuse
Slide 71
Step 2
• Reuse greatly aids interoperability of your data
A dcterms:created value, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
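The interoperability point can be illustrated with a small Python sketch (an illustration added here, not taken from the slides). The xsd:date lexical form follows the ISO 8601 calendar format, so a standard library call reads it directly, whereas the free-text form needs a format string that the consumer has to guess or agree on with the publisher.

```python
from datetime import date, datetime

# A typed literal such as "2013-02-21"^^xsd:date uses the ISO 8601 calendar format,
# which the standard library parses without any extra information.
standard_value = date.fromisoformat("2013-02-21")

# A free-text value such as "21 February 2013" needs a format string chosen by the
# consumer (an assumption about the publisher's convention) and English month names.
custom_value = datetime.strptime("21 February 2013", "%d %B %Y").date()

print(standard_value, custom_value)
```

Both calls end up with the same date, but only the first works without knowing anything about the publisher's local conventions.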
DATASUPPORTOPEN
Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
Step 2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
Step 3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
Step 3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
Step 3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
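A short Python sketch can show why declaring a sub-property helps consumers that only know the generic term. It applies the rdfs:subPropertyOf rule to the "introduction" example from the slides. The `http://example.org/vocab#` namespace and the item URI are hypothetical stand-ins, since the slides do not give the actual EU Budget vocabulary URIs.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"

# Hypothetical namespace and resource, used for illustration only.
EX = "http://example.org/vocab#"
EX_INTRODUCTION = EX + "introduction"

schema = [(EX_INTRODUCTION, RDFS_SUBPROPERTY_OF, DCT_DESCRIPTION)]
data = [("http://example.org/item/1", EX_INTRODUCTION, "Introductory text")]


def super_properties(prop, schema):
    """Return all direct and indirect super-properties of `prop`."""
    found, todo = set(), [prop]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == RDFS_SUBPROPERTY_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def infer(data, schema):
    """Add, for every statement, the same statement restated with each super-property."""
    inferred = set(data)
    for s, p, o in data:
        for sup in super_properties(p, schema):
            inferred.add((s, sup, o))
    return inferred


closure = infer(data, schema)
for statement in sorted(closure):
    print(statement)
```

After inference, a system that only understands dct:description still finds the introductory text, even though it has never heard of the more specific property.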
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
Step 4
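The mechanical part of these conventions (initial letter and camel case) can be checked automatically. The following Python sketch is one possible way to do so; the function names are hypothetical, and the check deliberately ignores the rules that need human judgement, such as whether a class name is singular or whether a property reads as a verb or a noun.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties (local names only).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix of a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term):
    """True if the local name starts with a capital letter and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term):
    """True if the local name starts with a lower-case letter and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    if is_valid_class(term):
        kind = "class"
    elif is_valid_property(term):
        kind = "property"
    else:
        kind = "does not follow the conventions"
    print(term, "->", kind)
```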
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
Slide 79
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling property linking the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
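In RDFS, domain and range declarations are used to derive the classes of the resources that a property connects. The Python sketch below applies these two rules to the hasCeiling example. It rests on assumptions: the `http://example.org/budget#` namespace and the resource URIs are hypothetical, and the direction of hasCeiling (from Political category to Amount) is a guess, since the slide does not spell it out.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical namespace for the budget example; the real vocabulary URIs may differ.
BUD = "http://example.org/budget#"

# Assumption: a political category has a ceiling, and the ceiling is an amount.
schema = [
    (BUD + "hasCeiling", RDFS + "domain", BUD + "PoliticalCategory"),
    (BUD + "hasCeiling", RDFS + "range", BUD + "Amount"),
]
data = [(BUD + "category/1", BUD + "hasCeiling", BUD + "amount/42")]


def infer_types(data, schema):
    """Apply the RDFS domain and range rules to derive rdf:type statements."""
    inferred = set()
    for s, p, o in data:
        for prop, rule, cls in schema:
            if prop != p:
                continue
            if rule == RDFS + "domain":
                # The subject of the statement is an instance of the domain class.
                inferred.add((s, RDF_TYPE, cls))
            elif rule == RDFS + "range":
                # The object of the statement is an instance of the range class.
                inferred.add((o, RDF_TYPE, cls))
    return inferred


types = infer_types(data, schema)
for statement in sorted(types):
    print(statement)
```

Note the design consequence: in RDFS, domain and range do not reject data that uses a property on an unexpected class. They add type information, so an overly narrow domain can lead to unintended conclusions about your resources.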
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Step 6
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases shown on the slide: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is OpenRefine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another..." - openrefine.org
See also: OpenRefine website, http://openrefine.org
DATASUPPORTOPEN
What is the OpenRefine RDF extension?
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using OpenRefine to model and publish open data: getting started
Installation:
1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation
Slide 90
Step 3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data - prepare RDF
Slide 91
Step 3
• Create a URI representation for the object values involved
  o via formula
  o via reconciliation
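The "via formula" option boils down to a deterministic function from a cell value to a URI. In OpenRefine this would be written as a transformation expression; the Python sketch below only illustrates the idea and is not OpenRefine syntax. The base URI and the function name `mint_uri` are hypothetical.

```python
import re
from urllib.parse import quote

# Hypothetical base URI, for illustration only.
BASE_URI = "http://example.org/id/country/"


def mint_uri(value, base=BASE_URI):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    # Percent-encode everything that is not safe inside a single path segment.
    return base + quote(slug, safe="-")


print(mint_uri("  Czech Republic "))
```

A formula like this always produces the same URI for the same cell value, which is what makes the resulting identifiers stable. Reconciliation takes the other route and looks the value up in an existing authoritative source, so that the data links to URIs that others already use.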
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
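What the export step does can be sketched in a few lines of Python: each spreadsheet row becomes a small group of triples that follow the same skeleton. The sketch writes N-Triples, a line-based subset of Turtle, because it needs no prefix handling. The column names, the `http://example.org/` URIs and the property names are hypothetical; a real export of the Digital Agenda Scoreboard would use the RDF Data Cube Vocabulary terms chosen in the mapping step.

```python
import csv
import io

# Hypothetical input and namespaces; real column names and URIs will differ.
CSV_TEXT = "country,year,value\nBE,2014,0.83\nLU,2014,0.93\n"
BASE = "http://example.org/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def row_to_ntriples(index, row):
    """Turn one spreadsheet row into N-Triples lines describing one observation."""
    subject = f"<{BASE}observation/{index}>"
    return [
        f"{subject} <{BASE}def/country> <{BASE}id/country/{row['country']}> .",
        f"{subject} <{BASE}def/year> \"{row['year']}\" .",
        f"{subject} <{BASE}def/value> \"{row['value']}\"^^<{XSD_DECIMAL}> .",
    ]


def csv_to_ntriples(text):
    """Apply the same triple skeleton to every row of the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for index, row in enumerate(reader, start=1):
        lines.extend(row_to_ntriples(index, row))
    return lines


ntriples = csv_to_ntriples(CSV_TEXT)
print("\n".join(ntriples))
```

The function `row_to_ntriples` plays the role of the RDF skeleton: it is defined once and then applied to every row.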
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• OpenRefine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
The four principles in practice
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that people/machines can discover more things
Slide 13
DATASUPPORTOPEN
Linked data vs open data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence
Slide 14
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
DATASUPPORTOPEN
Linked data foundations
URIs for naming things RDF for describing data and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource."
- ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
The Resource Description Framework (RDF) is a framework for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example (subject, predicate, object):
http://example.org/place/Brussels  is the capital of  "Belgium"
OR
http://example.org/place/Brussels  is the capital of  http://example.org/place/Belgium
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Trends risks and
vulnerabilities in
securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Waterbase - Emissions to water
CountryCode
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives - some examples
• European Environment Agency SPARQL endpoint
Tool for searching linked data published by the European Environment Agency
http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint
Allows searching for linked metadata of datasets published on the EU Open Data Portal
https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint
Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc.
http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint
Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc.
http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint
Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections
http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of EC-funded initiatives: ADMS, ADMS.SW, Core Vocabulary, Public Service]
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
• IT - Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
• NL - Building and address register: the Dutch Address and Buildings base register published as linked data
• UK - Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
• UK - Companies House: publishes basic company details as linked data, using a simple URI for each company in its database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member State initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & the RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member State initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
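The granular URI pattern for legislation can be sketched in Python as follows. This is only an illustration: the function and parameter names are assumptions, and the path layout is inferred from the example URI on this slide rather than taken from legislation.gov.uk documentation.

```python
def legislation_uri(doc_type, year, number, section, fmt=None):
    """Build an identifier URI for one section of a piece of legislation.

    If fmt is given (e.g. "rdf" or "xml"), return the URI of that
    representation instead of the bare identifier.
    """
    uri = (
        "http://www.legislation.gov.uk/id/"
        f"{doc_type}/{year}/{number}/section/{section}"
    )
    if fmt is not None:
        uri += f"/data.{fmt}"
    return uri


# Identifier of the thing, then one of its representations.
print(legislation_uri("ukpga", 2010, 32, 124))
print(legislation_uri("ukpga", 2010, 32, 124, fmt="rdf"))
```

Keeping the identifier separate from its representations means the same section can be named once and served in several formats.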
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
Slide 35
1 Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2 Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
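To show how such a description breaks down into individual statements, here is a minimal Python sketch that reads the organisation example with the standard library's XML parser and lists its triples. It is not a general RDF/XML parser: it assumes the simple layout used on this slide (typed nodes with rdf:about, and property elements holding either text or rdf:resource). The function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
"""


def clark_to_uri(tag):
    # ElementTree names elements "{namespace}local"; RDF joins the two parts.
    return tag[1:].replace("}", "", 1)


def triples_from_rdfxml(text):
    """Return (subject, predicate, object) tuples for the simple layout above."""
    root = ET.fromstring(text)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name of a typed node gives its rdf:type.
        triples.append((subject, RDF_NS + "type", clark_to_uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, clark_to_uri(prop.tag), obj))
    return triples


for triple in triples_from_rdfxml(RDF_XML):
    print(triple)
```

For real data, a dedicated RDF library would be the safer choice, since RDF/XML allows many more abbreviations than this sketch handles.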
RDF structure
Triples graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
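The subject, predicate and object roles can be made concrete with a small Python sketch. The class names below are illustrative assumptions, not part of any RDF library; the predicate "has a title" is written as the Dublin Core property dct:title, which is the property the later syntax slides use for it. The output line follows the N-Triples convention of angle brackets for URIs and quotes for literals.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str

    def n3(self):
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str

    def n3(self):
        # Escape backslashes and double quotes inside the literal.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


@dataclass(frozen=True)
class Triple:
    subject: IRI                  # always a resource
    predicate: IRI                # always a URI-identified relationship
    object: Union[IRI, Literal]   # a resource or a literal

    def to_ntriples(self):
        return f"{self.subject.n3()} {self.predicate.n3()} {self.object.n3()} ."


dataset_title = Triple(
    IRI("http://publications.europa.eu/resource/authority/file-type"),
    IRI("http://purl.org/dc/terms/title"),
    Literal("File types Name Authority List"),
)
print(dataset_title.to_ntriples())
```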
RDF syntax: RDF/XML
Slide 46
Definition of prefixes:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
Description of data - triples:
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Slide annotations: Subject, Predicate, Object, Graph]
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph with its parts labelled Subject, Predicate and Object]
RDF syntax: Turtle
Slide 48
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data - triples:
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
[Slide annotations: Subject, Predicate, Object, Graph]
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
[Slide annotations: Subject, Predicate, Object]
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
[Figure: UML class diagram; the slide annotations mark the classes, their properties and the relationships between classes]
Class Identifier (Core Vocabularies)
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies)
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location (Core Vocabularies)
  geographicIdentifier: URI
  geographicName: string
Class Address (Core Vocabularies)
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry (Core Vocabularies)
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions ...?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL query
Slide 56
Definition of prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
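How a SELECT query arrives at its result can be sketched in Python by matching triple patterns against the sample data one pattern at a time. This is a toy illustration, not a SPARQL engine: the function names are assumptions, the abbreviated URIs of the slide are written out in full, and literals and URIs are both plain strings, with any term starting with "?" treated as a variable.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

SAMPLE_DATA = [
    (DATASET, RDF_TYPE, DCAT + "Dataset"),
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, data):
    """Return one tuple of variable values per solution of all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    SAMPLE_DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset's publisher triple to the publisher's own title triple.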
SPARQL example - EU ODP (1)
Slide 59
SPARQL example - EU ODP (2)
Slide 60
SPARQL example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model of the EU Budget, with the following entities and relationships]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
Slide 71
• Reuse greatly aids interoperability of your data
  For example, a value of dcterms:created, which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows that the schema has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
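The difference in processing effort between the two date styles can be illustrated with a short Python sketch. The function names are illustrative assumptions. The xsd:date lexical form is ISO 8601, so it parses directly; the free-text form only parses if the consumer already knows, or guesses, its layout and language.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    # "2013-02-21" is already in the ISO 8601 form that xsd:date uses.
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    # Assumption: day, English month name, year. Any other convention
    # (e.g. "Feb 21, 2013" or "21 février 2013") needs its own rule.
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard == custom)
```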
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Reuse existing terms and vocabularies (step 2)
Slide 72
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73 (step 3)
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: Nomenclature, with attributes Type, Heading, Introduction, Remark, Conditions]
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
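The benefit of declaring sub-properties can be sketched in Python: a consumer that only knows the generic Dublin Core terms can still read data that uses the more specific ones. The prefixed names bud:introduction and bud:hasNomenclature, the ex: resources and the literal value are hypothetical stand-ins, since the slides do not give the exact terms of the EU Budget vocabulary; only the two sub-property relationships come from the slides.

```python
# rdfs:subPropertyOf declarations (term names are hypothetical).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of):
    """Add (s, super, o) for every (s, p, o) where p is a sub-property of super."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            frontier.append((s, parent, o))  # follow chains of sub-properties
    return inferred


data = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]
closure = infer_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(closure):
    print(triple)
```

For brevity the sketch allows one super-property per property; RDFS itself allows several.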
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
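Some of these naming conventions can be checked mechanically, as the following Python sketch suggests. The function names are illustrative assumptions, and only the letter-case and camel-case rules are covered; whether a class name is singular, or a property a verb or a noun, still needs a human reviewer.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    # "skos:Concept" -> "Concept"; a term without a prefix is returned as is.
    return term.split(":", 1)[-1]


def looks_like_class_name(term):
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property_name(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"]:
    print(term, looks_like_class_name(term), looks_like_property_name(term))
```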
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Figure: Amount, with attributes Currency, Figure, Type, Year]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Figure: Amount, with attributes Currency, Figure, Type, Year]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
[Figure: Political category (Code, Description) linked by hasCeiling to Amount (Currency, Figure, Type, Year)]
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
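In RDFS, domain and range declarations act as inference rules rather than validation constraints: using a property lets a reasoner conclude the classes of the subject and the object. The Python sketch below illustrates this. The prefixed names and the direction of hasCeiling (from Political category to Amount) are assumptions read off the diagram, not confirmed terms of the EU Budget vocabulary.

```python
# rdfs:domain and rdfs:range declarations (hypothetical term names).
DOMAINS = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGES = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domains, ranges):
    """Derive rdf:type statements from the properties used in the data."""
    types = set()
    for s, p, o in triples:
        if p in domains:
            types.add((s, "rdf:type", domains[p]))  # subject is in the domain
        if p in ranges:
            types.add((o, "rdf:type", ranges[p]))   # object is in the range
    return types


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
derived = infer_types(data, DOMAINS, RANGES)
for triple in sorted(derived):
    print(triple)
```

A practical consequence is that an overly narrow domain or range leads to wrong conclusions about data that reuses the property elsewhere, so it pays to declare them with care.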
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices for the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
First:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet (step 1)
• Download the tabular data
Slide 88
Create a project and upload it in Open Refine (step 2)
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Clean up the data - table harmonisation (step 3)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Clean up the data - prepare RDF (step 3)
Slide 91
• Create a URI representation for the involved object values
  o via formula
  o via reconciliation
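Creating URIs "via formula" amounts to deriving each URI from a cell value with a fixed rule. Open Refine has its own expression language for this; the Python sketch below only illustrates the idea. The base URI and the normalisation rule (trim, lower-case, hyphenate, percent-encode) are assumptions, not a prescribed scheme.

```python
from urllib.parse import quote

BASE = "http://example.org/resource/country/"  # hypothetical base URI


def mint_uri(base, value):
    """Turn a cell value into a URI by a fixed, repeatable rule."""
    # Trim, lower-case and join words with hyphens, then percent-encode
    # whatever is not safe inside a URI path segment.
    slug = "-".join(value.strip().lower().split())
    return base + quote(slug, safe="-")


for cell in ["Belgium", "  United Kingdom ", "Côte d'Ivoire"]:
    print(mint_uri(BASE, cell))
```

Reconciliation takes the other route: instead of minting a new URI, the value is matched against an existing authority, such as a named authority list, and that URI is reused.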
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
• Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
• Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 94
• You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
• You can set the base URI for the data
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle]
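What the export step produces can be approximated with a few lines of Python that turn tabular rows into Turtle. This is a sketch of the idea, not of Open Refine's implementation. The column names, the sample figures, the ex: properties and the base URI are all made up for illustration; only the qb: namespace and qb:Observation come from the RDF Data Cube Vocabulary, and a complete Data Cube would also need a dataset and a structure definition.

```python
import csv
import io

# Made-up sample rows standing in for the cleaned spreadsheet.
CSV_TEXT = """country,year,value
BE,2013,0.80
LU,2013,0.94
"""

PREFIXES = [
    "@prefix qb: <http://purl.org/linked-data/cube#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix ex: <http://example.org/def#> .",  # hypothetical vocabulary
]


def rows_to_turtle(csv_text, base="http://example.org/obs/"):
    """Serialise each CSV row as one qb:Observation in Turtle."""
    lines = list(PREFIXES) + [""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        lines += [
            f"<{base}{country}-{year}> a qb:Observation ;",
            f'    ex:country "{country}" ;',
            f'    ex:year "{year}"^^xsd:gYear ;',
            f'    ex:value "{value}"^^xsd:decimal .',
            "",
        ]
    return "\n".join(lines)


turtle = rows_to_turtle(CSV_TEXT)
print(turtle)
```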
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Thank you for your attention
and now YOUR questions
Slide 97
References
Slide 98
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
Slide 100
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Further reading
Slide 101
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Linked data vs open data
Slide 14
Open data: data can be published and be publicly available under an open licence without linking to other data sources.
Linked data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
"Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike." - OpenDefinition.org
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
Uniform Resource Identifier (URI)
ldquoA Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resourcerdquo
ndash ISArsquos 10 Rules for Persistent URIs
A country eg Belgium
- httppublicationseuropaeuresourceauthoritycountryBEL
An organisation eg the Publications Office
- httppublicationseuropaeuresourceauthoritycorporate-bodyPUBL
A dataset eg Countries Named Authority List
- httppublicationseuropaeuresourceauthoritycountry
Slide 16
BE
See alsohttpwwwslidesharenetOpenDataSupportdesign
-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a standard model, with several syntaxes, for representing data and resources on the Web.
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data.
http://example.org/place/Brussels is the capital of “Belgium”
OR
http://example.org/place/Brussels is the capital of http://example.org/place/Belgium
Subject Predicate Object
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
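The Brussels example can be sketched in Python as a plain three-part record. This is only an illustration of the subject, predicate, object structure: the `Triple` type, the `describe` helper and the predicate URI `http://example.org/isCapitalOf` are assumptions made for this sketch and do not come from the slides or from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str
    object_is_literal: bool = False


BRUSSELS = "http://example.org/place/Brussels"
BELGIUM = "http://example.org/place/Belgium"
IS_CAPITAL_OF = "http://example.org/isCapitalOf"  # hypothetical predicate URI

# The same fact, once with a literal object and once with a resource object.
with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, "Belgium", object_is_literal=True)
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, BELGIUM)


def describe(t: Triple) -> str:
    """Render a triple in an N-Triples-like form (no escaping; for reading only)."""
    obj = f'"{t.obj}"' if t.object_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(describe(with_literal))
print(describe(with_resource))
```

The second form is the one that makes linking possible, because the object is itself a URI that other triples can describe.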
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open license
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
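As a rough sketch of how the five levels build on each other, the function below counts stars cumulatively, so a level only counts once every lower level is met. The function name and its boolean parameters are invented for this illustration and are not part of the 5-star scheme itself.

```python
def star_rating(on_web_open_licence: bool, structured: bool,
                non_proprietary: bool, uses_uris: bool, linked: bool) -> int:
    """Return the number of stars, stopping at the first criterion that is not met."""
    stars = 0
    for criterion in (on_web_open_licence, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# An Excel file published under an open licence: structured, but in a proprietary format.
print(star_rating(True, True, False, False, False))
```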
Make your stuff available on the Web under an open licence
Slide 20
Trends, risks and vulnerabilities in securities markets
Make it available as structured data
Slide 21
Waterbase - Emissions to water
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD (Linked Open Government Data) roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions’ initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
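Endpoints like these are typically queried over HTTP. The sketch below only builds the request URL for a query sent with GET, where the query text travels in the `query` parameter as the SPARQL Protocol describes. It does not contact any server, and whether the endpoints listed above are still online is not something this example checks. The helper name `sparql_get_url` and the sample query are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EEA_ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # endpoint URL as listed on the slide


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the URL for a SPARQL query sent via HTTP GET (query text in the `query` parameter)."""
    return f"{endpoint}?{urlencode({'query': query})}"


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"  # illustrative query
url = sparql_get_url(EEA_ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Such a URL could then be fetched with `urllib.request.urlopen`, usually with an `Accept` header naming the desired result format.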
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission; legible labels: ADMS, SW, Core Vocabulary, Public Service]
Member State initiatives – some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register published as linked data
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States’ initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI) http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States’ initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
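The granular URI pattern from the previous slide can be sketched as a pair of small helpers. The function and parameter names are invented for this example; only the URI layout and the list of representations come from the slides.

```python
BASE = "http://www.legislation.gov.uk"
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")  # representations listed on the slide


def section_identifier(doc_type: str, year: int, number: int, section: int) -> str:
    """URI that names a section of a piece of legislation (the 'thing' itself)."""
    return f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"


def section_representation(doc_type: str, year: int, number: int, section: int, fmt: str) -> str:
    """URI of one concrete representation (document) of that section."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown representation format: {fmt}")
    return f"{section_identifier(doc_type, year, number, section)}/data.{fmt}"


# Reproduces the example URI shown on the slide.
print(section_representation("ukpga", 2010, 32, 124, "rdf"))
```

Keeping the identifier separate from its representations means that the same section can be served as XML, PDF or RDF without changing the name used to refer to it.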
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about
Slide 32
Further reading
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1 Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2 Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: httpwwwtopquadrantcomresourcessolutionsdocsSemantic-data-oil-and-gaspdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
http://publications.europa.eu/resource/authority/file-type has a title “File types Named Authority List”
Subject Predicate Object
Example: name of a dataset
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Annotations on the slide: Definition of prefixes; Description of data – triples (Subject, Predicate, Object); Graph]
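To see that an RDF/XML document is just another way of writing triples, the sketch below reads a small document with Python's standard XML parser and lists the triples it contains. It handles only the flat shape used in this example (typed nodes with `rdf:about`, and child elements holding either text or `rdf:resource`) and is not a general RDF/XML parser. The function names are invented for the illustration, and the DCAT namespace `http://www.w3.org/ns/dcat#` is used for the prefix.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"

RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>"""


def tag_to_uri(tag: str) -> str:
    """ElementTree writes tags as '{namespace}local'; join the two parts into one URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples_from_rdfxml(text: str) -> list:
    """List (subject, predicate, object) for a flat RDF/XML document."""
    triples = []
    for node in ET.fromstring(text):
        subject = node.get(f"{{{RDF}}}about")
        # The element name of a typed node is shorthand for an rdf:type triple.
        triples.append((subject, RDF + "type", tag_to_uri(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            triples.append((subject, tag_to_uri(prop.tag), obj))
    return triples


for triple in triples_from_rdfxml(RDF_XML):
    print(triple)
```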
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with parts labelled Subject, Predicate and Object]
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
  a dcat:Dataset ;
  dct:title "File types Named Authority List" ;
  dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
  a dct:Agent ;
  dct:title "Publications Office" .
[Annotations on the slide: Definition of prefixes; Description of data – triples (Subject, Predicate, Object); Graph]
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p><span property="http://purl.org/dc/terms/title">File types Named Authority List</span> Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span></p>
</div>
</body>
</html>
[Annotations on the slide: Subject, Predicate, Object]
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram, package “Healthcare Domain”]
Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
Class Location: geographicIdentifier: URI; geographicName: string
Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph (SPARQL 1.1 Update)
• DELETE: Delete triples from the RDF graph (SPARQL 1.1 Update)
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
[Annotations on the slide: Definition of prefixes (the PREFIX lines); Type of query (SELECT); Variables, i.e. what to search for (?title); RDF triple patterns, i.e. the conditions that have to be met (the WHERE block)]
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Named Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Named Authority List" | "Publications Office"
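How a SELECT query arrives at its result can be sketched with a few lines of Python that match triple patterns against the sample data. This is a toy evaluator written for this illustration, not how a real SPARQL engine is implemented: it supports only plain triple patterns, treats any term starting with `?` as a variable, and keeps prefixed names such as `dct:title` as opaque strings. The full URIs are written out so that the subject of the publisher triples matches the object of `dct:publisher`.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slides, as (subject, predicate, object) tuples.
GRAPH = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Named Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Return the binding extended so that pattern equals triple, or None on a conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it now, or check it against an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must be identical to the value in the data.
            return None
    return extended


def select(variables, patterns, graph):
    """Evaluate the patterns one after another and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


# First query: the name of the dataset.
titles = select(["?dataset"], [(FILE_TYPE, "dct:title", "?dataset")], GRAPH)

# Second query: name and publisher, joined through ?publisherURI.
rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    GRAPH,
)
print(titles)
print(rows)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern can only match triples whose subject is the value already bound by the first pattern.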
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1 Start with a robust Domain Model developed following a structured process and methodology
2 Research existing terms and their usage and maximise reuse of those terms
3 Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4 Where new terms are required, create them following commonly agreed best practice
5 Publish within a highly stable environment designed to be persistent
6 Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
1 Start with a robust Domain Model
Slide 68
[Figure: domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
2 Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
2 Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
2 Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
Take dcterms:created, for example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else’s.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
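The "further processing" mentioned for non-standard date formats can be sketched as follows. The helper turns a free-text date such as "21 February 2013" into the typed literal that consumers of dcterms:created would expect. The function name is invented for this illustration, and it assumes English month names in a "day month year" layout.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def to_xsd_date(text: str) -> str:
    """Turn 'day month year' text into an xsd:date typed literal (Turtle notation)."""
    day, month_name, year = text.split()
    value = date(int(year), MONTHS[month_name], int(day))  # also rejects impossible dates
    return f'"{value.isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format would need a conversion like this, with its own assumptions about language and layout, which is the cost that reusing dcterms:created with xsd:date avoids.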
2 Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
3 Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3 Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
3 Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Figure: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
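Why sub-properties help generic systems can be sketched with a small inference step: whenever a triple uses a specific property, the same statement also holds for its super-property. The `bud:` prefix, the exact property names `bud:introduction` and `bud:hasNomenclature`, and the `ex:` resources are assumptions for this illustration; only the two sub-property relationships come from the slides.

```python
# Sub-property declarations, as described on the two slides above.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples):
    """Add (s, q, o) for every (s, p, o) where p is, directly or indirectly, a sub-property of q."""
    inferred = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        while parent is not None:  # assumes the declarations contain no cycles
            inferred.add((s, parent, o))
            parent = SUB_PROPERTY_OF.get(parent)
    return inferred


DATA = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text of a budget line"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}

for triple in sorted(with_super_properties(DATA)):
    print(triple)
```

A tool that knows nothing about the budget vocabulary can still find the text through dct:description once this step has been applied.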
4 Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
4 Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
4 Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
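The definitions shown on these two slides did not survive in the transcript. As a stand-in, the sketch below generates what RDFS definitions of an “Amount” class and an “amount type” property might look like in Turtle. Everything specific here is an assumption: the namespace `http://data.europa.eu/bud#`, the term names `bud:Amount` and `bud:amountType`, the labels and comment, and the choice of rdfs:Literal as range. It is not the actual EU Budget vocabulary.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix bud: <http://data.europa.eu/bud#> .\n"  # assumed namespace
)


def define_class(local_name: str, label: str, comment: str) -> str:
    """Turtle for an RDFS class with a human-readable label and comment."""
    return (
        f"bud:{local_name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .'
    )


def define_property(local_name: str, label: str, domain: str, range_: str) -> str:
    """Turtle for an RDF property with a label, a domain and a range."""
    return (
        f"bud:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


AMOUNT_CLASS = define_class("Amount", "Amount", "A monetary amount in the budget.")
AMOUNT_TYPE = define_property("amountType", "amount type", "bud:Amount", "rdfs:Literal")

print(PREFIXES)
print(AMOUNT_CLASS)
print()
print(AMOUNT_TYPE)
```

Note how the generated names follow the conventions from the earlier slide: the class starts with a capital letter, the property with a lower case letter, and the two-word property name is written in camel case.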
4 Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states that any resource that has a given property is an instance of one or more classes.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
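What a domain and a range declaration allow a system to conclude can be sketched as follows: from a single triple that uses the property, the types of its subject and object can be inferred. The direction chosen here (a political category has an amount as its ceiling) is an assumption, since the transcript does not show which way the arrow points, and the `bud:` and `ex:` names are made up for the example.

```python
# Assumed declarations: bud:hasCeiling goes from a political category to an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # the subject is an instance of the domain class
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # the object is an instance of the range class
    return types


DATA = {("ex:category1", "bud:hasCeiling", "ex:amount1")}

for triple in sorted(infer_types(DATA)):
    print(triple)
```

This also shows why domain and range should be chosen with care: they do not reject data that uses the property elsewhere, they silently classify whatever the property is used on.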
5 Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
6 Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” - openrefine.org
See also: Open Refine website
http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML and
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1 Install Open Refine from https://github.com/OpenRefine
2 Install the RDF extension from http://refine.deri.ie
And then:
1 Describe your data in a spreadsheet
2 Create a project and upload it in Open Refine
3 Clean up the data
4 Map your data to appropriate RDF classes & properties
5 Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
1 Describe your data in a spreadsheet
Download the tabular data
Slide 88
2 Create a project and upload it in Open Refine
Slide 89
Upload the spreadsheet
Select relevant tabs
Create the project
3 Clean up the data – table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
3 Clean up the data – prepare RDF
Slide 91
• Create URI representation for the involved object values
o via formula
o via reconciliation
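Creating URIs “via formula” means deriving a URI from the cell value itself. In Open Refine this would be written as an expression on the column; the Python sketch below only illustrates the idea outside the tool. The base URI `http://example.org/scoreboard`, the path layout and the function names are all assumptions for this example.

```python
import re
import unicodedata


def slugify(value: str) -> str:
    """Lower-case the text, drop accents, and replace runs of other characters with '-'."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(base: str, kind: str, value: str) -> str:
    """Build a URI of the form <base>/<kind>/<slug of the cell value>."""
    return f"{base.rstrip('/')}/{kind}/{slugify(value)}"


BASE = "http://example.org/scoreboard"  # hypothetical base URI
print(mint_uri(BASE, "indicator", "Fixed broadband take-up (%)"))
print(mint_uri(BASE, "country", "Österreich"))
```

Reconciliation, the other option on the slide, instead looks the value up in an existing dataset and reuses the URI found there, which is preferable whenever an authoritative URI already exists.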
4 Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
4 Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
4 Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
5 Export your data to RDF/XML or Turtle
Slide 95
Export of the data in Turtle
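The effect of the mapping and export steps can be sketched outside the tool: each spreadsheet row becomes one observation, written in Turtle. The terms `qb:Observation` and `qb:dataSet` are from the RDF Data Cube Vocabulary; the column names, the `ex:` properties, the base URI and the figures in the sample rows are invented for this illustration and are not taken from the Digital Agenda Scoreboard. Prefix declarations are left out of the output for brevity.

```python
import csv
import io

# Invented sample rows; not real Scoreboard figures.
CSV_TEXT = """country,year,value
BE,2013,0.34
LU,2013,0.33
"""

BASE = "http://example.org/scoreboard"  # hypothetical base URI


def observation_turtle(base: str, row: dict, index: int) -> str:
    """Turtle for one row of the table, modelled as a qb:Observation."""
    country, year, value = row["country"], row["year"], row["value"]
    return (
        f"<{base}/observation/{index}> a qb:Observation ;\n"
        f"    qb:dataSet <{base}/dataset> ;\n"
        f'    ex:country "{country}" ;\n'
        f'    ex:year "{year}"^^xsd:gYear ;\n'
        f'    ex:value "{value}"^^xsd:decimal .'
    )


def rows_to_turtle(csv_text: str, base: str) -> str:
    """Apply the same skeleton to every row of the table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join(observation_turtle(base, row, i) for i, row in enumerate(rows, start=1))


TURTLE = rows_to_turtle(CSV_TEXT, BASE)
print(TURTLE)
```

In a fuller model the country would be a URI rather than a string, for example one minted or reconciled during the clean-up step, so that the observations link to other data.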
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools plotted against two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query
Slide 98
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on, contact us, join us on, follow us:
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
@OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2 This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Linked data foundations
URIs for naming things, RDF for describing data, and SPARQL for querying linked data
Slide 15
DATASUPPORTOPEN
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA's 10 Rules for Persistent URIs
A country, e.g. Belgium:
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office:
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. the Countries Named Authority List:
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
RDF & SPARQL
The Resource Description Framework (RDF) is a standard for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example triple (Subject, Predicate, Object):
http://example.org/place/Brussels is the capital of “Belgium”
OR
http://example.org/place/Brussels is the capital of http://example.org/place/Belgium
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
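The Brussels example can be sketched in a few lines of Python. This is only an illustration of the subject/predicate/object idea: the predicate URI is a made-up placeholder, and telling a resource from a literal by its "http" prefix is a simplification that real RDF tools do not rely on.

```python
# Sketch only: the subject/object URIs follow the slide's example.org examples;
# IS_CAPITAL_OF is a hypothetical predicate URI invented for this illustration.
BRUSSELS = "http://example.org/place/Brussels"
BELGIUM = "http://example.org/place/Belgium"
IS_CAPITAL_OF = "http://example.org/property/isCapitalOf"

# A triple is an ordered (subject, predicate, object) tuple.
triple_with_literal = (BRUSSELS, IS_CAPITAL_OF, "Belgium")   # object is a literal
triple_with_resource = (BRUSSELS, IS_CAPITAL_OF, BELGIUM)    # object is a resource


def object_is_resource(triple):
    """Return True when the object is written as an HTTP(S) URI, False otherwise."""
    return triple[2].startswith(("http://", "https://"))


for t in (triple_with_literal, triple_with_resource):
    kind = "resource" if object_is_resource(t) else "literal"
    print(t[2], "->", kind)
```

Using a resource as the object is what lets other data link to Belgium in turn; a literal is a dead end in the graph.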
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
1 star: Make your stuff available on the Web (whatever format) under an open licence
2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context
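Because each star builds on the previous one, the scheme can be read as a simple cumulative check. The sketch below is one possible reading, not an official scoring tool: the function name, its parameters and the format sets are assumptions made for illustration.

```python
# Hypothetical helper: the format sets are illustrative, not an official list.
OPEN_FORMATS = {"csv", "xml", "json", "rdf", "odf"}
STRUCTURED_FORMATS = OPEN_FORMATS | {"xls", "xlsx"}


def star_rating(open_licence_on_web, file_format, uses_uris, links_to_other_data):
    """Return 0-5 stars; each star is only counted if all earlier ones are met."""
    if not open_licence_on_web:
        return 0
    stars = 1                                   # on the Web under an open licence
    fmt = file_format.lower()
    if fmt in STRUCTURED_FORMATS:
        stars = 2                               # structured data
        if fmt in OPEN_FORMATS:
            stars = 3                           # non-proprietary format
            if uses_uris:
                stars = 4                       # URIs denote things
                if links_to_other_data:
                    stars = 5                   # linked to other data
    return stars


print(star_rating(True, "csv", False, False))
```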
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water (tabular data with columns such as CountryCode)
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority List - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supranational, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
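As a rough illustration of how such endpoints are addressed, the standard SPARQL Protocol lets a client send a query as the `query` parameter of an HTTP GET request. The sketch below only builds the request URL and sends nothing; the endpoint address is the one listed for the European Environment Agency, and whether it is still live, or which result formats it offers, is not checked here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint address as listed on the slide; availability is an assumption.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# A deliberately generic query: any five triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with any HTTP client would return the result table, typically as XML or JSON depending on the `Accept` header.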
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission; recoverable labels: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
DATASUPPORTOPEN
Member State initiatives – some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register, published as linked data
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary-Line
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning – changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1 Link databases
“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2 Add meaning
“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions cont'd
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Named Authority List”
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Description of data: triples, which together form the graph -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Definition of prefixes: the xmlns attributes on the rdf:RDF element.
Subject: the resource named in rdf:about. Predicate: each child element, e.g. dct:title. Object: the element content or the rdf:resource value.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data: triples, which together form the graph
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
In each statement the first term is the subject, the second the predicate and the third the object.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
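The role of the prefix definitions can be sketched in Python: a prefixed name such as dct:title is just shorthand for a full URI. The helper below is a hypothetical illustration, and it assumes the standard DCAT and Dublin Core Terms namespace URIs.

```python
# Namespace URIs assumed here: the standard DCAT and DC Terms namespaces.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name like 'dct:title' into the full URI it abbreviates."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError("unknown prefix: " + prefix)
    return prefixes[prefix] + local_name


print(expand("dct:title"))
print(expand("dcat:Dataset"))
```

Two documents that use different prefix labels for the same namespace therefore still describe the same terms, because only the expanded URI matters.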
DATASUPPORTOPEN
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Subject: the resource attribute. Predicate: each property attribute. Object: the text content of the element.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram “Healthcare Domain”; classes and properties recoverable from the slide:]
Class Identifier (Core Vocabularies)
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies)
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location (Core Vocabularies)
  geographicIdentifier: URI
  geographicName: string
Class Address (Core Vocabularies)
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry (Core Vocabularies)
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data (URIs abbreviated with ... as on the slide):
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Named Authority List"
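What the query engine does for this single triple pattern can be imitated in a few lines of Python. This is a toy stand-in, not a SPARQL implementation: the sample data is held as plain tuples, the full URIs are taken from the earlier RDF examples in this module, and `select_objects` is a hypothetical helper.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT = "http://purl.org/dc/terms/"

# The five sample triples, written out with full URIs.
SAMPLE_DATA = [
    (FILE_TYPE, RDF_TYPE, DCAT_DATASET),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]


def select_objects(triples, subject, predicate):
    """Mimic SELECT ?x WHERE { <subject> <predicate> ?x } over a list of triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(select_objects(SAMPLE_DATA, FILE_TYPE, DCT + "title"))
```

Only the triple whose subject and predicate both match the fixed terms contributes a value for the variable, which is why the publisher's title is not returned.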
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data (URIs abbreviated with ... as on the slide):
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Named Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Named Authority List" | "Publications Office"
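The key step in this query is the join: ?publisherURI is bound by the first pattern and reused as the subject of the third. A minimal sketch of that mechanism follows, assuming plain tuples for triples and strings starting with "?" for variables; `match` and `solve` are hypothetical helpers, not part of any SPARQL library.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"

DATA = [
    (FILE_TYPE, RDF_TYPE, "http://www.w3.org/ns/dcat#Dataset"),
    (FILE_TYPE, DCT + "title", "File types Named Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, RDF_TYPE, DCT + "Agent"),
    (PUBL, DCT + "title", "Publications Office"),
]

PATTERNS = [
    (FILE_TYPE, DCT + "publisher", "?publisherURI"),
    (FILE_TYPE, DCT + "title", "?dataset"),
    ("?publisherURI", DCT + "title", "?publisher"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, triples):
    """Evaluate the patterns one after another, carrying variable bindings along."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = [(b["?dataset"], b["?publisher"]) for b in solve(PATTERNS, DATA)]
print(rows)
```

Real engines reorder and index patterns for speed, but the result is the same set of bindings.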
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Step 1: Start with a robust Domain Model
Slide 68
[Domain model diagram; classes, attributes and relationships recoverable from the slide:]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
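The "further processing" mentioned for the date example can be made concrete. The sketch below converts the free-text form into the ISO form that xsd:date expects; the function names are hypothetical, and it assumes English month names and the default (C) locale.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a date such as '21 February 2013' and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


def typed_literal(text):
    """Render the date as a typed literal, written with the xsd: prefix."""
    return '"' + to_xsd_date(text) + '"^^xsd:date'


print(typed_literal("21 February 2013"))
```

Every consumer of the non-standard form would need a conversion like this one, and would have to guess the format first; publishing the typed literal directly removes that step.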
Slide 71
Advantages of reuse
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
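How a system that only knows the super-property can still use the data can be sketched as a small inference step. The example borrows the introduction / dct:description pair used later in this module; the `bud:` and `ex:` prefixes, the sample triple and the helper name are assumptions for illustration.

```python
# Hypothetical declaration: bud:introduction is a sub-property of dct:description.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

# One made-up statement that uses the specific property.
TRIPLES = {("ex:nomenclature1", "bud:introduction", "Chapter introduction text")}


def with_inferred(triples, sub_property_of):
    """Add, for every triple, the same statement with each super-property."""
    result = set(triples)
    for s, p, o in triples:
        seen = {p}
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:   # guard against cycles
            result.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return result


for triple in sorted(with_inferred(TRIPLES, SUB_PROPERTY_OF)):
    print(triple)
```

A generic client that only looks for dct:description now finds the text, while a budget-aware client can still tell it is an introduction.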
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Diagram: class Amount (Currency, Figure, Type, Year) linked by “has Nomenclature” to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
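The purely syntactic parts of these conventions (capitalisation and camel case) are easy to check automatically; whether a class is singular or a property is a verb still needs human judgement. The helper names below are hypothetical, and the `ex:` terms are made-up counter-examples.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")      # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")   # lowerCamelCase


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ("skos:Concept", "ex:amount"):
    print(term, "class name ok:", is_valid_class_name(term))
for term in ("foaf:isPrimaryTopicOf", "ex:amount_type"):
    print(term, "property name ok:", is_valid_property_name(term))
```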
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
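In RDFS, domain and range act as inference rules: using a property tells a reasoner what class the subject and the object belong to. The sketch below shows that reading for the hasCeiling property from the domain model; the direction chosen (domain Political category, range Amount), the `ex:` prefix and the sample instances are assumptions, since the slide does not state them.

```python
# Assumed declarations for illustration only.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triple, domain=DOMAIN, rng=RANGE):
    """Return the rdf:type statements implied by a property's domain and range."""
    s, p, o = triple
    implied = set()
    if p in domain:
        implied.add((s, "rdf:type", domain[p]))   # subject is in the domain class
    if p in rng:
        implied.add((o, "rdf:type", rng[p]))      # object is in the range class
    return implied


statement = ("ex:category1", "ex:hasCeiling", "ex:amount42")
for implied_triple in sorted(infer_types(statement)):
    print(implied_triple)
```

This is why an overly narrow domain or range can produce surprising conclusions when a property is reused on other kinds of resources.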
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: property hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
First:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
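The core of the mapping and export steps (each row becomes a resource, each column a property) can be sketched without Open Refine. This is not how the RDF extension works internally; the CSV content, the example.org URIs and the function name are all made up for illustration.

```python
import csv
import io

# Made-up tabular data standing in for a cleaned spreadsheet.
CSV_TEXT = "country,year,value\nBE,2014,0.82\nLU,2014,0.91\n"

BASE = "http://example.org/observation/"    # hypothetical base URI for rows
PROPERTY_BASE = "http://example.org/def/"   # hypothetical namespace for columns


def rows_to_triples(csv_text):
    """Turn every cell into a (row URI, column URI, cell value) triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        subject = BASE + str(number)
        for column, cell in row.items():
            triples.append((subject, PROPERTY_BASE + column, cell.strip()))
    return triples


for triple in rows_to_triples(CSV_TEXT):
    print(triple)
```

In practice the made-up column properties would be replaced by terms from an existing vocabulary, which is what the mapping step is for.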
Slide 86
DATASUPPORTOPEN
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
• Create URI representations for the involved object values:
o via formula
o via reconciliation
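Creating a URI "via formula" means deriving it mechanically from the cell value. In Open Refine this would be written as an expression on the column; the Python sketch below only illustrates the idea, with a hypothetical base URI and helper name.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"   # hypothetical base URI


def mint_uri(value, base=BASE):
    """Derive a URI from a cell value: strip accents, lower-case, hyphenate."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_value = decomposed.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")
    return base + slug


print(mint_uri("Czech Republic"))
print(mint_uri(" España "))
```

Reconciliation takes the other route: instead of minting a new URI, the value is matched against an existing authority list so that the data links to URIs others already use.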
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
5. Export your data to RDF/XML or Turtle
Slide 95
Export of the data in Turtle
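To give an idea of what a Turtle export of one spreadsheet row might look like, here is a small Python sketch that renders rows as `qb:Observation` resources. The `ex:` namespace and the `ex:country`, `ex:indicator` and `ex:value` properties are placeholders invented for this example. A real RDF Data Cube would declare its own dimension and measure properties, and the actual export is produced by the Open Refine RDF extension from the skeleton.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/scoreboard/",  # hypothetical namespace
}


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def observation(index, row):
    """Render one spreadsheet row as a qb:Observation in Turtle."""
    country = row["country"]
    indicator = escape(row["indicator"])
    value = row["value"]
    return "\n".join([
        f"ex:obs{index} a qb:Observation ;",
        "    qb:dataSet ex:dataset ;",
        f"    ex:country ex:country-{country} ;",
        f'    ex:indicator "{indicator}" ;',
        f'    ex:value "{value}"^^xsd:decimal .',
    ])


def to_turtle(rows):
    """Render prefix declarations followed by one observation per row."""
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    body = [observation(i, row) for i, row in enumerate(rows, start=1)]
    return "\n".join(header) + "\n\n" + "\n\n".join(body) + "\n"


rows = [{"country": "BE", "indicator": "Households with broadband", "value": "0.81"}]
print(to_turtle(rows))
```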
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Thank you for your attention
... and now YOUR questions?
Slide 97
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Uniform Resource Identifier (URI)
“A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.”
– ISA’s 10 Rules for Persistent URIs
A country, e.g. Belgium
- http://publications.europa.eu/resource/authority/country/BEL
An organisation, e.g. the Publications Office
- http://publications.europa.eu/resource/authority/corporate-body/PUBL
A dataset, e.g. Countries Named Authority List
- http://publications.europa.eu/resource/authority/country
Slide 16
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
RDF & SPARQL
The Resource Description Framework (RDF) is a framework for representing data and resources on the Web.
Slide 17
RDF breaks every piece of information down into triples:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data.
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of “Belgium” (the object is a literal), OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium> (the object is a resource)
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
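The two variants of the Brussels example differ only in the kind of object: a literal or a resource. A minimal Python sketch of that distinction follows. The `URIRef`, `Literal` and `Triple` classes are defined here for illustration, and the predicate URI `http://example.org/property/isCapitalOf` is an assumption, since the slide only gives the predicate in words.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource identified by a URI (a class defined here for illustration)."""


class Literal(str):
    """A plain literal value (a class defined here for illustration)."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]


def to_ntriples(triple):
    """Serialise one triple as a line of N-Triples."""
    if isinstance(triple.obj, URIRef):
        obj = f"<{triple.obj}>"
    else:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


BRUSSELS = URIRef("http://example.org/place/Brussels")
IS_CAPITAL_OF = URIRef("http://example.org/property/isCapitalOf")  # hypothetical predicate URI

with_literal = Triple(BRUSSELS, IS_CAPITAL_OF, Literal("Belgium"))
with_resource = Triple(BRUSSELS, IS_CAPITAL_OF, URIRef("http://example.org/place/Belgium"))

print(to_ntriples(with_literal))
print(to_ntriples(with_resource))
```

Using a resource as the object is what allows further statements to be made about Belgium itself. A literal is a dead end in the graph.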
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
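The five stars are cumulative: each level presupposes the ones before it. The short Python sketch below makes that explicit. The `Dataset` fields are names invented for this example, and judging whether a real dataset meets each criterion is still a human task.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Answers to the five criteria, in the order of the scheme."""
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(dataset):
    """Count stars cumulatively: a star only counts if all earlier ones are met."""
    criteria = [
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(scanned_table), star_rating(excel_sheet), star_rating(csv_file))
```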
Make your stuff available on the Web under an open licence
Slide 20
[Example shown: Trends, risks and vulnerabilities in securities markets]
Make it available as structured data
Slide 21
[Example shown: Waterbase - Emissions to water (table with a CountryCode column)]
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Example shown: DG Enlargement - Regional programmes]
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD (Linked Open Government Data) roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions’ initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
Initiatives funded by the European Commission
Slide 28
[Figure: logos of funded initiatives; recoverable labels: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
Member State initiatives – some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL – Building and address register
The Dutch Address and Buildings base register published as linked data
UK – Ordnance Survey
Three OS OpenData products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI) http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
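Granular identifiers of this kind can be assembled mechanically from their parts. The Python sketch below follows the identifier and representation patterns as they appear on the slides. The function names are invented here, and the sketch only builds strings. It does not check that the resulting URIs resolve.

```python
# Representation suffixes listed on the slide.
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def legislation_id(doc_type, year, number, section=None):
    """Build an identifier URI, optionally down to section level."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(identifier, fmt):
    """Append the representation suffix for one of the listed formats."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier}/data.{fmt}"


section_124 = legislation_id("ukpga", 2010, 32, section=124)
print(representation(section_124, "rdf"))
```

Keeping the identifier separate from its representations is the design point: the same section has one name and several downloadable forms.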
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
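To see which triples a description like this one contains, it can be read with Python's standard XML parser. The sketch below is deliberately limited. It assumes typed node elements with `rdf:about` and flat property elements, as in the example, and it is not a general RDF/XML parser. The document string is a cleaned-up copy of the organisation example with the `rdf` namespace declared.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def tag_to_uri(tag):
    """ElementTree writes qualified names as '{namespace}local'; join the two parts."""
    return tag[1:].replace("}", "", 1)


def simple_triples(xml_text):
    """Return (subject, predicate, object) tuples for flat, typed node elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        triples.append((subject, RDF_NS + "type", tag_to_uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, tag_to_uri(prop.tag), obj))
    return triples


for triple in simple_triples(DOC):
    print(triple)
```

For real data, a dedicated RDF library is the safer choice, because RDF/XML allows many equivalent ways of writing the same graph.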
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset (subject, predicate, object)
<http://publications.europa.eu/resource/authority/file-type> has a title “File types Named Authority List”
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data (triples of subject, predicate and object), which together form the graph.
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

Annotations on the slide: the @prefix lines are the definition of prefixes; the remaining statements are the description of data (triples of subject, predicate and object), which together form the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Annotations on the slide: the resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram “Healthcare Domain”, with callouts marking the classes, the properties and the relationships]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: add triples to the RDF graph (SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions ...?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met
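A query with this structure is usually sent to a SPARQL endpoint over HTTP, where the SPARQL Protocol passes the query text in a `query` parameter. The Python sketch below only builds the request URL and decodes it again. It performs no network call, and the endpoint address `http://example.org/sparql` is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # placeholder endpoint, not a real service

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}"""


def build_request_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```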
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Named Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Named Authority List" | "Publications Office"
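The way a SELECT query arrives at its result can be imitated in a few lines of Python: each triple pattern is matched against the data, and variable bindings are carried from one pattern to the next. This is a teaching sketch of basic graph pattern matching only, not a SPARQL engine. The short names `ex:file-type` and `ex:publ` stand in for the abbreviated URIs of the sample data and are assumptions.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Named Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

PATTERNS = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_var(pattern_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def select(variables, patterns, data):
    """Join the patterns one by one and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


print(select(["?dataset", "?publisher"], PATTERNS, DATA))
```

The third pattern shows the join: `?publisherURI` is already bound by the first pattern, so only the title of that publisher is returned, not the title of the dataset.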
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
1. Start with a robust Domain Model
Slide 68
[Figure: domain model diagram. Entities and their attributes, as far as recoverable:]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
2. Reuse existing terms and vocabularies
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
2. Reuse existing terms and vocabularies: well-known vocabularies
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: for describing the structure of organisations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
2. Reuse existing terms and vocabularies: advantages of reuse
• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else’s.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
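The date example can be made concrete with Python's standard library. The lexical form of a typed `xsd:date` without a timezone is an ISO 8601 date that can be parsed directly, whereas a free-text date needs a format string that the consumer has to know in advance. The helper names below are invented for this sketch.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without timezone, e.g. 2013-02-21."""
    return date.fromisoformat(lexical)


def parse_custom_date(text, fmt="%d %B %Y"):
    """Parse a free-text date; the caller must know the format beforehand.

    Note that %B matches month names in the current locale (English by default).
    """
    return datetime.strptime(text, fmt).date()


print(parse_xsd_date("2013-02-21"))
print(parse_custom_date("21 February 2013"))

# The standard parser rejects the custom format outright.
try:
    parse_xsd_date("21 February 2013")
except ValueError as error:
    print("needs further processing:", error)
```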
2. Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
Slide 72
• http://joinup.ec.europa.eu
• http://lov.okfn.org
3. Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3. Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description
[Figure: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
3. Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Figure: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
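The benefit of sub-properties is that a system which only knows the generic property can still use the data. The Python sketch below shows that inference step. The prefixed names `bud:introduction` and `bud:hasNomenclature` are assumptions, since the slides name the properties only in words, and the sample triple is invented.

```python
# Sub-property declarations, as (hypothetical) prefixed names.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of):
    """Add a triple with the super-property for every sub-property statement."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        current = predicate
        seen = set()
        # Follow the chain of declarations upwards, guarding against cycles.
        while current in sub_property_of and current not in seen:
            seen.add(current)
            current = sub_property_of[current]
            inferred.add((subject, current, obj))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
for triple in sorted(infer_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that understands only `dct:description` finds the introduction text after this step, without knowing anything about the budget vocabulary.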
4. Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
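The purely lexical parts of these conventions (initial capital or lower-case letter, camel case) can be checked automatically. The Python sketch below does so with two regular expressions. Whether a class name is singular, or a property name is a verb or a noun, cannot be decided this way and is left to the modeller. The `ex:` terms in the usage lines are invented.

```python
import re

# Camel case without separators: classes start upper-case, properties lower-case.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix of a prefixed name such as skos:Concept."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


print(is_valid_class_name("skos:Concept"))
print(is_valid_property_name("foaf:isPrimaryTopicOf"))
print(is_valid_property_name("ex:amount_type"))
```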
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount (Currency, Figure, Type, Year)]
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount (Currency, Figure, Type, Year)]
4. Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
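In RDFS, domain and range declarations are used to infer the types of the resources that a property connects. They are not validation constraints. The Python sketch below shows that inference for the hasCeiling example. The prefixed names and the direction of the property (from political category to amount) are assumptions made for illustration, since the slide does not spell them out.

```python
# Hypothetical schema: domain and range of one property.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Derive rdf:type statements from the domain and range of each property used."""
    types = set()
    for subject, predicate, obj in triples:
        rule = schema.get(predicate)
        if rule is None:
            continue  # nothing is declared for this property
        if "domain" in rule:
            types.add((subject, "rdf:type", rule["domain"]))
        if "range" in rule:
            types.add((obj, "rdf:type", rule["range"]))
    return types


statements = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(statements, SCHEMA)):
    print(triple)
```

Because the declarations drive inference, an overly narrow domain or range leads to wrong types being inferred. This is one reason to define them with care.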
5. Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
[Diagram: the steps grouped into three phases: Analyse, Model, Publish]
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …" - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Download the tabular data
Slide 88
Create a project and upload it in Open Refine
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data - prepare RDF
Slide 91
• Create URI representation for the involved object values
o via formula
o via reconciliation
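Creating a URI "via formula" amounts to deriving it deterministically from a cell value. A minimal sketch of such a formula, written in Python rather than in Open Refine's own expression language; the base URI and the slug rules (trim, lower-case, hyphenate, percent-encode) are assumptions for illustration.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder base URI, not a real namespace


def mint_uri(kind: str, value: str, base: str = BASE) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    return f"{base}{kind}/{quote(slug, safe='-')}"
```

The same input always yields the same URI, which is what allows different rows (or different datasets) that mention the same thing to end up pointing at the same resource.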
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
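Conceptually, an RDF skeleton is a mapping from spreadsheet columns to properties, applied row by row under a base URI. The sketch below imitates that idea in plain Python and is not how the RDF extension is implemented. The base URI, the column names and the property URIs are all made up; only the rdf:type and qb:Observation URIs are real.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
BASE = "http://example.org/scoreboard/"  # placeholder base URI

# The "skeleton": which column feeds which property (hypothetical property URIs).
SKELETON = {
    "country": BASE + "def/country",
    "year": BASE + "def/year",
    "value": BASE + "def/value",
}


def rows_to_ntriples(csv_text: str) -> list:
    """Turn each CSV row into N-Triples statements about one observation."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"<{BASE}observation/{i}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{QB_OBSERVATION}> .")
        for column, prop in SKELETON.items():
            # Escape backslashes and quotes so the literal stays valid.
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{prop}> "{value}" .')
    return lines
```

All values are written as plain string literals here; a real mapping would also assign datatypes and turn values such as the country code into URIs.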
Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data, http://5stardata.info/
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org/
Slide 101
Be part of our team
Slide 102
Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
Website: http://www.opendatasupport.eu
Open Data Support group: http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
RDF & SPARQL
The Resource Description Framework (RDF) is a standard data model for representing data and resources on the Web
Slide 17
RDF breaks every piece of information down into triples:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
SPARQL is a standardised language for querying RDF data
Example (subject, predicate, object):
<http://example.org/place/Brussels> is the capital of "Belgium"
OR
<http://example.org/place/Brussels> is the capital of <http://example.org/place/Belgium>
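The two variants of the Brussels example differ only in the object: a literal in the first, a URI-identified resource in the second. A minimal sketch of that distinction, serialised as N-Triples. The predicate URI is hypothetical, since the slide only gives the predicate in words, and literal escaping is left out for brevity.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False


# Hypothetical predicate URI standing in for "is the capital of".
CAPITAL_OF = "http://example.org/property/isCapitalOf"


def to_ntriples(t: Triple) -> str:
    """Serialise one triple: literals go in quotes, URIs in angle brackets."""
    obj = f'"{t.obj}"' if t.obj_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."
```

Only the second variant lets a consumer follow the object to find further statements about Belgium, which is the point of linking.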
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-rdf-sparql
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
5-star scheme of Linked Open Data
1 star: Make your stuff available on the Web (whatever format) under an open licence
2 stars: Make it available as structured data (e.g. Excel instead of an image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context
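Each star builds on the previous ones, so a dataset's rating is the number of criteria met in order before the first one that fails. A minimal sketch of that rule; the criterion labels are shorthand invented here, not official names.

```python
# The five criteria in the order of the scheme (labels are shorthand).
CRITERIA = (
    "on the Web under an open licence",
    "structured data",
    "non-proprietary format",
    "URIs denote things",
    "linked to other data",
)


def star_rating(met: dict) -> int:
    """Count stars: stop at the first criterion that is not met."""
    stars = 0
    for criterion in CRITERIA:
        if not met.get(criterion, False):
            break
        stars += 1
    return stars
```

A CSV file under an open licence therefore rates three stars, even if it happens to contain links, until URIs are used to denote the things it describes.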
Slide 19
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water
[Screenshot: table including a CountryCode column]
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
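Under the SPARQL Protocol, a query can be sent to such an endpoint as an HTTP GET request with the query text in a `query` parameter. A minimal sketch that only builds the request URL and contacts nothing; the endpoint address is the EEA one listed above, and whether it is still online is not something this sketch checks.

```python
from urllib.parse import urlencode

# Endpoint taken from the list above; this sketch never contacts it.
EEA_ENDPOINT = "http://semantic.eea.europa.eu/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(EEA_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
```

The resulting URL can be pasted into a browser or fetched with any HTTP client; the response format is typically negotiated through the Accept header.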
Slide 27
Initiatives funded by the European Commission
Slide 28
[Logos of initiatives; legible text: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register, published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
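The identifier pattern can be sketched as two small functions: one builds the identifier of a section, the other appends a representation suffix. This is an illustration of the URI pattern as read from the slide, not an official client; the function names are made up.

```python
BASE = "http://www.legislation.gov.uk/id"


def section_uri(doc_type: str, year: int, number: int, section: int) -> str:
    """Identifier URI for one section of a piece of legislation."""
    return f"{BASE}/{doc_type}/{year}/{number}/section/{section}"


def representation_uri(identifier: str, fmt: str) -> str:
    """URI of a concrete representation of the identified thing (e.g. rdf, xml)."""
    return f"{identifier}/data.{fmt}"
```

Keeping the identifier separate from its representations means the same section can be named once and served in several formats.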
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL: how you can query and manipulate data in RDF
Slide 39
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples graphs and syntaxes
Slide 44
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset (subject, predicate, object)
<http://publications.europa.eu/resource/authority/file-type> has a title "File types Name Authority List"
RDF syntax: RDF/XML
Slide 46
The opening rdf:RDF element defines the prefixes; the elements inside it describe the data as triples (subject, predicate, object), which together form a graph.
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Diagram: the triples drawn as a graph, with subject, predicate and object labelled]
RDF syntax: Turtle
Slide 48
The @prefix lines define the prefixes; the statements that follow describe the data as triples (subject, predicate, object), which together form a graph.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type> a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
The resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships, which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom"
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain": each class is listed below with its properties; the relationships link the classes]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2 and https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
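Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
The query has four parts: the definition of prefixes (PREFIX), the type of query (SELECT), the variables, i.e. what to search for (?title), and the RDF triple patterns, i.e. the conditions that have to be met (the WHERE block).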
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
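How the triple patterns of such a query are matched against the data can be illustrated with a naive matcher over an in-memory set of triples. This is a teaching sketch, not how a SPARQL engine is implemented. The two example.org URIs are placeholders for the URIs elided on the slide, and terms starting with "?" are treated as variables.

```python
DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/authority/file-type"  # placeholder for the elided URI
PUBLISHER = "http://example.org/publisher/publ"     # placeholder for the elided URI

DATA = {
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, DCT + "title", "Publications Office"),
}


def match(patterns, data, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, data, new)


QUERY = [
    (DATASET, DCT + "publisher", "?publisherURI"),
    (DATASET, DCT + "title", "?dataset"),
    ("?publisherURI", DCT + "title", "?publisher"),
]
results = [(b["?dataset"], b["?publisher"]) for b in match(QUERY, DATA)]
```

The shared variable ?publisherURI is what joins the dataset's triples to the publisher's title, which is why the third pattern only matches the publisher found by the first.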
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Start with a robust Domain Model
Slide 68
[Diagram: domain model]
Classes and attributes:
- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type
- Political category: Code, Description
- Corporate body: Code, Type, Location
- Further attributes (owning class not recoverable from the transcript): Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
Slide 70
DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort
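The "further processing" mentioned for the date example can be made concrete: a consumer has to parse the free-text date and rewrite it as a typed literal before it lines up with dcterms:created values. A minimal sketch, assuming English month names and the default C locale; the function name is illustrative.

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Convert e.g. '21 February 2013' into a Turtle-style xsd:date literal."""
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'
```

Every consumer would need a step like this, for every non-standard format it meets, which is the cost that reusing dcterms:created with xsd:date avoids.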
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on
Slide 72
http://joinup.ec.europa.eu and http://lov.okfn.org
Reuse existing terms and vocabularies
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
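A minimal sketch of how a consumer that only knows a generic property can still read data stated with a more specific one, by following rdfs:subPropertyOf declarations. The bud: and ex: names are illustrative, and the hierarchy is simplified to one parent per property.

```python
# Illustrative declaration: a specific property is a sub-property of a generic one.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}


def super_properties(prop: str, hierarchy: dict) -> list:
    """Return prop followed by all of its ancestors, nearest first."""
    chain = [prop]
    while chain[-1] in hierarchy and hierarchy[chain[-1]] not in chain:
        chain.append(hierarchy[chain[-1]])
    return chain


def values_for(subject: str, known_prop: str, triples: list, hierarchy: dict) -> list:
    """Values of known_prop for subject, including those stated via sub-properties."""
    return [
        o for s, p, o in triples
        if s == subject and known_prop in super_properties(p, hierarchy)
    ]
```

A system that has never seen bud:introduction can thus still treat its value as a description, while properties with no declared parent are simply ignored.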
Slide 73
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
Nomenclature: Type, Heading, Introduction, Remark, Conditions
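How a system that only knows dct:description could still use the more specific property can be sketched as follows. This is a minimal illustration of rdfs:subPropertyOf entailment using plain Python tuples. The `bud` term URI and the sample resource are hypothetical stand-ins, since the slide does not give the full URIs.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://data.europa.eu/bud#"  # hypothetical form of the vocabulary namespace

# rdfs:subPropertyOf statements: specific property -> more generic property
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
}


def infer_super_properties(triples, sub_property_of):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against cycles in the property hierarchy
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = {("http://example.org/nomenclature/1", BUD + "introduction", "Intro text")}
for triple in sorted(infer_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

A consumer that queries only for dct:description now finds the introduction text as well.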
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
Relationship: has Nomenclature
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
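The letter-case conventions above can be checked mechanically. The sketch below is one possible checker, not an official tool: it tests only capitalisation and camel case, and cannot tell whether a class is singular or whether a property is a verb or a noun.

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties
CLASS_NAME = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
PROPERTY_NAME = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def check_term(local_name: str, kind: str) -> bool:
    """Return True if the local name follows the convention for its kind."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name))


for name, kind in [("Concept", "class"), ("hasSite", "property"),
                   ("isPrimaryTopicOf", "property"), ("has_site", "property")]:
    print(name, kind, check_term(name, kind))
```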
Slide 76
4
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Amount: Currency, Figure, Type, Year
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Amount: Currency, Figure, Type, Year
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used, i.e. that any resource having the property is an instance of those classes
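The example images for these slides did not survive in the transcript. As a rough illustration only, the sketch below generates RDFS definitions in Turtle for two classes and one property with a domain and a range. The `bud:` namespace form, the local names and the direction of hasCeiling (from political category to amount) are assumptions made for the example.

```python
PREFIXES = "\n".join([
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix bud: <http://data.europa.eu/bud#> .",  # hypothetical namespace form
])


def define_class(curie: str, label: str) -> str:
    """Render an rdfs:Class definition as Turtle."""
    return f'{curie} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def define_property(curie: str, label: str, domain: str, range_: str) -> str:
    """Render an rdf:Property definition with domain and range as Turtle."""
    return (
        f"{curie} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


vocabulary = "\n\n".join([
    PREFIXES,
    define_class("bud:Amount", "Amount"),
    define_class("bud:PoliticalCategory", "Political category"),
    # Assumed direction: a political category has an amount as its ceiling
    define_property("bud:hasCeiling", "has ceiling",
                    "bud:PoliticalCategory", "bud:Amount"),
])
print(vocabulary)
```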
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Amount: Currency, Figure, Type, Year
Political category: Code, Description
Relationship: hasCeiling (between Political category and Amount)
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (.xls and .xlsx)
JSON
XML and
RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data - prepare RDF
Slide 91
3
• Create URI representation for the involved object values
• via formula
• via reconciliation
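Creating a URI "via formula" means deriving it deterministically from a cell value. Open Refine has its own expression language for this; the sketch below shows the same idea in Python instead. The base URI, the path layout and the function names are assumptions for illustration.

```python
import re
import unicodedata


def slugify(value: str) -> str:
    """Lower-case, strip accents and replace other characters with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(base: str, kind: str, label: str) -> str:
    """Build a URI of the form <base>/<kind>/<slug of label>."""
    return f"{base.rstrip('/')}/{kind}/{slugify(label)}"


print(mint_uri("http://example.org/id/", "country", "Côte d'Ivoire"))
```

Reconciliation, the other option on the slide, would instead look the value up in an existing authority list and reuse the URI found there.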
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one
You can set the base URI for the data
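What an RDF skeleton does can be sketched outside the tool: each spreadsheet row is turned into a fixed pattern of triples under a base URI. The example below maps rows to RDF Data Cube observations and writes N-Triples. The base URI, the column names, the dataset URI and the sample figures are all invented for illustration. The Data Cube and SDMX terms used (qb:Observation, qb:dataSet, refArea, refPeriod, obsValue) are existing terms of those vocabularies.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Invented sample rows standing in for the cleaned spreadsheet
SAMPLE_CSV = """country,year,value
BE,2013,0.82
LU,2013,0.94
"""


def row_to_triples(row):
    """Apply the skeleton to one row: one observation, five triples."""
    obs = f"<{BASE}observation/{row['country']}-{row['year']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{BASE}dataset/indicator> .",
        f"{obs} <{SDMX_DIM}refArea> <{BASE}country/{row['country']}> .",
        f'{obs} <{SDMX_DIM}refPeriod> "{row["year"]}"^^<{XSD}gYear> .',
        f'{obs} <{SDMX_MEASURE}obsValue> "{row["value"]}"^^<{XSD}decimal> .',
    ]


def csv_to_ntriples(text: str) -> str:
    """Convert CSV text to N-Triples, one observation per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_triples(row))
    return "\n".join(lines)


print(csv_to_ntriples(SAMPLE_CSV))
```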
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
Diagram: tools positioned by flexibility and volume: OpenRefine, UnifiedViews, Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Website: http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
How to publish linked data
Paving the way towards 5-star linked data
Slide 18
DATASUPPORTOPEN
5-star scheme of Linked Open Data
1 star: Make your stuff available on the Web (whatever format) under an open license
2 stars: Make it available as structured data (e.g. Excel instead of image scan of a table)
3 stars: Use non-proprietary formats (e.g. CSV instead of Excel)
4 stars: Use URIs to denote things, so that people can point at your stuff
5 stars: Link your data to other data to provide context
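The scheme is cumulative: each star presupposes the ones before it. A minimal sketch of that logic follows; the `Dataset` fields are hypothetical names chosen for this example, not a standard schema.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars in order; the first unmet criterion stops the count."""
    criteria = [
        ds.on_web_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file under an open licence, without URIs for its things
print(star_rating(Dataset(True, True, True, False, False)))
```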
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Example: Trends, risks and vulnerabilities in securities markets
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water (table with a CountryCode column)
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo - change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency, http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching the linked metadata of datasets published on the EU Open Data Portal, https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc., http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc., http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: Tool for querying a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections, http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
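SPARQL endpoints such as these are typically queried over HTTP, with the query text passed in a `query` parameter as defined by the SPARQL Protocol. The sketch below only builds the request URL and makes no network call. The endpoint address is the one reconstructed from the slide and may no longer be current.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL for a SPARQL query, with the query URL-encoded."""
    return f"{endpoint}?{urlencode({'query': query})}"


ENDPOINT = "http://semantic.eea.europa.eu/sparql"  # taken from the slide
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The resulting URL could then be fetched with any HTTP client, usually with an Accept header asking for a result format such as JSON or XML.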
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
Logos shown include ADMS.SW and the Core Public Service Vocabulary
DATASUPPORTOPEN
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/type/year/number/section/number
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
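The granular URI pattern can be read as a simple template. The sketch below assembles a URI from its parts, following the pattern as it appears in this transcript. The function and parameter names are assumptions, and the pattern itself should be checked against the publisher's own documentation before use.

```python
BASE = "http://www.legislation.gov.uk"


def legislation_uri(doc_type, year, number, section=None, representation=None):
    """Assemble an identifier, optionally with a section and a data format."""
    parts = [BASE, "id", doc_type, str(year), str(number)]
    if section is not None:
        parts += ["section", str(section)]
    if representation is not None:
        parts.append(f"data.{representation}")
    return "/".join(parts)


print(legislation_uri("ukpga", 2010, 32, section=124, representation="rdf"))
```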
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
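Why unique identifiers matter for the first question can be shown with a small sketch. Two separate sources are joined purely because they use the same URI for the same field. All URIs and figures below are invented for illustration and do not describe any real company or field.

```python
# Source A (hypothetical): which companies hold a licence in which field
LICENSEES = {
    "http://example.org/field/alpha": {"http://example.org/company/acme"},
    "http://example.org/field/beta": {"http://example.org/company/acme",
                                      "http://example.org/company/other"},
    "http://example.org/field/gamma": {"http://example.org/company/other"},
}

# Source B (hypothetical): last month's production volume per field
PRODUCTION = {
    "http://example.org/field/alpha": 120.0,
    "http://example.org/field/beta": 80.5,
    "http://example.org/field/gamma": 42.0,
}


def total_production(company_uri: str) -> float:
    """Sum production over all fields in which the company is a licensee."""
    return sum(
        PRODUCTION.get(field, 0.0)
        for field, companies in LICENSEES.items()
        if company_uri in companies
    )


print(total_production("http://example.org/company/acme"))
```

If the two sources named the fields differently, the join would first require a matching step, which is exactly what shared URIs avoid.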
Linked Data in the oil and gas industry
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions cont’d
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was later generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
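The triple from this slide can be written down as a small data structure and serialised in N-Triples, the simplest line-based RDF syntax. This is a minimal sketch: the `Triple` type and its field names are assumptions, and dct:title is used for "has a title", as on the following slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # a URI
    predicate: str                # a URI
    obj: str                      # a URI or the text of a literal
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```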
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide callouts: the xmlns attributes are the definition of prefixes; the elements are the description of data, i.e. triples (subject, predicate, object) that together form the graph
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide callouts: the @prefix lines are the definition of prefixes; the remaining statements are the description of data, i.e. triples (subject, predicate, object) that together form the graph
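Prefixes are only shorthand: a prefixed name such as dcat:Dataset stands for the namespace URI followed by the local name. A minimal sketch of that expansion follows, with a hypothetical helper name.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(curie: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand("dcat:Dataset"))
print(expand("dct:title"))
```

This is why the namespace URI in a prefix definition must be exact: a different URI yields different terms, even if the prefix label is the same.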
DATASUPPORTOPEN
RDF Syntax: RDFa
Slide callouts: subject, predicate, object
Slide 49
<html>
  <head> ... </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
RDFa: embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: The Core Person Vocabulary in UML
Slide 52
Class diagram (labelled "Healthcare Domain" in the source)
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language. Class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide callouts: Class, Properties, Relationships
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide callouts:
PREFIX lines: definition of prefixes
SELECT: type of query
?title: variables, i.e. what to search for
WHERE block: RDF triple patterns, i.e. the conditions that have to be met
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
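How the triple patterns of a SELECT query are matched against the data can be sketched with a few lines of Python. This toy matcher handles only basic graph patterns over an in-memory list of triples and is not how a real SPARQL engine is implemented. The example.org URIs stand in for the abbreviated URIs on the slide, and prefixed names are kept as plain strings for brevity.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend a variable binding so that the pattern equals the triple."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the patterns in order, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


FILE_TYPE = "http://example.org/authority/file-type"  # stand-in URI
PUBL = "http://example.org/publisher/publ"            # stand-in URI
DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", '"File types Name Authority List"'),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", '"Publications Office"'),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The third pattern reuses ?publisherURI, so only the title of the resource bound by the first pattern is returned. This is the join that gives the query its single result row.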
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data:
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
Domain model diagram, classes and attributes:
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Advantages of reuse (step 2: reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data
For example, dcterms:created, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
Slide 71
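The interoperability argument for typed dates can be illustrated with a small sketch. It uses only the Python standard library; the function names and the sample strings are illustrative assumptions, not part of the training material. The xsd:date lexical form parses with one standard call, whereas the free-text form needs a format that the consumer has to know in advance.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse a timezone-less xsd:date lexical form such as '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse a date written like '21 February 2013'.

    The format string is extra knowledge the consumer must have, and %B
    relies on English month names (the default C locale).
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard == custom)
```

Both calls yield the same date, but only the first works without an agreement on format between publisher and consumer.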
DATASUPPORTOPEN
You can find reusable RDF vocabularies on (step 2: reuse existing terms and vocabularies):
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
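Why sub-properties help consumers can be sketched in a few lines of Python. This is a minimal illustration of rdfs:subPropertyOf entailment over tuples, not a real reasoner. The "bud:" prefix, the property spellings and the sample triple are hypothetical stand-ins for the EU Budget vocabulary terms named on the slides.

```python
# Sub-property declarations, written as prefixed names in plain strings.
# "bud:" is a hypothetical prefix for the EU Budget vocabulary.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of):
    """Return the input triples plus those entailed by rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}
        parent = sub_property_of.get(predicate)
        # Walk up the hierarchy; 'seen' guards against accidental cycles.
        while parent is not None and parent not in seen:
            inferred.add((subject, parent, obj))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


# Illustrative data: one statement using the specific property.
data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
expanded = infer_super_properties(data, SUB_PROPERTY_OF)
print(sorted(expanded))
```

A system that only knows dct:description can still read the introduction text, because the more general statement is derivable from the sub-property declaration.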
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
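The capitalisation and camel-case conventions above lend themselves to an automated check. The following is a minimal sketch with hypothetical helper names; it covers only the lexical rules and cannot tell whether a class name is singular or whether a property is a verb or a noun.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Check that a class name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Check that a property name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf"]:
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```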
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states that any resource that has a given property is an instance of one or more classes, i.e. it indicates on which classes the property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
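The class and property definitions discussed on the last three slides can be sketched as RDFS statements in Turtle, generated here from Python. Everything specific is an assumption made for illustration: the ex: namespace, the term spellings (Amount, PoliticalCategory, amountType, hasCeiling), the xsd:string range, and the direction of hasCeiling, which the slides do not state.

```python
PREFIXES = "\n".join([
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix ex: <http://example.org/budget#> .",  # hypothetical namespace
])


def turtle_class(name: str, label: str) -> str:
    """Emit an RDFS class definition with an English label."""
    return f'ex:{name} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def turtle_property(name: str, label: str, domain: str, range_: str) -> str:
    """Emit an RDF property definition with its domain and range."""
    return (
        f"ex:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


vocabulary = "\n\n".join([
    PREFIXES,
    turtle_class("Amount", "Amount"),
    turtle_class("PoliticalCategory", "Political category"),
    # Data type property: its values are literals.
    turtle_property("amountType", "amount type", "ex:Amount", "xsd:string"),
    # Object property: its values are instances of another class.
    turtle_property("hasCeiling", "has ceiling", "ex:PoliticalCategory", "ex:Amount"),
])
print(vocabulary)
```

The same definitions could be written by hand; generating them only keeps the naming and the domain and range declarations consistent.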
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
[Figure labels: Analyse, Model, Publish]
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
DATASUPPORTOPEN
Step 2: Create a project and upload it in Open Refine
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Step 3: Clean up the data - prepare RDF
Slide 91
• Create URI representations for the object values involved
o via formula
o via reconciliation
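Creating a URI from a cell value "via formula" amounts to normalising the value and appending it to a base URI. In Open Refine this would be expressed as a formula on the column; the Python below is only a sketch of the same idea, and the base URI, the function name and the sample values are assumptions made for illustration.

```python
import re
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, value: str, base: str = BASE_URI) -> str:
    """Turn a cell value into a URI: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    return f"{base}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " United Kingdom "))
print(mint_uri("indicator", "Broadband take up"))
```

The function is deterministic, so the same cell value always produces the same URI. Reconciliation instead looks the value up in an existing dataset and reuses the URI found there.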
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
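What the skeleton and the export do conceptually can be sketched without Open Refine: each spreadsheet row becomes one qb:Observation. In the sketch below, qb:Observation and qb:dataSet are terms of the RDF Data Cube Vocabulary. The ex: namespace, the ex:country, ex:year and ex:value properties, the column names and the two sample rows are made-up placeholders, not Digital Agenda Scoreboard data.

```python
import csv
import io

# Placeholder table; the values are invented for illustration only.
SAMPLE_CSV = "country,year,value\nBE,2013,0.5\nLU,2013,0.7\n"

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/> .\n"  # hypothetical namespace
)


def row_to_observation(index: int, row: dict) -> str:
    """Render one table row as a qb:Observation in Turtle."""
    country, year, value = row["country"], row["year"], row["value"]
    return (
        f"ex:obs{index} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset ;\n"
        f"    ex:country ex:country-{country} ;\n"
        f'    ex:year "{year}"^^xsd:gYear ;\n'
        f"    ex:value {value} ."
    )


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
observations = [row_to_observation(i, row) for i, row in enumerate(rows, start=1)]
turtle = PREFIXES + "\n" + "\n\n".join(observations) + "\n"
print(turtle)
```

A complete Data Cube publication would also declare the dataset and its structure definition; the sketch covers only the row-to-observation mapping.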
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned by flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: http://www.opendatasupport.eu and Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
5-star scheme of Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open licence
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ Use URIs to denote things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Slide 19
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
Example: "Trends, risks and vulnerabilities in securities markets"
DATASUPPORTOPEN
Make it available as structured data
Slide 21
Example: Waterbase - Emissions to water
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
Example: DG Enlargement - Regional programmes
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission, including ADMS.SW and the Core Public Service Vocabulary]
DATASUPPORTOPEN
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
• IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
• NL - Building and address register
The Dutch Address and Buildings base register, published as linked data
• UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
• UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
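The granular URI scheme can be made concrete with a small sketch that assembles an identifier and a representation URI from their parts. The pattern is the one read off the two slides above (whose punctuation was lost in extraction), so treat it as an assumption rather than as a specification of the legislation.gov.uk service; the function names are hypothetical.

```python
from typing import Optional

BASE = "http://www.legislation.gov.uk/id"
FORMATS = {"xml", "xht", "pdf", "rdf", "feed"}


def legislation_id(doc_type: str, year: int, number: int,
                   section: Optional[int] = None) -> str:
    """Build the identifier URI for an item of legislation or one of its sections."""
    uri = f"{BASE}/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(identifier_uri: str, fmt: str) -> str:
    """Build the URI of one representation (format) of the identified thing."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{identifier_uri}/data.{fmt}"


section_uri = legislation_id("ukpga", 2010, 32, section=124)
print(representation(section_uri, "rdf"))
```

Separating the identifier from its representations lets the same section be named once and served in several formats.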
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
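The triple structure maps directly onto a three-field record. The sketch below is illustrative only: it assumes that "has a title" corresponds to the Dublin Core title property used later in the deck, and it adopts the convention (an assumption of this sketch) that literals are stored with their surrounding quotes, so they can be told apart from URIs when serialising.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI, or a literal kept in double quotes


DCT_TITLE = "http://purl.org/dc/terms/title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj='"File types Name Authority List"',
)


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples."""
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(to_ntriples(triple))
```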
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
Definition of prefixes:
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
Description of data - triples (together they form a graph):
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations: the rdf:about value is the subject, each property element is a predicate, and its value or rdf:resource is the object.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph in which subjects and objects are drawn as nodes and predicates as labelled arcs]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data - triples (together they form a graph):
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
  <p>
    <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
    Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
  </p>
</div>
</body>
</html>
Annotations: the resource attribute gives the subject, each property attribute a predicate, and the element content the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram, reproduced as text]
Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
Class Location: geographicIdentifier: URI; geographicName: string
Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
Definition of prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
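How a SELECT query arrives at its result can be sketched as pattern matching over a list of triples: each triple pattern either extends the variable bindings found so far or rules them out. The code below is a toy evaluator written for illustration, not a SPARQL engine. The example.org URIs are stand-ins for the abbreviated URIs on the slide, and prefixed names are kept as plain strings.

```python
FILE_TYPE = "<http://example.org/authority/file-type>"  # stand-in URI
PUBL = "<http://example.org/publisher/publ>"            # stand-in URI

DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", '"File types Name Authority List"'),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", '"Publications Office"'),
]


def match_pattern(pattern, triple, binding):
    """Extend 'binding' so that 'pattern' matches 'triple'; None if impossible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, data):
    """Evaluate the triple patterns in order and project the chosen variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first.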
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68 (step 1)
[Diagram: domain model of the EU budget. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69 (step 2)
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Part of step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Advantages of reuse
• Reuse greatly aids interoperability of your data.
  For example, dcterms:created takes a typed date value such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema instead encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will need further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71 (step 2: Reuse existing terms and vocabularies)
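The interoperability point about typed dates can be illustrated with a minimal Python sketch. It is not part of the original slides: the function names and the month table are assumptions made for illustration. A plain xsd:date lexical form parses directly, whereas the free-text form needs format-specific (and here English-only) handling.

```python
from datetime import date

# Illustrative lookup table; a real consumer would need one per language.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}


def parse_xsd_date(lexical: str) -> date:
    """Parse a plain xsd:date lexical form (YYYY-MM-DD, no timezone)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse an English 'D Month YYYY' string; other layouts need more rules."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name.lower()], int(day))


typed = parse_xsd_date("2013-02-21")
free = parse_free_text_date("21 February 2013")
print(typed == free)  # both strings denote the same calendar day
```

The second parser only works because its author already knows the layout and language of the string, which is exactly the extra processing the slide warns about.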
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72 (step 2: Reuse existing terms and vocabularies)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73 (step 3)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74 (step 3)
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description
[Diagram: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75 (step 3)
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Diagram: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) through “has Nomenclature”]
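How a consumer that only knows the generic property can still read the specialised data can be sketched in a few lines of Python. This is a simplified illustration, not the EU Budget vocabulary itself: the prefixed names `bud:introduction`, `bud:hasNomenclature` and `ex:nomenclature1` are hypothetical spellings, and only the two sub-property links are taken from the slides.

```python
# Hypothetical prefixed names; only the two sub-property links come from the slides.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop):
    """Return every property that `prop` specialises, nearest first."""
    chain = []
    seen = {prop}
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        if prop in seen:  # guard against accidental cycles
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def with_inferred(triples):
    """Add one triple per super-property, as rdfs:subPropertyOf entails."""
    result = set(triples)
    for s, p, o in triples:
        for parent in super_properties(p):
            result.add((s, parent, o))
    return result


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
expanded = with_inferred(data)

# A generic client that has never heard of bud:introduction still finds the text.
descriptions = [o for s, p, o in expanded if p == "dct:description"]
print(descriptions)
```

The original, more specific triple is kept alongside the inferred one, so clients that do understand the specialised term lose nothing.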
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76 (step 4)
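The purely lexical parts of these conventions (initial capital for classes, initial lower case for properties, camel case without separators) can be checked mechanically. The following Python sketch is an illustration under that assumption; the function names are invented here, and it cannot judge whether a term is singular, a verb or a noun.

```python
import re

# Letters and digits only: camel case, no underscores, hyphens or spaces.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Strip an optional prefix such as 'skos:' from a prefixed name."""
    return term.split(":", 1)[-1]


def looks_like_class(term: str) -> bool:
    """True if the local name starts with a capital and uses camel case."""
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term: str) -> bool:
    """True if the local name starts in lower case and uses camel case."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:political_category"]:
    print(term, looks_like_class(term), looks_like_property(term))
```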
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77 (step 4)
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78 (step 4)
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the Amount class with attributes Currency, Figure, Type, Year]
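The class and property definitions shown on slides 77 and 78 did not survive in this transcript. As a stand-in, the Python sketch below emits RDFS statements of the kind such definitions usually contain, written as N-Triples lines. Everything specific is an assumption: the namespace with a trailing slash, the local names `Amount` and `amountType`, and the labels are hypothetical and may differ from the published EU Budget vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BUD = "http://data.europa.eu/bud/"  # assumed form of the vocabulary namespace


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Quote a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def define_class(local: str, label: str) -> list:
    """Statements declaring an rdfs:Class with an English label."""
    subject = iri(BUD + local)
    return [
        f"{subject} {iri(RDF + 'type')} {iri(RDFS + 'Class')} .",
        f"{subject} {iri(RDFS + 'label')} {literal(label, 'en')} .",
    ]


def define_property(local: str, label: str) -> list:
    """Statements declaring an rdf:Property with an English label."""
    subject = iri(BUD + local)
    return [
        f"{subject} {iri(RDF + 'type')} {iri(RDF + 'Property')} .",
        f"{subject} {iri(RDFS + 'label')} {literal(label, 'en')} .",
    ]


lines = define_class("Amount", "Amount") + define_property("amountType", "amount type")
print("\n".join(lines))
```

An OWL-based vocabulary would typically type the same terms as owl:Class and owl:DatatypeProperty or owl:ObjectProperty instead.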
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
Slide 79 (step 4)
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
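In RDFS, domain and range are used for inference rather than validation: a statement that uses a property lets a reasoner conclude the classes of its subject and object. The Python sketch below illustrates this. The direction of `hasCeiling` (from Political category to Amount) and all prefixed names are assumptions read off the flattened diagram, not confirmed by the source.

```python
# Assumed declarations: hasCeiling goes from a political category to an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain class
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))  # object is in the range class
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(data)
print(sorted(types))
```

Because of this inference behaviour, an overly narrow domain or range can silently assign unintended classes to resources, which is one reason to declare them with care.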
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80 (step 5)
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
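A few design rules for persistent URIs lend themselves to automatic checking. The Python sketch below is only an illustration: the three rules it encodes (HTTP(S) scheme, no query string, no file extension) are assumptions chosen for the example and are not listed on the slide, and the function name is invented.

```python
import posixpath
from urllib.parse import urlsplit


def persistence_warnings(uri: str) -> list:
    """Return human-readable warnings for a candidate namespace or term URI."""
    parts = urlsplit(uri)
    warnings = []
    if parts.scheme not in ("http", "https"):
        warnings.append("not an HTTP(S) URI")
    if parts.query:
        warnings.append("contains a query string")
    last_segment = posixpath.basename(parts.path.rstrip("/"))
    extension = posixpath.splitext(last_segment)[1]
    if extension[1:].isalpha():  # '.php' counts, a version such as '1.1' does not
        warnings.append("ends with a file extension")
    return warnings


for candidate in [
    "http://data.europa.eu/bud",
    "http://purl.org/dc/elements/1.1/",
    "http://example.org/vocab.php?id=3",  # hypothetical counter-example
]:
    print(candidate, persistence_warnings(candidate))
```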
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81 (step 6)
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Diagram labels grouping the steps into phases: Analyse, Model, Publish]
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ...” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88 (step 1)
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89 (step 2)
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90 (step 3)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91 (step 3)
• Create URI representations for the object values involved:
  o via formula
  o via reconciliation
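Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this would be written in its own expression language; the Python sketch below only illustrates the idea. The base URI and the slug rules are assumptions made for the example.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"  # hypothetical base URI


def slug(value: str) -> str:
    """Lower-case a cell value, drop accents, and join words with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(cell: str) -> str:
    """Build a URI for the thing named in a spreadsheet cell."""
    return BASE + slug(cell)


for cell in ["Czech Republic", " España ", "United Kingdom"]:
    print(repr(cell), "->", mint_uri(cell))
```

A formula like this gives every distinct spelling its own URI, so it works best after the clean-up step. Reconciliation takes the other route and looks the value up in an existing URI set instead of minting a new identifier.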
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92 (step 4)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93 (step 4)
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
Slide 94 (step 4)
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
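Conceptually, an RDF skeleton is a template applied to every row of the table. The Python sketch below mimics that with the RDF Data Cube Vocabulary. It is not how the Open Refine extension works internally, and the column names, the rows and the base URI are made up for illustration; they are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up sample rows standing in for the cleaned spreadsheet.
SAMPLE = "country,year,value\nBE,2013,1.5\nLU,2013,2.5\n"


def row_to_triples(row):
    """Apply the 'skeleton': one qb:Observation per spreadsheet row."""
    obs = f"{BASE}observation/{row['country']}-{row['year']}"
    return [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", BASE + "dataset"),
        (obs, SDMX_DIM + "refArea", BASE + "country/" + row["country"]),
        (obs, SDMX_DIM + "refPeriod", row["year"]),
        (obs, SDMX_MEASURE + "obsValue", float(row["value"])),
    ]


triples = [
    triple
    for row in csv.DictReader(io.StringIO(SAMPLE))
    for triple in row_to_triples(row)
]
print(len(triples), "triples generated")
```

Changing `BASE` corresponds to setting the base URI in the graphical interface: every generated identifier moves with it.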
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95 (step 5)
[Screenshot: export of the data in Turtle]
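To give an idea of what an export step has to do, the Python sketch below writes triples in N-Triples, a line-based format that is a subset of Turtle, so the output can also be read by Turtle parsers. It is a simplified illustration with invented function names: it handles only IRIs and plain string literals, and does not abbreviate with prefixes as a real Turtle export would.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return "<" + value + ">"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object, object_is_iri) tuples, one per line."""
    lines = []
    for s, p, o, object_is_iri in triples:
        obj = iri(o) if object_is_iri else literal(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines) + "\n"


example = [
    (
        "http://publications.europa.eu/resource/authority/file-type",
        "http://purl.org/dc/terms/title",
        "File types Name Authority List",
        False,
    ),
]
print(to_ntriples(example), end="")
```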
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us, join us, follow us and contact us:
• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Make your stuff available on the Web under an open licence
Slide 20
[Example: Trends, risks and vulnerabilities in securities markets]
DATASUPPORTOPEN
Make it available as structured data
Slide 21
[Example: Waterbase - Emissions to water, shown as a table with columns such as CountryCode]
DATASUPPORTOPEN
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Example: DG Enlargement - Regional programmes]
Slide 22
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
[Example: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg]
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
[Example: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body]
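The five steps on slides 20 to 24 build on each other: each star presupposes the ones before it. The Python sketch below rates a publication on that cumulative basis. The class and field names are invented for illustration and are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Yes/no answers to the five questions of the 5-star scheme."""
    open_licence: bool            # on the Web under an open licence
    structured: bool              # available as structured data
    non_proprietary_format: bool  # e.g. CSV rather than Excel
    uses_uris: bool               # URIs denote things
    links_to_other_data: bool     # linked to other data for context


def stars(publication: Publication) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        publication.open_licence,
        publication.structured,
        publication.non_proprietary_format,
        publication.uses_uris,
        publication.links_to_other_data,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


csv_file = Publication(True, True, True, False, False)
excel_file = Publication(True, True, False, False, False)
print(stars(csv_file), stars(excel_file))
```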
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
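Under the SPARQL Protocol, a query can be sent to such an endpoint as an HTTP GET request that carries the query text in a `query` parameter. The Python sketch below only builds the request URL and sends nothing. The endpoint address is the one listed on the slide and may have changed since; the helper name is an assumption.

```python
from urllib.parse import urlencode

# Endpoint as listed on the slide; it may have moved since the deck was written.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"


def build_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries a SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_get_url(ENDPOINT, "ASK { ?s ?p ?o }")
print(url)
```

A client would then request this URL with an Accept header naming the result format it wants, for example `application/sparql-results+json`.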
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Logos of initiatives funded by the European Commission; legible text: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
DATASUPPORTOPEN
Member State initiatives – some examples
• DE – Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
• IT – Agenzia per l’Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
• NL – Building and address register: the Dutch Address and Buildings base register published as linked data
• UK – Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
• UK – Companies House: publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/type/year/number/section/number
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legislation Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
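The identifier pattern from slide 30 can be expressed as a small URI builder. The Python sketch below is an illustration only: the placement of slashes is reconstructed from the flattened slide text, the function names are invented, and it does not claim to mirror the full URI scheme of legislation.gov.uk.

```python
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")  # representations named on the slide


def legislation_id(doc_type: str, year: int, number: int, section=None) -> str:
    """Build an identifier URI following the type/year/number/section pattern."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(identifier: str, fmt: str) -> str:
    """Append the data document name for one of the listed formats."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown representation format: {fmt}")
    return f"{identifier}/data.{fmt}"


section_124 = legislation_id("ukpga", 2010, 32, section=124)
print(representation(section_124, "rdf"))
```

Keeping the identifier of the thing separate from the URLs of its representations is what lets one section be served as XML, PDF or RDF without changing its name.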
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes”
• Need to add semantics in order to allow machine reasoning. For example:
  o Kristin is a field
  o Åsgard is an oil platform
  o Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <!-- The organisation, identified by its URI -->
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <!-- The address that the organisation's site points to -->
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
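The same example triple can be written down as a small data structure. The Python sketch below is an illustration, not a standard API: the `Triple` type and its field names are assumptions, and "has a title" is represented by the Dublin Core Terms property dct:title, as in the deck's later examples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; the object is either a URI or a literal value."""
    subject: str
    predicate: str
    obj: str
    object_is_literal: bool


name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",  # "has a title"
    obj="File types Name Authority List",
    object_is_literal=True,
)

print(name_of_dataset.subject)
print(name_of_dataset.predicate)
print(name_of_dataset.obj)
```

A set of such triples that share subjects and objects is what the following slides call a graph.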
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of data (triples); callouts mark a subject, a predicate and an object, and the set of triples as the graph.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide: the @prefix lines are the definition of prefixes; the remaining statements are the description of data (triples); callouts mark a subject, a predicate and an object, and the set of triples as the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
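Prefixes are pure shorthand: a prefixed name such as dct:title stands for the full URI obtained by concatenating the namespace and the local name. The Python sketch below shows that expansion. The function name is an assumption, and the DCAT namespace used is http://www.w3.org/ns/dcat#, the same namespace the deck's RDFa example uses for dcat:Dataset.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str) -> str:
    """Turn 'prefix:local' into the full URI it abbreviates."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"undeclared prefix: {prefix}")
    return PREFIXES[prefix] + local


for name in ["dcat:Dataset", "dct:title", "dct:publisher"]:
    print(name, "->", expand(name))
```

This is also why a wrong namespace declaration is a silent error: the document still parses, but every term expands to a URI that no other dataset uses.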
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <!-- resource names the subject; typeof states its class -->
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <!-- each property attribute names a predicate; the element text is the object -->
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram “Healthcare Domain”. Classes with their properties:]
Identifier (Core Vocabularies)
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Person (Core Vocabularies)
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Location (Core Vocabularies)
  geographicIdentifier: URI
  geographicName: string
Address (Core Vocabularies)
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Geometry (Core Vocabularies)
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
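The difference between SELECT, ASK and CONSTRUCT can be made concrete with a toy matcher over a single triple pattern. The Python sketch below is not a SPARQL engine and uses invented function names and abbreviated, hypothetical identifiers such as `ex:file-type`; it only mirrors the three result shapes: a table of bindings, a yes/no answer, and a new graph.

```python
def match(pattern, triple):
    """Return variable bindings if `triple` fits `pattern`, else None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # a variable
            if bindings.get(p_term, t_term) != t_term:
                return None
            bindings[p_term] = t_term
        elif p_term != t_term:  # a constant that must match exactly
            return None
    return bindings


def select(pattern, data):
    """SELECT: one row of bindings per matching triple."""
    return [b for t in data if (b := match(pattern, t)) is not None]


def ask(pattern, data):
    """ASK: does at least one triple match?"""
    return any(match(pattern, t) is not None for t in data)


def construct(pattern, template, data):
    """CONSTRUCT: fill the template with each row of bindings."""
    return {tuple(b.get(term, term) for term in template) for b in select(pattern, data)}


DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
]

print(select(("?d", "dct:title", "?t"), DATA))
print(ask(("?d", "rdf:type", "dcat:Dataset"), DATA))
print(construct(("?d", "dct:title", "?t"), ("?d", "rdfs:label", "?t"), DATA))
```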
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Type of query (SELECT) and variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data (URIs abbreviated with ... as on the slide):
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data (URIs abbreviated with ... as on the slide):
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
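What makes the second query work is that the variable for the publisher's URI appears in two triple patterns, which joins the dataset's description with the publisher's. The Python sketch below evaluates such a group of patterns over the sample data by naive nested matching. It is a teaching sketch, not a SPARQL implementation, and it replaces the abbreviated URIs with short hypothetical identifiers (`ex:file-type`, `ex:publ`).

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def extend(bindings, pattern, triple):
    """Try to match one pattern against one triple, respecting earlier bindings."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, data):
    """Join all patterns: keep only bindings that satisfy every one of them."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for solution in solutions
            for triple in data
            if (extended := extend(solution, pattern, triple)) is not None
        ]
    return solutions


def select(variables, patterns, data):
    """Project the requested variables out of every solution."""
    return [tuple(s[v] for v in variables) for s in evaluate(patterns, data)]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

Note that `?publisherURI` is used for the join but is not listed in the selected variables, so it does not appear in the result table, just as on the slide.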
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
Slide 71
• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
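The interoperability point about date formats can be illustrated with a short Python sketch. It assumes the default (English) locale for month names; the variable names are illustrative only. The xsd:date lexical form is parsed directly, while the free-text form needs a format string that every consumer would have to know in advance.

```python
from datetime import date, datetime

# The xsd:date lexical form is ISO 8601, so it parses without extra knowledge.
typed_value = date.fromisoformat("2013-02-21")

# The free-text form needs a publisher-specific format string
# (and %B depends on the locale, assumed to be English here).
free_text_value = datetime.strptime("21 February 2013", "%d %B %Y").date()

print(typed_value, free_text_value, typed_value == free_text_value)
```

Both values end up as the same date, but only the first one gets there without out-of-band agreement on the format.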
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Reuse existing terms and vocabularies (step 2)
Slide 72
Creation of sub-classes and sub-properties (step 3)
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: Amount (Currency, Figure, Type, Year) linked to Nomenclature (Type, Heading, Introduction, Remark, Conditions) through "has Nomenclature"]
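How a sub-property lets a generic consumer read more specific data can be sketched as follows. This is a simplified illustration of rdfs:subPropertyOf inference in plain Python; the `bud:` prefix, the `ex:` subjects and the sample values are hypothetical, and only the two sub-property statements come from the slides.

```python
# Sub-property statements from the slides (prefix "bud:" is a placeholder).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

def super_properties(prop):
    """Follow rdfs:subPropertyOf links upwards and return every ancestor."""
    ancestors = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        ancestors.append(prop)
    return ancestors

def infer(graph):
    """Add, for every triple, the same statement with each super-property."""
    inferred = set(graph)
    for s, p, o in graph:
        for parent in super_properties(p):
            inferred.add((s, parent, o))
    return inferred

data = [
    ("ex:line1", "bud:introduction", "Introductory text"),  # hypothetical data
    ("ex:amount1", "bud:hasNomenclature", "ex:line1"),
]
closure = infer(data)

# A consumer that only knows Dublin Core can still find the description.
descriptions = [o for (s, p, o) in closure if p == "dct:description"]
print(descriptions)
```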
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
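The capitalisation and camel-case conventions on this slide lend themselves to an automated check. The sketch below is one possible way to do it in Python; the function names are hypothetical, and it only checks letter case, not whether a name is singular, a verb or a noun.

```python
import re

def is_class_name(local_name):
    """Classes: upper camel case, e.g. Concept."""
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name) is not None

def is_property_name(local_name):
    """Properties: lower camel case, e.g. isPrimaryTopicOf."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name) is not None

for name in ["Concept", "label", "hasSite", "isPrimaryTopicOf", "has_site"]:
    print(name, is_class_name(name), is_property_name(name))
```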
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4)
Example: defining the "amount type" property
Slide 78
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling property shown with the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
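What a domain and a range allow a machine to conclude can be sketched in Python. The direction of hasCeiling is an assumption made purely for illustration (domain Political category, range Amount), since the diagram does not survive in this transcript; the `ex:` names and the `infer_types` helper are hypothetical.

```python
# Assumed declarations (illustrative only, not taken from the vocabulary itself).
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}

def infer_types(graph):
    """Derive rdf:type statements from domain and range declarations."""
    types = set()
    for s, p, o in graph:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))   # subject is in the domain class
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))    # object is in the range class
    return types

data = [("ex:category1", "ex:hasCeiling", "ex:amount1")]
types = infer_types(data)
print(sorted(types))
```

Note that in RDFS these declarations are used to infer types rather than to reject data, so a careless domain or range can lead to unintended conclusions about a resource.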
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format and of design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
[Figure: the steps grouped into three phases: Analyse, Model, Publish]
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
Slide 86
First:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet (step 1)
Download the tabular data
Slide 88
Create a project and upload it in Open Refine (step 2)
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Clean up the data - table harmonisation (step 3)
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data - prepare RDF (step 3)
Slide 91
• Create a URI representation for the object values involved
  - via formula
  - via reconciliation
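Creating URIs "via formula" means deriving each URI from the cell value with a fixed rule. In Open Refine this would be written as an expression on the column; the sketch below shows the same idea in Python so that it can be run on its own. The base URI, the `mint_uri` helper and the sample labels are assumptions for illustration.

```python
from urllib.parse import quote

BASE = "http://example.com/id/"  # placeholder base URI

def mint_uri(kind, label):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = label.strip().lower().replace(" ", "-")
    return BASE + kind + "/" + quote(slug, safe="-")

country_uri = mint_uri("country", " Czech Republic ")
indicator_uri = mint_uri("indicator", "Broadband (fixed)")
print(country_uri)
print(indicator_uri)
```

Reconciliation, the second option on the slide, instead looks the value up in an existing dataset and reuses the URI found there, which is preferable when an authoritative URI already exists.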
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data) (step 4)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
[Figure: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle]
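The skeleton and export steps can be sketched together in Python: a skeleton is a fixed pattern of triples that is filled in once per spreadsheet row. The sample rows, column names and example.com URIs below are hypothetical; only the qb:Observation class comes from the RDF Data Cube Vocabulary. The output uses one full triple per line, which is valid N-Triples and therefore also valid Turtle.

```python
import csv
import io

# Hypothetical sample rows; the real Digital Agenda Scoreboard columns may differ.
SAMPLE_CSV = "country,year,value\nBE,2013,0.75\nLU,2013,0.80\n"

BASE = "http://example.com/"  # placeholder base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def row_to_lines(row):
    """Apply the skeleton to one row: one observation node, three triples."""
    obs = "<{}obs/{}-{}>".format(BASE, row["country"], row["year"])
    country = "<{}country/{}>".format(BASE, row["country"])
    value = '"{}"^^<{}>'.format(row["value"], XSD_DECIMAL)
    return [
        "{} <{}> <{}> .".format(obs, RDF_TYPE, QB_OBSERVATION),
        "{} <{}def/country> {} .".format(obs, BASE, country),
        "{} <{}def/value> {} .".format(obs, BASE, value),
    ]

def export(csv_text):
    """Read the tabular data and return one serialised triple per line."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [line for row in rows for line in row_to_lines(row)]

lines = export(SAMPLE_CSV)
print("\n".join(lines))
```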
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools placed along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on, contact us, join us on, follow us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Make it available as structured data
Slide 21
[Figure: Waterbase - Emissions to water, shown as a table (column CountryCode visible)]
Use non-proprietary formats
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Figure: DG Enlargement - Regional programmes]
Slide 22
Use URIs to denote things
Slide 23
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
Initiatives funded by the European Commission
Slide 28
[Figure: initiative logos, including ADMS.SW, Core Vocabulary and Public Service]
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
• IT - Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
• NL - Building and address register: the Dutch Address and Buildings base register published as linked data.
• UK - Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary-Line.
• UK - Companies House: publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
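The granular URI pattern described for legislation can be sketched as a small Python helper. The slashes and dots of the pattern are reconstructed from the stripped slide text, so treat the exact layout as an assumption; `legislation_uri` and its parameter names are hypothetical and not an official API.

```python
BASE = "http://www.legislation.gov.uk/id"

def legislation_uri(doc_type, year, number, section=None, representation=None):
    """Compose an identifier URI, optionally narrowed to a section and a format."""
    uri = "{}/{}/{}/{}".format(BASE, doc_type, year, number)
    if section is not None:
        uri += "/section/{}".format(section)       # granular URI for one section
    if representation is not None:
        uri += "/data.{}".format(representation)   # xml, xht, pdf, rdf or feed
    return uri

act_uri = legislation_uri("ukpga", 2010, 32)
section_rdf_uri = legislation_uri("ukpga", 2010, 32, section=124, representation="rdf")
print(act_uri)
print(section_rdf_uri)
```

Separating the identifier of the thing from the URI of each representation is what lets the same section be served as XML, PDF or RDF without changing its name.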
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning. For example:
  - Kristin is a field
  - Åsgard is an oil platform
  - Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge-creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
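A triple can be pictured as a three-element record. The sketch below is a minimal illustration in plain Python, with tuples standing in for RDF triples; no RDF library is used, and the `objects_of` helper is a hypothetical name. The predicate "has a title" is written as the Dublin Core term dct:title, as in the later syntax examples.

```python
DCT = "http://purl.org/dc/terms/"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"

# Each triple is a (subject, predicate, object) tuple.
triples = [
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
]

def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

titles = objects_of(triples, FILE_TYPE, DCT + "title")
print(titles)
```

A set of such triples forms a graph: subjects and objects are the nodes, predicates are the labelled edges between them.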
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide: the xmlns attributes are the definition of prefixes; the nested elements are the description of data as triples (subject, predicate, object), which together form the graph.
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with subject, predicate and object labelled]
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide: the @prefix lines are the definition of prefixes; the remaining statements are the description of data as triples (subject, predicate, object), which together form the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p>
<span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
</p>
</div>
</body>
</html>
Annotations on the slide: the resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
[Figure: UML class diagram "Healthcare Domain", annotated to point out the classes, their properties and the relationships between them]
Class Identifier (Core Vocabularies):
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies):
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location (Core Vocabularies):
  geographicIdentifier: URI
  geographicName: string
Class Address (Core Vocabularies):
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry (Core Vocabularies):
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships shown: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions ...
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
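The difference between SELECT, ASK and CONSTRUCT can be sketched over a tiny in-memory graph. This is a simplified illustration in plain Python with a single triple pattern per query; the helper names and the `ex:` sample data are hypothetical, and real SPARQL engines support far more than this.

```python
graph = [
    ("ex:ds1", "rdf:type", "dcat:Dataset"),
    ("ex:ds1", "dct:title", "File types Name Authority List"),
]

def solutions(graph, pattern):
    """Yield one {variable: value} dict per triple that matches `pattern`."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break                      # same variable, different value
            elif term != value:
                break                          # constant does not match
        else:
            yield binding

def select(graph, pattern, variables):
    """SELECT: a table with one row per solution."""
    return [tuple(b[v] for v in variables) for b in solutions(graph, pattern)]

def ask(graph, pattern):
    """ASK: is there at least one solution?"""
    return any(True for _ in solutions(graph, pattern))

def construct(graph, pattern, template):
    """CONSTRUCT: fill the template once per solution to build new triples."""
    return [tuple(b.get(t, t) for t in template) for b in solutions(graph, pattern)]

table = select(graph, ("?d", "dct:title", "?t"), ["?t"])
has_catalog = ask(graph, ("?d", "rdf:type", "dcat:Catalog"))
new_graph = construct(graph, ("?d", "dct:title", "?t"), ("?d", "rdfs:label", "?t"))
print(table, has_catalog, new_graph)
```

All three query forms share the same pattern matching; they differ only in what they return: a table, a yes/no answer, or a new graph.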
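Structure of a SPARQL query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Annotations on the slide:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the contents of the WHERE clause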
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE { <http://.../authority/file-type> dct:title ?dataset }
Result:
dataset
"File types Name Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
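This query works because the variable ?publisherURI appears in two triple patterns, which joins the dataset to its publisher. The sketch below illustrates that join in plain Python under the same assumptions as the slide data. The shortened names are placeholders, and match and select are hypothetical helpers rather than part of any SPARQL library.

```python
# Sample data from the slide; "ex:..." names stand in for the abbreviated URIs.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triples, binding):
    """Yield extensions of `binding` for every triple that fits the pattern."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # An already bound variable must agree with this triple.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def select(variables, patterns, triples):
    """Evaluate the patterns one after another under shared variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return [tuple(b[v] for v in variables) for b in bindings]


result = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
print(result)
```

The third pattern only matches the title of the resource that the first pattern bound to ?publisherURI, so the title of the dataset itself is not returned as a publisher name.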
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
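Step 1: Start with a robust Domain Model
Slide 68
[Diagram: domain model of the EU Budget. Recoverable classes and attributes, grouped as they appear in the extraction:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Attributes whose class is not recoverable: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]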
Step 2: Reuse existing terms and vocabularies
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organisations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data.
  A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
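The interoperability point about dates can be made concrete with a small Python sketch. It assumes the two example values from this slide and only shows what a consumer has to do with each form. The variable names are illustrative.

```python
from datetime import date, datetime

# Lexical form of the typed literal "2013-02-21"^^xsd:date (no timezone).
iso_value = "2013-02-21"
# Free-text value, as in the slide's ex:date example.
text_value = "21 February 2013"

# The ISO 8601 form parses without any extra knowledge about the publisher.
parsed_iso = date.fromisoformat(iso_value)

# The free-text form needs a format string that the consumer must know in
# advance; "%B" also relies on English month names in the active locale.
parsed_text = datetime.strptime(text_value, "%d %B %Y").date()

print(parsed_iso, parsed_text, parsed_iso == parsed_text)
```

Both values end up as the same date, but only the second one required publisher-specific handling, which is the "further processing" the slide warns about.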
Step 2: Reuse existing terms and vocabularies
Slide 72
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Step 3: Creation of sub-classes and sub-properties
Slide 73
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
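The benefit of sub-properties can be sketched in a few lines of Python: a consumer that only knows dct:description can still read data published with the more specific property. This is a simplified illustration, not a full RDFS reasoner. The prefix bud: and the local names bud:introduction and bud:hasNomenclature are assumptions standing in for the EU Budget vocabulary terms, and the sketch assumes the sub-property hierarchy has no cycles.

```python
# Hypothetical term names for the two sub-property statements on the slides.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Return the triples plus those implied by rdfs:subPropertyOf statements."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        # Walk up the hierarchy; assumes it contains no cycles.
        while predicate in sub_property_of:
            predicate = sub_property_of[predicate]
            inferred.add((subject, predicate, obj))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
graph = with_super_properties(data, SUB_PROPERTY_OF)

# A generic consumer that has never heard of bud:introduction:
descriptions = sorted(o for s, p, o in graph if p == "dct:description")
print(descriptions)
```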
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 76
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
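Part of these conventions can be checked mechanically. The sketch below is one possible check, written as an assumption about how a vocabulary editor might automate it. It only covers capitalisation and camel case, and the term ex:political_category is a made-up counter-example.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check capitalisation and camel case only.

    Whether a class is singular, or a property is a verb or a noun,
    still needs human review.
    """
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


checks = [
    ("skos:Concept", "class"),
    ("rdfs:label", "property"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:political_category", "class"),  # hypothetical badly named term
]
for term, kind in checks:
    print(term, kind, follows_convention(term, kind))
```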
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 77
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 78
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
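The worked examples for these two slides did not survive in the transcript. As a rough illustration of what an RDFS definition of a class and a property can look like, the following Python sketch prints a small Turtle document. The namespace ending in "#", the local names Amount and amountType, and the label and comment texts are all assumptions made for the example, not the actual EU Budget vocabulary.

```python
# Assumed namespace and prefix; only "http://data.europa.eu/bud" is given
# as an example namespace elsewhere in this training.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix bud: <http://data.europa.eu/bud#> .\n"
)


def define_term(local_name, rdf_type, label, comment):
    """Return a Turtle definition with a human-readable label and comment."""
    return (
        f"bud:{local_name} a {rdf_type} ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


# Label and comment texts are invented for illustration.
turtle = "\n".join([
    PREFIXES,
    define_term("Amount", "rdfs:Class", "Amount", "A monetary amount in the budget."),
    define_term("amountType", "rdf:Property", "amount type", "The type of an amount."),
])
print(turtle)
```

The class name starts with a capital letter and the property name with a lower case letter, in line with the naming conventions given for this step.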
Step 4: Where new terms are required, create them following commonly agreed best practices
Slide 79
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: the hasCeiling relationship between class Amount (Currency, Figure, Type, Year) and class Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
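In RDFS, domain and range statements are used to infer the types of resources rather than to reject data. The sketch below shows that inference for the hasCeiling relationship. The direction of the relationship (Political category as domain, Amount as range), the bud: prefix and the instance names are assumptions, because the diagram is not recoverable from the transcript.

```python
# Assumed schema statements for the relationship shown on the slide.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Apply the RDFS domain and range rules; return inferred rdf:type triples."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred


# Hypothetical instance data: no rdf:type is stated explicitly.
types = infer_types([("ex:category1", "bud:hasCeiling", "ex:amount1")])
print(sorted(types))
```

Because the rules add types instead of validating them, a domain or range that is too narrow leads to wrong conclusions about the data, which is a reason to define them with care.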
Step 5: Publish within a highly stable environment designed to be persistent
Slide 80
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Slide 81
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases shown on the slide: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
Slide 85
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: Getting started
Slide 86
First:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Slide 88
• Download the tabular data
Step 2: Create a project and upload it in Open Refine
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Step 3: Clean up the data - prepare RDF
Slide 91
• Create URI representations for the involved object values
  o via formula
  o via reconciliation
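Creating a URI "via formula" means deriving it from a cell value with a fixed rule. The sketch below expresses one such rule in Python to show the idea; it is not written in OpenRefine's own expression language. The base URI and the slug rule are assumptions chosen for the example.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"  # hypothetical base URI


def slug(value):
    """Lower-case, strip accents, and replace runs of other characters with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(value):
    """Derive a URI from a cell value; the same input always gives the same URI."""
    return BASE + slug(value)


print(mint_uri("Czech Republic"))
print(mint_uri("  United Kingdom "))
```

A formula is enough when the cell values are clean and unique. Reconciliation is the alternative when values should be matched against identifiers that already exist in another dataset.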
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 94
• You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
• You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
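What the skeleton and the export do together can be approximated outside the tool: each row of the cleaned table becomes one observation with a handful of triples. The following Python sketch writes N-Triples for a tiny table. The table contents, column names, base URI and the def/ property URIs are made up for illustration; only the qb:Observation class comes from the RDF Data Cube Vocabulary.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Hypothetical extract of a cleaned table; values are invented.
TABLE = "country,year,value\nBE,2013,0.82\nLU,2013,0.94\n"


def row_to_ntriples(row):
    """Turn one table row into the triples of one qb:Observation."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{BASE}obs/{country}-{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{BASE}def/country> <{BASE}country/{country}> .",
        f'{obs} <{BASE}def/year> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{BASE}def/value> "{value}"^^<{XSD}decimal> .',
    ]


lines = [
    line
    for row in csv.DictReader(io.StringIO(TABLE))
    for line in row_to_ntriples(row)
]
print("\n".join(lines))
```

A complete Data Cube dataset would also link each observation to its dataset and describe the dimensions and measures in a data structure definition, which this sketch leaves out.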
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools positioned along the axes "flexibility" and "volume": OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
... and now YOUR questions?
Slide 97
References
Slide 98
• 5-star Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
Slide 100
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Further reading
Slide 101
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Be part of our team
Slide 102
• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Use non-proprietary formats
Slide 22
• Proprietary: Excel, Word, PDF
• Non-proprietary: XML, CSV, RDF, JSON, ODF
[Screenshot: DG Enlargement - Regional programmes]
Use URIs to denote things
Slide 23
[Screenshot: Food Additives - http://open-data.europa.eu/en/data/dataset/1gXgb0Yj73R4ttDChQ5Wyg]
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Link your data to other data to provide context
Slide 24
[Screenshot: Corporate bodies Named Authority Lists - http://open-data.europa.eu/en/data/dataset/corporate-body]
LOGD roadblocks
Slide 25
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive, or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
EU institutions initiatives - some examples
Slide 27
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching the linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Initiatives funded by the European Commission
Slide 28
[Logos of funded initiatives; recoverable text: ADMS, SW, CORE VOCABULARY, PUBLIC SERVICE]
Member State initiatives - some examples
Slide 29
• DE - Bibliotheksverbund Bayern: Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
• IT - Agenzia per l'Italia Digitale: Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
• NL - Building and address register: The Dutch Address and Buildings base register published as linked data
• UK - Ordnance Survey: Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open, and the administrative geography taken from Boundary Line
• UK - Companies House: Publishing basic company details as linked data using a simple URI for each company in their database
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States Initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States Initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
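The value of a granular URI scheme is that identifiers can be composed from a few known parts. The sketch below rebuilds the example URI above from those parts. The function and its parameter names are hypothetical, and the pattern is inferred from this one example rather than from the official URI specification of legislation.gov.uk.

```python
def legislation_uri(doc_type, year, number, section=None, fmt=None):
    """Build an identifier URI, or a representation URI when `fmt` is given."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    if fmt is not None:
        # The representation is requested as data.<format>, e.g. data.rdf
        uri += f"/data.{fmt}"
    return uri


act = legislation_uri("ukpga", 2010, 32)
section_rdf = legislation_uri("ukpga", 2010, 32, section=124, fmt="rdf")
print(act)
print(section_rdf)
```

Keeping the identifier (the thing) separate from its representations (xml, rdf and so on) lets the same section be named once and delivered in several formats.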
Open & linked data at BBC
Slide 32
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
Slide 36
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Conclusions (cont'd)
Slide 37
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Slide 42
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features, and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Example: RDF description of an organisation
Slide 43
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
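The example triple can be written down as a line of N-Triples, the simplest RDF syntax, in which every triple stands on its own line. The Python sketch below is a minimal illustration under two assumptions: the predicate "has a title" is taken to be dct:title (http://purl.org/dc/terms/title), as in the other examples of this module, and to_ntriples is a hypothetical helper that handles only URIs and plain string literals.

```python
def to_ntriples(subject, predicate, obj, literal=False):
    """Serialise one triple as an N-Triples line.

    `obj` is treated as a URI unless literal=True, in which case it is
    written as a plain string literal with quotes and backslashes escaped.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",  # assumed URI for "has a title"
    "File types Name Authority List",
    literal=True,
)
print(line)
```

The subject and predicate are always URIs, while the object is either a URI (a link to another resource) or a literal value, which is the distinction the `literal` flag models.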
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide:
• The xmlns attributes: definition of prefixes
• The dcat:Dataset and dct:Agent elements: description of data (triples), which together form the graph
• rdf:about gives the subject, each child element is a predicate, and its content or rdf:resource is the object
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Diagram: RDF graph with nodes and arrows labelled Subject, Predicate, Object]
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide:
• The @prefix lines: definition of prefixes
• The remaining statements: description of data (triples), which together form the graph
• In each statement the first term is the subject, followed by predicate and object pairs
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
Annotations on the slide: the resource attribute gives the subject, each property attribute a predicate, and the element content the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
Slide 51
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain"; recoverable content listed below]
Class Identifier (Core Vocabularies)
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies)
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location (Core Vocabularies)
  geographicIdentifier: URI
  geographicName: string
Class Address (Core Vocabularies)
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry (Core Vocabularies)
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
Slide 54
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
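The way a SELECT query joins several triple patterns through a shared variable can be sketched outside SPARQL. The following is a toy matcher in plain Python, not a SPARQL engine: the function names match_pattern, select and ask are hypothetical, and prefixed names such as dct:title are treated as plain strings for brevity.

```python
# Toy illustration of SPARQL-style pattern matching over an in-memory list of
# triples. Terms starting with "?" are variables; everything else is a constant.
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples):
    """Return one tuple per solution, like the rows of a SELECT result."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


def ask(patterns, triples):
    """Return True if at least one solution exists, like an ASK query."""
    return bool(select([], patterns, triples))


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The variable ?publisherURI appears in two patterns, which is what links the dataset to the title of its publisher; a real SPARQL endpoint performs the same join, with indexing and many more features.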
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
Step 1
[Figure: domain model of the EU budget data. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Slide 69
Step 2
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Slide 71
Step 2: Reuse existing terms and vocabularies
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
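The interoperability point about dates can be made concrete with a short Python sketch. It assumes the lexical form of an xsd:date value is an ISO 8601 date, and it uses the hypothetical ex:date free-text format from the example above; the helper names are illustrative only.

```python
from datetime import date

# English month names, needed only for the non-standard free-text format.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_xsd_date(lexical_form):
    """Parse the lexical form of a typed date such as "2013-02-21"^^xsd:date."""
    return date.fromisoformat(lexical_form)


def parse_free_text_date(text):
    """Parse a custom "21 February 2013" style value (hypothetical ex:date).

    Every consumer has to write and maintain a parser like this one, and it
    breaks as soon as a publisher uses another language or word order.
    """
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name], int(day))


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard == custom)
```

Both calls yield the same date, but only the first works with a standard library call and no publisher-specific knowledge.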
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
Step 3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
Step 3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
Step 3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject
[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the has nomenclature relationship]
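The benefit of declaring sub-properties can be sketched in a few lines of Python. This is a minimal illustration of the rdfs:subPropertyOf entailment rule, not a reasoner; the bud: property names and the ex: resource are hypothetical stand-ins for the EU Budget vocabulary terms mentioned above.

```python
# Sub-property declarations: property -> its super-property.
# The bud: names are assumptions made for this sketch.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def add_super_property_triples(triples, sub_property_of):
    """Return the given triples plus those implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}  # guards against cyclic declarations
        parent = sub_property_of.get(predicate)
        while parent is not None and parent not in seen:
            inferred.add((subject, parent, obj))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
expanded = add_super_property_triples(data, SUB_PROPERTY_OF)

# A consumer that only knows Dublin Core can still find a description.
descriptions = sorted(o for s, p, o in expanded if p == "dct:description")
print(descriptions)
```

A system that has never seen the more specific property still finds the text through dct:description, which is the effect the slides describe.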
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
Step 4
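The letter-case conventions above are easy to check mechanically. The following Python sketch uses hypothetical helper names and covers only capitalisation and camel case; whether a term is singular, a verb or a noun still needs human review.

```python
import re

# Class local names: UpperCamelCase. Property local names: lowerCamelCase.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. "Concept" for "skos:Concept"."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term):
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "ex:amount"]:
    print(term, "class name ok:", is_valid_class_name(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"]:
    print(term, "property name ok:", is_valid_property_name(term))
```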
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
Step 4
See also: httpwwwslidesharenetOpenDataSupportmodel-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
Step 4
See also: httpwwwslidesharenetOpenDataSupportmodel-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states the classes on which a given property can be used.
Slide 79
Step 4
See also: httpwwwslidesharenetOpenDataSupportmodel-your-data-metadata
[Figure: the hasCeiling property between the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
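In RDFS, domain and range are read as inference rules rather than as validation constraints: using a property implies the types of its subject and object. A minimal Python sketch of that reading follows. It uses org:hasSite from the W3C Organization Ontology, whose domain is org:Organization and whose range is org:Site; the ex: resources and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Domain and range declarations, keyed by property.
SCHEMA = {
    "org:hasSite": {"domain": "org:Organization", "range": "org:Site"},
}


def infer_types(triples, schema):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for subject, predicate, obj in triples:
        declaration = schema.get(predicate)
        if declaration is None:
            continue
        if "domain" in declaration:
            inferred.add((subject, RDF_TYPE, declaration["domain"]))
        if "range" in declaration:
            inferred.add((obj, RDF_TYPE, declaration["range"]))
    return inferred


data = [("ex:publicationsOffice", "org:hasSite", "ex:site1234")]
types = infer_types(data, SCHEMA)
print(sorted(types))
```

Because of this behaviour, an overly narrow domain or range can lead to unintended type statements when others reuse the property, which is one reason to define them with care.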
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
Slide 80
Step 5
See also:
httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
httpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website http://openrefine.org
DATASUPPORTOPEN
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of the RDF dataset by drawing a template graph.
Slide 85
See also: LOD 2 Webinar - Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation
Slide 90
Step 3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data - prepare RDF
Slide 91
Step 3
• Create URI representations for the involved object values
o via formula
o via reconciliation
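Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this is done with an expression on the column; the Python sketch below only illustrates the idea. The base URI, the function name and the slug rule are assumptions, not part of the workshop material.

```python
from urllib.parse import quote

# Hypothetical base URI; in practice this is your own stable namespace.
BASE_URI = "http://example.com/id/"


def mint_uri(kind, label):
    """Build a URI from a resource kind and a cell value.

    The label is trimmed, lower-cased and its whitespace is replaced by
    hyphens; any remaining unsafe characters are percent-encoded.
    """
    slug = "-".join(label.strip().lower().split())
    return f"{BASE_URI}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " United Kingdom "))
```

The same cell value always yields the same URI, so repeated exports stay consistent. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.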
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Step 4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Step 4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 94
Step 4
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data http5stardatainfo
• ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
• An organization ontology W3C httpwwww3orgTRvocab-org
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
• Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data Support http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support http://goo.gl/y9ZZI
@OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Use URIs to denote things
Slide 23
See alsohttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
Food Additives - httpopen-dataeuropaeuendatadataset1gXgb0Yj73R4ttDChQ5Wyg
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority Lists - httpopen-dataeuropaeuendatadatasetcorporate-body
DATASUPPORTOPEN
LOGD roadblocks
bull Necessary investments
bull Lack of necessary competencies
bull Perceived lack of tools
bull Lack of service level guarantees
bull Missing restrictive or incompatible licences
bull Surfeit of standard vocabularies
bull The inertia of the status quo ndash change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives ndash some examples
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency httpsemanticeeaeuropaeusparql
• EU Open Data Portal SPARQL endpoint: Allows searching the linked metadata of datasets published on the EU Open Data Portal httpsopen-dataeuropaeuenlinked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. httppublicationseuropaeuenlinked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. httpeceuropaeusemantic_webgate
• Europeana SPARQL endpoint: Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections httplabseuropeanaeuapilinked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Figure: logos of initiatives funded by the European Commission, including ADMS.SW and the Core Public Service Vocabulary]
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK - Companies House
Publishing basic company details as linked data using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as data
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data
• URIs for things & RDF data model
Requires granular URIs to name things
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: httpswwwnationalarchivesgovukdocumentsinformation-managementopen-and-linked-data-johnsheridanppt
See also: The European Legal Identifier (ELI) httpeur-lexeuropaeulegal-contentENTXTuri=URISERVjl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
httpwwwlegislationgovukidukpga201032section124datardf
DATASUPPORTOPEN
Open amp linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading
httpwwwbbccoukthingssearchq=juncker
httpwwwforbescomsitesbernardmarr20150601how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga httpsjoinupeceuropaeusitesdefaultfilesisa_field_pathpresentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_datapdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: httpwwwtopquadrantcomresourcessolutionsdocsSemantic-data-oil-and-gaspdf
DATASUPPORTOPEN
Conclusions
bull Linked data is a set of design principles for sharing machine-readable data on the Web
bull URIs RDF and SPARQL form the foundational layer for Linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions cont'd
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
• How to write/read RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Slide 43
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
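A triple maps naturally onto a three-field record. The Python sketch below models the example above and writes it in N-Triples, the simplest line-based RDF syntax. The Triple type and the to_ntriples helper are hypothetical, and the escaping covers only backslashes and double quotes, not line breaks or other control characters.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # URI of the resource being described
    predicate: str        # URI of the property
    obj: str              # URI of another resource, or a literal value
    obj_is_literal: bool = False


def to_ntriples(triple):
    """Serialise one triple as a single N-Triples line."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```

Here the informal predicate "has a title" is written as the Dublin Core property http://purl.org/dc/terms/title, the same property that the dct:title examples in this module abbreviate.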
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Definition of prefixes: the xmlns attributes on rdf:RDF
Description of data - triples: the elements inside rdf:RDF
Subject: the resource named in rdf:about
Predicate: the property element, e.g. dct:title
Object: the element content or the rdf:resource value
Graph: all the triples together
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the example, with nodes and arcs labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Definition of prefixes: the @prefix lines
Description of data - triples: the statements that follow
Subject: the URI that starts each statement
Predicate: a (short for rdf:type), dct:title, dct:publisher
Object: the value after each predicate
Graph: all the triples together
See also: httpwwww3org200912rdf-wspapersws11
DATASUPPORTOPEN
RDF Syntax: RDFa
Slide 49
Embedding RDF data in HTML
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
Subject: the resource attribute
Predicate: the property attribute
Object: the content of the span element
See also: httpwwww3orgTR2012NOTE-rdfa-primer-20120607
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: The Core Person Vocabulary in UML
Slide 52
[UML class diagram: Healthcare Domain]
Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships shown in the diagram: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
[Diagram legend: classes, properties, relationships]
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph.
• DELETE: Delete triples from the RDF graph.
• ASK: Are there any X, Y, etc. satisfying the following conditions?
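The difference between SELECT, ASK and CONSTRUCT can be sketched in a few lines of plain Python. This is only an illustration of the idea, not a SPARQL engine: the graph is a set of tuples, the `ex:` identifiers are made-up shorthand, and the helper names `match`, `select`, `ask` and `construct` are assumptions introduced here.

```python
# Illustrative in-memory "graph": a set of (subject, predicate, object) tuples.
# The ex: identifiers are hypothetical shorthand, not real URIs.
GRAPH = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
}


def match(graph, pattern):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select(graph, variable, pattern):
    """SELECT: return the sorted values bound to one variable."""
    return sorted(b[variable] for b in match(graph, pattern))


def ask(graph, pattern):
    """ASK: report whether at least one triple matches the pattern."""
    return any(True for _ in match(graph, pattern))


def construct(graph, pattern, template):
    """CONSTRUCT: substitute each binding into a template to build a new graph."""
    return {tuple(b.get(t, t) for t in template) for b in match(graph, pattern)}
```

For instance, `select(GRAPH, "?t", ("ex:file-type", "dct:title", "?t"))` lists the titles of one resource, while `construct` with the template `("?s", "rdfs:label", "?t")` produces new triples that re-express every title as a label.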
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
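Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Annotations on the slide:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met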
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
?dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent ;
    dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
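The query above joins three triple patterns through the shared variable ?publisherURI. A rough way to see what that join does is the naive nested-loop matcher below. It is a sketch only: the `ex:` identifiers stand in for the shortened URIs on the slide, and `solve` is a hypothetical helper, not part of any SPARQL library.

```python
# Sample data from the slide, with hypothetical ex: shorthand for the URIs.
GRAPH = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
}

# The three triple patterns of the WHERE clause.
PATTERNS = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]


def solve(graph, patterns, binding=None):
    """Yield every binding that satisfies all patterns (naive nested-loop join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable bound by an earlier pattern must keep its value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(graph, rest, candidate)


rows = [(b["?dataset"], b["?publisher"]) for b in solve(GRAPH, PATTERNS)]
```

Because ?publisherURI is bound by the first pattern, the third pattern can only match titles of that publisher, which is why the dataset's own title does not show up in the ?publisher column.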
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model of the EU Budget. Classes and attributes as far as they can be recovered from the diagram:
  Amount: Currency, Figure, Type, Year
  Nomenclature: Type, Heading, Introduction, Remark, Conditions
  EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
  Political category: Code, Description
  Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
General-purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies (step 2: reuse existing terms and vocabularies)
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse (step 2: reuse existing terms and vocabularies)
Slide 71
• Reuse greatly aids interoperability of your data
  Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows that it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
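The date example in the first bullet can be made concrete. In the sketch below, the lexical form of an xsd:date is read directly by the standard library, whereas the free-text form needs a custom parser. The function names and the month table are assumptions introduced for illustration.

```python
from datetime import date

# English month names, needed only for the free-text format.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date such as '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse a value such as '21 February 2013' (English month names only)."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))


created = parse_xsd_date("2013-02-21")
same_day = parse_free_text_date("21 February 2013")
```

Both calls yield the same date, but the second parser only works for one language and one layout, which is the kind of extra processing the slide warns about.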
DATASUPPORTOPEN
You can find reusable RDF vocabularies on (step 2: reuse existing terms and vocabularies)
Slide 72
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
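How a consumer benefits from a sub-property declaration can be sketched as follows. A system that only knows dct:description can still find values stated with a more specific property, provided it follows rdfs:subPropertyOf links. The `bud:` and `ex:` names are hypothetical placeholders modelled on the EU Budget example, and the sample literal is invented.

```python
# Hypothetical sub-property declarations (child -> parent).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Hypothetical data that uses only the specific terms.
GRAPH = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the heading"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}


def super_properties(prop, sub_property_of):
    """Return prop followed by all of its ancestors; stops if a cycle is found."""
    chain = [prop]
    while chain[-1] in sub_property_of and sub_property_of[chain[-1]] not in chain:
        chain.append(sub_property_of[chain[-1]])
    return chain


def values(graph, subject, prop, sub_property_of):
    """Return objects stated with prop or with any of its sub-properties."""
    return sorted(
        o for s, p, o in graph
        if s == subject and prop in super_properties(p, sub_property_of)
    )


descriptions = values(GRAPH, "ex:nomenclature1", "dct:description", SUB_PROPERTY_OF)
```

A query for dct:description therefore returns the text even though the data never uses dct:description directly.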
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 76
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
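The capitalisation and camel-case rules above are mechanical enough to check automatically. The sketch below is one possible check, with function names made up for this example. It cannot judge whether a class name is singular or whether a property is a verb or a noun, so those rules still need human review.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Check that a class name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Check that a property name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For example, `is_valid_property_name("foaf:isPrimaryTopicOf")` passes, while a hypothetical `ex:Has_site` fails both because of the leading capital and because of the underscore.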
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 77
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 78
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
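The RDFS definitions shown on the original slides did not survive in this transcript. As a rough idea of what such definitions look like, the sketch below prints Turtle for a class and a property. The `bud:` prefix, the term names `bud:Amount` and `bud:amountType`, and the helper `describe_term` are assumptions, not the actual EU Budget vocabulary.

```python
def describe_term(term, rdf_type, label, extra=()):
    """Return a Turtle description of a vocabulary term.

    extra is a sequence of (predicate, object) pairs that are already written
    in Turtle syntax, e.g. ("rdfs:domain", "bud:Amount").
    """
    lines = [f"{term} a {rdf_type}", f'    rdfs:label "{label}"@en']
    lines += [f"    {predicate} {obj}" for predicate, obj in extra]
    return " ;\n".join(lines) + " ."


# Hypothetical class definition.
amount_class = describe_term("bud:Amount", "rdfs:Class", "Amount")

# Hypothetical property definition attached to that class.
amount_type = describe_term(
    "bud:amountType",
    "rdf:Property",
    "amount type",
    extra=[("rdfs:domain", "bud:Amount")],
)
```

Printing `amount_class` gives a two-line Turtle statement that declares the term as an rdfs:Class and gives it an English label. A real vocabulary file would also need the @prefix declarations and, ideally, an rdfs:comment for each term.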
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 79
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
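In RDFS, domain and range statements let a reasoner infer the type of the resources that a property connects. The sketch below shows that inference for the hasCeiling relationship. The direction (a political category has an amount as its ceiling) and all `bud:` and `ex:` names are assumptions made for illustration.

```python
# Hypothetical schema: domain and range per property.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for subject, predicate, obj in triples:
        rule = schema.get(predicate, {})
        if "domain" in rule:
            # The subject of the property is an instance of the domain class.
            inferred.add((subject, "rdf:type", rule["domain"]))
        if "range" in rule:
            # The value of the property is an instance of the range class.
            inferred.add((obj, "rdf:type", rule["range"]))
    return inferred


DATA = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(DATA, SCHEMA)
```

Note that this is inference rather than validation: stating a domain does not reject data, it adds conclusions about it, which is one reason to choose domains and ranges carefully.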
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
Slide 80
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Slide 81
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
DATASUPPORTOPEN
Conclusions
Slide 82
Phases: Analyse, Model, Publish
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
Slide 85
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
Slide 86
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
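Conceptually, the five steps turn each spreadsheet row into a resource and each cell into a triple. The sketch below does this by hand for a tiny CSV, outside Open Refine, to show what the mapping and export steps amount to. The base URI, the column names and the function `rows_to_ntriples` are hypothetical, and the key column is assumed to contain URI-safe values.

```python
import csv
import io

BASE = "http://example.org/id/"  # hypothetical base URI


def rows_to_ntriples(csv_text, key_column, base=BASE):
    """Turn each CSV row into N-Triples lines, one per non-empty cell.

    The key column provides the subject URI; every other column becomes a
    property with a plain string literal as its value.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue
            predicate = f"<{base}property/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


SAMPLE = "code,country,value\nobs1,BE,0.82\n"
triples = rows_to_ntriples(SAMPLE, "code")
```

In practice the properties would come from an existing vocabulary rather than from the column names, which is what the mapping step in Open Refine is for.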
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Data source: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation (step 3)
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF (step 3)
Slide 91
• Create URI representations for the involved object values
  o via formula
  o via reconciliation
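Creating a URI "via formula" usually means deriving it from a cell value with a deterministic rule. The sketch below shows one such rule in plain Python rather than in Open Refine's own expression language. The base URI and the function name `mint_uri` are assumptions.

```python
import re


def mint_uri(base: str, label: str) -> str:
    """Derive a URI from a label: lower-case it and join the words with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return base + slug


uri = mint_uri("http://example.org/id/country/", "United Kingdom")
```

A rule like this is only safe when labels are unique and stable. Reconciliation against an existing authority list avoids minting a second URI for something that already has one.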
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 94
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle]
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along the axes "flexibility" and "volume": OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Link your data to other data to provide context
Slide 24
Corporate bodies Named Authority List: http://open-data.europa.eu/en/data/dataset/corporate-body
DATASUPPORTOPEN
LOGD roadblocks
Slide 25
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo – change is accomplished slowly
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions' initiatives – some examples
• European Environment Agency SPARQL endpoint: tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU, CELLAR SPARQL endpoint: allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
ADMS
SWCORE
VOCABULARY
PUBLICSERVICE
DATASUPPORTOPEN
Member State initiatives – some examples
Slide 29
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register, published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
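The identifier and representation patterns on this slide can be read as a simple URI template. The sketch below fills in that template. It is based on the pattern as reconstructed from the slide, has not been checked against the live legislation.gov.uk service, and the function name `legislation_uri` is made up for this example.

```python
def legislation_uri(doc_type, year, number, section=None, representation=None):
    """Build a URI following the id/{type}/{year}/{number} pattern from the slide.

    section adds a /section/{n} part; representation (e.g. 'rdf' or 'xml')
    adds a /data.{format} suffix for a specific serialisation.
    """
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    if representation is not None:
        uri += f"/data.{representation}"
    return uri


section_rdf = legislation_uri("ukpga", 2010, 32, section=124, representation="rdf")
```

The design point is that the identifier names the thing itself (an act or one of its sections), while the suffix selects one of several representations of that same thing.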
DATASUPPORTOPEN
Member States initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
Slide 32
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I'd like to know all fields operated by Statoil Petroleum AS with their production volumes”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Enables semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
  o Enables easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Enables creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Slide 43
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
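As a data structure, a triple is just three values plus the knowledge of whether the object is another resource or a literal. The sketch below models that and serialises the example as one N-Triples line. The class and function names are illustrative, and the predicate "has a title" is assumed to stand for dct:title, as in the later syntax examples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; obj is either a URI or a plain literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialise a triple as one N-Triples line (no literal escaping, for brevity)."""
    obj = f'"{triple.obj}"' if triple.obj_is_literal else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)
line = to_ntriples(title)
```

The subject and predicate are always URIs here, which mirrors the definitions above. Only the object may be a literal value.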
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide:
• The xmlns attributes: definition of prefixes
• The element content: description of data – triples, which together form the graph
• Colour coding marks the subject, predicate and object of each triple
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: graph of the triples, with subjects, predicates and objects marked]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide:
• The @prefix lines: definition of prefixes
• The remaining statements: description of data – triples, which together form the graph
• Colour coding marks the subject, predicate and object of each triple
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head> ... </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Annotations on the slide: colour coding marks the subject, predicate and object of each triple.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
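To make the difference between these forms concrete, here is a rough Python analogy over a tiny in-memory set of triples. It is not SPARQL and does not use any RDF library; the shortened `ex:` identifiers and all function names are assumptions made for the illustration.

```python
# A toy "graph": each triple is a (subject, predicate, object) tuple.
graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "dct:title", "Publications Office"),
}


def select_titles(g):
    """SELECT-like: return a table (list of rows) of subject/title pairs."""
    return sorted((s, o) for s, p, o in g if p == "dct:title")


def ask_has_publisher(g, subject):
    """ASK-like: is there any triple stating a publisher for this subject?"""
    return any(s == subject and p == "dct:publisher" for s, p, _ in g)


def construct_labels(g):
    """CONSTRUCT-like: fill a template to build a new graph (title -> rdfs:label)."""
    return {(s, "rdfs:label", o) for s, p, o in g if p == "dct:title"}


def insert_triple(g, triple):
    """INSERT-like: add one triple to the graph."""
    g.add(triple)


def delete_triple(g, triple):
    """DELETE-like: remove one triple if it is present."""
    g.discard(triple)
```

SELECT and ASK only read the graph, CONSTRUCT returns a new graph and leaves the original untouched, while INSERT and DELETE change the stored data.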
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met
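How the parts of such a query work together can be sketched with a small pattern matcher in Python. This is a simplified illustration of triple-pattern matching, not how a SPARQL engine is actually implemented; the function names and the shortened `ex:` identifiers are assumptions.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` fits `triple`; return None on failure."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                  # a variable
            if new.setdefault(term, value) != value:
                return None                       # already bound to something else
        elif term != value:                       # a constant that does not match
            return None
    return new


def select(variables, patterns, graph):
    """Return one row per solution, keeping only the requested variables."""
    solutions = [{}]
    for pattern in patterns:                      # each pattern narrows the solutions
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in solutions]


graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

# The WHERE block: two triple patterns sharing the variable ?dataset.
where = [
    ("?dataset", "rdf:type", "dcat:Dataset"),
    ("?dataset", "dct:title", "?title"),
]

rows = select(["?title"], where, graph)
print(rows)
```

Because both patterns share `?dataset`, the title of the publisher is filtered out: it belongs to a resource that is not typed as a dataset.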
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }
Result:
    dataset
    "File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }
Result:
    dataset                             publisher
    "File types Name Authority List"    "Publications Office"
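The second query joins two resources through the variable for the publisher URI. A minimal Python sketch of that join, assuming the sample data is held as tuples with shortened, hypothetical `ex:` identifiers:

```python
data = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def dataset_and_publisher(graph, dataset_uri):
    """Rows of (dataset title, publisher name) for one dataset URI."""
    rows = []
    for publisher_uri in objects(graph, dataset_uri, "dct:publisher"):   # ?publisherURI
        for title in objects(graph, dataset_uri, "dct:title"):           # ?dataset
            for name in objects(graph, publisher_uri, "dct:title"):      # ?publisher
                rows.append((title, name))
    return rows


print(dataset_and_publisher(data, "ex:file-type"))
```

The value found for the publisher URI in the first step is reused as the subject in the last step, which is exactly the role the shared variable plays in the query.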
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
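Step 1: Start with a robust Domain Model
Slide 68
[Diagram: domain model used as the running example. Classes and properties recoverable from the slide:]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
Political category: Code, Description
Corporate body: Code, Type, Location
EU Programme: Code, Type
Other property labels in the diagram: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme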
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies - well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies - advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data
A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
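The interoperability point about dates can be illustrated in Python. The sketch below assumes a plain xsd:date lexical form without a time zone and an English-language month name for the custom format; the function names are made up for the example.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse a plain xsd:date lexical form such as '2013-02-21' (no time zone)."""
    return date.fromisoformat(lexical)


def parse_custom_date(text: str) -> date:
    """Extra, format-specific step needed for a value such as '21 February 2013'.

    Assumes English month names; '%B' depends on the active locale.
    """
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_custom_date("21 February 2013")
print(standard == custom)
```

Both calls end up with the same date, but the second needs knowledge of one publisher's private format, which every consumer of that data would have to rediscover.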
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Slide 72
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
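Why a sub-property helps systems that only know the generic term can be shown with a small inference sketch in Python. The mapping mirrors the EU Budget example used in this module, but the `bud:` prefix, the local names and the function name are assumptions made for the illustration.

```python
# Hypothetical sub-property declarations (specific term -> generic term).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of):
    """Add (s, super, o) for every (s, sub, o), following sub-property chains."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()                      # guards against cyclic declarations
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = {("ex:heading-1", "bud:introduction", "Introductory remarks")}
expanded = infer_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(expanded):
    print(triple)
```

A consumer that has never seen the specific term can still find the text through dct:description once the sub-property declaration is taken into account.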
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with properties Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the "has Nomenclature" relationship]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
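The letter-case conventions above are easy to check mechanically. A minimal Python sketch, assuming terms are written as prefix:localName; it only checks capitalisation and camel case, not whether a name is singular, a verb or a noun. The function names are illustrative.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")      # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")   # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """Classes start with a capital letter and contain no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """Properties start with a lower case letter and contain no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "ex:has_site"]:
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```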
Slide 76
4
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
Slide 77
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with properties Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
Slide 78
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with properties Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
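What a declared domain and range allow a system to conclude can be sketched in Python. The property below follows the hasCeiling example on this slide, but the `bud:` prefix, the direction of the relationship (political category to amount) and the function name are assumptions made for the illustration.

```python
# Hypothetical schema: property -> declared domain and range.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Return the rdf:type statements implied by the domain and range of used properties."""
    types = set()
    for s, p, o in triples:
        if p in schema:
            types.add((s, "rdf:type", schema[p]["domain"]))  # subject is in the domain
            types.add((o, "rdf:type", schema[p]["range"]))   # object is in the range
    return types


data = [("ex:category-1", "bud:hasCeiling", "ex:amount-7")]
for triple in sorted(infer_types(data, SCHEMA)):
    print(triple)
```

In RDFS, domain and range are used to infer types in this way rather than to reject data, so a wrongly chosen domain silently produces wrong type statements.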
Slide 79
[Diagram: Political category class (Code, Description) and Amount class (Currency, Figure, Type, Year), connected by the hasCeiling property]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Step 5: Publish within a highly stable environment designed to be persistent
Slide 80
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
Slide 86
Before you start:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Step 3: Clean up the data - prepare RDF
Slide 91
• Create a URI representation for the involved object values
o via formula
o via reconciliation
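Creating URIs "via formula" means deriving them from cell values with a fixed rule. The Python sketch below shows one possible rule outside of Open Refine; it is not the expression language used by the tool, and the base URI, the path layout and the function names are assumptions made for the illustration.

```python
import re
import unicodedata


def slugify(value: str) -> str:
    """Lower-case, strip accents and replace runs of other characters with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(base: str, kind: str, value: str) -> str:
    """Build a URI of the form <base>/<kind>/<slug of value>."""
    return f"{base.rstrip('/')}/{kind}/{slugify(value)}"


print(mint_uri("http://example.org/data/", "country", "United Kingdom"))
```

A formula like this is quick but only guarantees consistency within one dataset; reconciliation instead looks the value up in an existing reference dataset and reuses the URI found there.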
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 94
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
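Conceptually, an RDF skeleton is a template that is applied to every row of the table. The Python sketch below imitates this outside of Open Refine and writes N-Triples lines. The base URI, the column names, the sample values and the dimension and measure properties are all hypothetical; only the RDF Data Cube namespace and the qb:Observation class are taken from the published vocabulary.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"       # hypothetical base URI
QB = "http://purl.org/linked-data/cube#"      # RDF Data Cube namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Made-up sample rows standing in for the cleaned spreadsheet.
SHEET = "country,year,value\nBE,2013,0.82\nLU,2013,0.94\n"


def row_to_ntriples(row):
    """Apply the 'skeleton' to one row: one observation with two dimensions and a measure."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{BASE}obs/{country}-{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{BASE}dim/country> <{BASE}country/{country}> .",
        f'{obs} <{BASE}dim/year> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{BASE}measure/value> "{value}"^^<{XSD}decimal> .',
    ]


def sheet_to_ntriples(text):
    """Turn every row of a CSV text into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines


print("\n".join(sheet_to_ntriples(SHEET)))
```

Changing the base URI in one place changes every generated identifier, which is why it is worth settling on a stable base before exporting.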
DATASUPPORTOPEN
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA: Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA: Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
LOGD roadblocks
• Necessary investments
• Lack of necessary competencies
• Perceived lack of tools
• Lack of service level guarantees
• Missing, restrictive or incompatible licences
• Surfeit of standard vocabularies
• The inertia of the status quo: change is accomplished slowly
Slide 25
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples of supra-national, national, regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: Tool for searching linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate
• Europeana SPARQL endpoint: Tool for querying a multilingual online collection of millions of digitised items from European museums, libraries, archives and multimedia collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Logos of initiatives funded by the European Commission; legible labels include ADMS.SW, Core Vocabulary and Public Service]
DATASUPPORTOPEN
Member State initiatives - some examples
• DE - Bibliotheksverbund Bayern: Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
• IT - Agenzia per l'Italia Digitale: Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
• NL - Building and address register: The Dutch Address and Buildings base register published as linked data.
• UK - Ordnance Survey: Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.
• UK - Companies House: Publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, gives anyone access to the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
DATASUPPORTOPEN
Learning objectives
By the end of this training module, you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:org="http://www.w3.org/ns/org#"
        xmlns:locn="http://www.w3.org/ns/locn#">
      <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
        <rdfs:label>Publications Office</rdfs:label>
        <org:hasSite rdf:resource="http://example.com/site/1234"/>
      </org:Organization>
      <locn:Address rdf:about="http://example.com/site/1234">
        <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
      </locn:Address>
    </rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
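The example triple can be written down as a small data structure. The Python sketch below is only an illustration: the class and field names are made up, and the predicate is assumed to be the Dublin Core title property that the syntax examples in this module use for "has a title".

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False

    def to_ntriples(self) -> str:
        """Serialise the triple as one N-Triples line."""
        if self.obj_is_literal:
            escaped = self.obj.replace("\\", "\\\\").replace('"', '\\"')
            o = f'"{escaped}"'
        else:
            o = f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {o} ."


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)
print(title.to_ntriples())
```

The printed line is the same statement in N-Triples, the simplest of the RDF syntaxes: subject, predicate and object, terminated by a full stop.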
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcat="http://www.w3.org/ns/dcat#"
        xmlns:dct="http://purl.org/dc/terms/">
      <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
        <dct:title>File types Name Authority List</dct:title>
        <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
      </dcat:Dataset>
      <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
        <dct:title>Publications Office</dct:title>
      </dct:Agent>
    </rdf:RDF>
• The xmlns attributes: definition of prefixes
• The dcat:Dataset and dct:Agent elements: description of data (triples), which together form the graph
• Subject: the resource named in rdf:about; predicate: the nested element (e.g. dct:title); object: its content or rdf:resource value
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph in which nodes and arrows are labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct: <http://purl.org/dc/terms/> .

    <http://publications.europa.eu/resource/authority/file-type>
        a dcat:Dataset ;
        dct:title "File types Name Authority List" ;
        dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

    <http://open-data.europa.eu/en/data/publisher/publ>
        a dct:Agent ;
        dct:title "Publications Office" .
• The @prefix lines: definition of prefixes
• The remaining statements: description of data (triples), which together form the graph
• In each statement, the subject comes first, followed by predicate and object pairs
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
    <html>
      <head></head>
      <body>
        <div resource="http://publications.europa.eu/resource/authority/file-type"
             typeof="http://www.w3.org/ns/dcat#Dataset">
          <p>
            <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
            Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
          </p>
        </div>
      </body>
    </html>
• Subject: the resource attribute; predicate: the property attribute; object: the content of the span
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: The Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain"; callouts mark the classes, their properties and the relationships between them]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (an update operation in SPARQL 1.1).
• DELETE: Delete triples from the RDF graph (an update operation in SPARQL 1.1).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
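To see how the query forms differ, the sketch below imitates SELECT, ASK and CONSTRUCT over a small in-memory list of triples. It is an illustration in plain Python, not a SPARQL engine; the `ex:` identifiers, the class `ex:Catalogued` and all function names are assumptions made for this example.

```python
# None in a pattern plays the role of a SPARQL variable.
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
]


def match(pattern, data=DATA):
    """Yield every triple that agrees with the pattern on its non-None positions."""
    for triple in data:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def select_subjects(pattern):
    """SELECT: return a table (here a list) of the matching subjects."""
    return [s for s, _, _ in match(pattern)]


def ask(pattern):
    """ASK: is there at least one match?"""
    return any(True for _ in match(pattern))


def construct(pattern, new_predicate, new_object):
    """CONSTRUCT: fill a template with each match to build new triples."""
    return [(s, new_predicate, new_object) for s, _, _ in match(pattern)]


datasets = select_subjects((None, "rdf:type", "dcat:Dataset"))
has_person = ask((None, "rdf:type", "foaf:Person"))
new_graph = construct((None, "rdf:type", "dcat:Dataset"), "rdf:type", "ex:Catalogued")
```

The same pattern drives all three forms; only what is done with the matches changes: a table, a yes/no answer, or a new graph.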
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Definition of prefixes: the PREFIX lines
Type of query: SELECT
Variables, i.e. what to search for: ?title
RDF triple patterns, i.e. the conditions that have to be met: the WHERE block
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
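A query with several triple patterns is answered by finding variable bindings that satisfy all patterns at once. The sketch below shows this join for the dataset-and-publisher query in plain Python. It is a simplified illustration, not how a real SPARQL engine is implemented; the shortened identifiers `ds:file-type` and `pub:publ` and all function names are assumptions.

```python
# Sample data with shortened, made-up identifiers.
DATA = [
    ("ds:file-type", "rdf:type", "dcat:Dataset"),
    ("ds:file-type", "dct:title", "File types Name Authority List"),
    ("ds:file-type", "dct:publisher", "pub:publ"),
    ("pub:publ", "rdf:type", "dct:Agent"),
    ("pub:publ", "dct:title", "Publications Office"),
]


def is_var(term):
    """Pattern terms starting with '?' are variables."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend the binding so that the pattern equals the triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable keeps its earlier value; a conflict means no match.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def solve(patterns, data=DATA):
    """Return all bindings that satisfy every pattern (a nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = [
    (b["?dataset"], b["?publisher"])
    for b in solve([
        ("ds:file-type", "dct:publisher", "?publisherURI"),
        ("ds:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ])
]
```

The shared variable `?publisherURI` is what connects the dataset to the title of its publisher.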
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
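Start with a robust Domain Model (step 1)
Slide 68
[Diagram: domain model with the following classes, attributes and relationships]
Classes and attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Further attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme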
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
Reuse existing terms and vocabularies (step 2)
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Slide 71
Reuse existing terms and vocabularies (step 2)
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value of which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
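The "further processing" mentioned for non-standard date formats could look like the sketch below, which turns a free-text date into a typed xsd:date literal. It is an illustration only: the function name `to_xsd_date` is made up, and the format string assumes English month names, which is what Python's default locale provides.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed xsd:date literal."""
    # %B matches the full month name in the current locale (English by default).
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


literal = to_xsd_date("21 February 2013")
```

Data published with dcterms:created and xsd:date from the start does not need this step, which is the point of reusing the common term and format.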
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Reuse existing terms and vocabularies (step 2)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
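How a system that only knows the super-property can still interpret the data is sketched below: every triple using a sub-property also implies the same triple with the super-property. This is a minimal illustration of rdfs:subPropertyOf reasoning in plain Python; the prefix `bud:`, the resource `ex:n1` and the function name are assumptions made for this example.

```python
# Assumed mapping: the introduction property is a sub-property of dct:description.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}


def with_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Return the triples plus those implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against accidental cycles in the mapping
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = [("ex:n1", "bud:introduction", "Introductory text")]
expanded = with_super_properties(data)
```

A consumer that only understands dct:description can now read the introduction as a description.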
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
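The capitalisation rules above lend themselves to a simple automated check. The sketch below tests only the letter case and camel case of a prefixed name; whether a class is singular or a property is a verb or a noun still needs a human reviewer. All names in the code, and the negative examples `ex:has_site` and `ex:amount`, are made up for illustration.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # starts upper case, camel case
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # starts lower case, camel case


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check a prefixed name against the rules for kind 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


checks = {
    "skos:Concept": follows_convention("skos:Concept", "class"),
    "foaf:isPrimaryTopicOf": follows_convention("foaf:isPrimaryTopicOf", "property"),
    "ex:has_site": follows_convention("ex:has_site", "property"),
    "ex:amount": follows_convention("ex:amount", "class"),
}
```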
Slide 76
4
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
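One consequence of declaring a domain and a range is that software can derive the class of a subject and of a value from the property alone. The sketch below illustrates this in plain Python. The direction chosen here (a political category has a ceiling that is an amount), the prefix `bud:` and the resource names are assumptions made for this example; the slides do not state them.

```python
# Assumed schema: bud:hasCeiling goes from a political category to an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain=DOMAIN, range_=RANGE):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))  # subject is in the domain
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))  # value is in the range
    return inferred


types = infer_types([("ex:cat1", "bud:hasCeiling", "ex:amt1")])
```

Note that in RDFS a domain or range does not reject data; it adds type information, so an overly narrow domain leads to wrong conclusions rather than errors.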
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: Getting started
Slide 86
Preparation:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
DATASUPPORTOPEN
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data - prepare RDF
Slide 91
3
• Create URI representation for the involved object values
o via formula
o via reconciliation
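Creating a URI "via formula" means deriving it from the cell value by a fixed rule. In Open Refine this would be written as an expression in the tool itself; the sketch below shows the same idea in plain Python. The base URI `http://example.org/id/` and the function name `mint_uri` are placeholders, not values used by the training.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder base URI


def mint_uri(kind, label, base=BASE):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    return f"{base}{kind}/{quote(slug, safe='-')}"


country_uri = mint_uri("country", " United Kingdom ")
indicator_uri = mint_uri("indicator", "Broadband (fixed)")
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what allows rows from different tables to point to the same resource. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.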
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
Slide 94
4
[Screenshot: graphical interface to copy/paste an existing RDF skeleton]
[Screenshot: graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
[Screenshot: export of the data in Turtle]
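What the export step produces can be sketched in a few lines of Python: each row of the table becomes one observation written in Turtle. This is a simplified illustration, not the output of Open Refine. The figures, the `ex:` namespace and the properties `ex:country` and `ex:year` are made up; a complete RDF Data Cube dataset would also declare the dataset, its structure and its dimension properties.

```python
import csv
import io

# Made-up sample rows standing in for the cleaned spreadsheet.
CSV_TEXT = """country,year,value
BE,2013,0.78
LU,2013,0.81
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix ex: <http://example.org/scoreboard/> .\n"
)


def rows_to_turtle(csv_text):
    """Write one qb:Observation per CSV row, in Turtle syntax."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"ex:obs-{row['country']}-{row['year']}"
        blocks.append(
            f"{subject} a qb:Observation ;\n"
            f"    ex:country ex:{row['country']} ;\n"
            f"    ex:year {int(row['year'])} ;\n"
            f"    sdmx-measure:obsValue {float(row['value'])} .\n"
        )
    return "\n".join(blocks)


turtle = rows_to_turtle(CSV_TEXT)
```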
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools placed along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
Slide 98
• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Website: http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-053096500-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Linked data initiatives in Europe
Examples on supra-national national regional and private initiatives in the area of linked data
Slide 26
DATASUPPORTOPEN
EU institutions initiatives - some examples
• European Environment Agency SPARQL endpoint: Tool allowing searching for linked data published by the European Environment Agency. http://semantic.eea.europa.eu/sparql
• EU Open Data Portal SPARQL endpoint: Allows searching for linked metadata of datasets published on the EU Open Data Portal. https://open-data.europa.eu/en/linked-data
• Publications Office of the EU - CELLAR SPARQL endpoint: Allows searching for linked data published by the Publications Office of the EU, such as legislation data, publications data, etc. http://publications.europa.eu/en/linked-data
• DG SANTE SPARQL endpoint: Tool for querying linked open data on European Community Health Indicators, the EU Register of Health Claims, etc. http://ec.europa.eu/semantic_webgate/
• Europeana SPARQL endpoint: Tool allowing querying a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections. http://labs.europeana.eu/api/linked-open-data-SPARQL-endpoint
Slide 27
DATASUPPORTOPEN
Initiatives funded by the European Commission
Slide 28
[Logos of funded initiatives, including ADMS.SW and the Core Public Service Vocabulary]
DATASUPPORTOPEN
Member State initiatives - some examples
DE - Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg
IT - Agenzia per l'Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration
NL - Building and address register
The Dutch Address and Buildings base register published as linked data
UK - Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line
UK - Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives - UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: .../data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
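The URI above follows a regular pattern: an identifier for the section, followed by the requested representation. A small sketch of how such granular URIs could be assembled is given below. The function name and parameter names are made up for illustration, and the pattern is inferred from this one example rather than taken from official documentation.

```python
BASE = "http://www.legislation.gov.uk"


def section_uri(doc_type, year, number, section, representation=None):
    """Build the identifier URI of a section, or the URI of one representation of it."""
    uri = f"{BASE}/id/{doc_type}/{year}/{number}/section/{section}"
    if representation is not None:
        # e.g. "xml", "rdf" or "pdf" for a concrete document format
        uri += f"/data.{representation}"
    return uri


identifier = section_uri("ukpga", 2010, 32, 124)
rdf_document = section_uri("ukpga", 2010, 32, 124, "rdf")
```

Keeping the identifier of the thing separate from the URIs of its representations lets the same section be served in several formats.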
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
Slide 35
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
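The example triple can be written down directly as a three-part value. The sketch below serialises it as one line in the N-Triples syntax, using dct:title (http://purl.org/dc/terms/title) for "has a title". The function name is made up, and the code handles only plain literals and URIs without escaping special characters.

```python
def to_ntriples(subject, predicate, obj, literal=True):
    """Serialise one triple as an N-Triples line (plain literals only, no escaping)."""
    o = f'"{obj}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."


line = to_ntriples(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
)
```

The subject and predicate are always URIs; only the object may be either a URI or a literal.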
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Callouts on the slide:
• Definition of prefixes: the xmlns declarations
• Description of data - triples: the dcat:Dataset and dct:Agent elements, which together form the graph
• Subject: the rdf:about URI; Predicate: the property element (e.g. dct:title); Object: the element content or rdf:resource URI
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Callouts on the slide:
• Definition of prefixes: the @prefix lines
• Description of data - triples: the two resource descriptions, which together form the graph
• Subject: the URI that starts each description; Predicate: a, dct:title, dct:publisher; Object: the value that follows each predicate
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa
Slide 49
RDFa: embedding RDF data in HTML
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p>
<span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
</p>
</div>
</body>
</html>
Subject: the resource attribute; Predicate: the property attribute; Object: the content of the span element
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[Figure: UML class diagram “Healthcare Domain” built from the Core Vocabularies. The classes, their properties and the relationship names recovered from the diagram are listed below.]
Class Identifier (Core Vocabularies)
• dateOfIssue: dateTime [0..1]
• identifier: string [1..1]
• identifierType: string [0..1]
• issuingAuthority: string [0..1]
• issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies)
• alternativeName: string
• birthName: string
• dateOfBirth: dateTime
• dateOfDeath: dateTime
• familyName: string
• fullName: string
• gender: code
• givenName: string
• patronymicName: string
Class Location (Core Vocabularies)
• geographicIdentifier: URI
• geographicName: string
Class Address (Core Vocabularies)
• addressArea: string
• addressID: string
• adminUnitL1: string
• adminUnitL2: string
• fullAddress: string
• locatorDesignator: string
• locatorName: string
• poBox: string
• postCode: string
• postName: string
• thoroughfare: string
Class Geometry (Core Vocabularies)
• lat: string
• long: string
• wkt: string
• xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language (UML) class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, and so make the meaning of the data model easier to understand.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language for querying graph data represented as RDF triples.
• The name stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• SPARQL 1.0 became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s), identified by name or description.
• ASK: Are there any X, Y, etc. satisfying the following conditions?
• INSERT (SPARQL 1.1 Update): Add triples to the RDF graph.
• DELETE (SPARQL 1.1 Update): Delete triples from the RDF graph.
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the block after WHERE
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title “File types Name Authority List” .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title “Publications Office” .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
“File types Name Authority List”
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title “File types Name Authority List” .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title “Publications Office” .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
“File types Name Authority List” | “Publications Office”
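To make the idea of triple patterns more concrete, here is a minimal sketch of how a SELECT query with several patterns could be evaluated over an in-memory list of triples. It is not how a real SPARQL engine is implemented, and it assumes a simplified setting: terms are plain strings, variables are strings starting with `?`, and `ex:file-type` and `ex:publ` are hypothetical stand-ins for the abbreviated URIs in the sample data.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was bound to earlier
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def select(variables, patterns, graph):
    """Return one row per way of satisfying all patterns at the same time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]

graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.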
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
1. Start with a robust Domain Model
Slide 68
[Figure: example domain model. The classes, attributes and relationships recovered from the diagram are listed below.]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
2. Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
2. Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
2. Reuse existing terms and vocabularies
Advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data.
Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
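The interoperability argument about dates can be sketched in a few lines. This is only an illustration under stated assumptions: the lexical form of an xsd:date literal is treated as an ISO 8601 date string, and `parse_free_text` is a hypothetical helper that a consumer would have to write for the ad hoc format.

```python
from datetime import date

typed_value = "2013-02-21"       # lexical form of an xsd:date literal
free_text = "21 February 2013"   # ad hoc format chosen by one publisher

# The typed value can be parsed directly by a standard function
parsed = date.fromisoformat(typed_value)

# The free-text value needs custom code that knows the word order
# and assumes English month names
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

def parse_free_text(text: str) -> date:
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))

same_day = parse_free_text(free_text) == parsed
```

Both values end up as the same date, but only the typed one does so without format-specific code on the consumer side.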
2. Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
3. Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
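The second point above can be sketched as a simple inference step. This is an illustrative sketch, not an RDFS reasoner: `bud:introduction` is a hypothetical prefixed name for a property declared as a sub-property of `dct:description`, and `ex:nomenclature1` is a made-up resource.

```python
# Declared rdfs:subPropertyOf relationships (sub-property -> super-property)
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]

def with_inferred(triples, sub_property_of):
    """Add, for every triple, the same statement with each super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against cyclic declarations
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred

result = with_inferred(data, SUB_PROPERTY_OF)
```

A consumer that only knows `dct:description` can still find the text in `result`, even though the publisher used the more specific term.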
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
[Figure: class Amount (Currency, Figure, Type, Year) linked by “has Nomenclature” to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
4. Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Datatype properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
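Some of these conventions can be checked mechanically. The sketch below is a hypothetical helper, not an existing tool: it only checks the first letter and the absence of separators in the local name of a prefixed term. Whether a name is singular, a verb or a noun still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf

def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]

def follows_convention(term: str, is_class: bool) -> bool:
    pattern = CLASS_NAME if is_class else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))

checks = {
    "skos:Concept": follows_convention("skos:Concept", is_class=True),
    "rdfs:label": follows_convention("rdfs:label", is_class=False),
    "foaf:isPrimaryTopicOf": follows_convention("foaf:isPrimaryTopicOf", is_class=False),
    "ex:has_site": follows_convention("ex:has_site", is_class=False),  # underscore, not camel case
    "ex:amount": follows_convention("ex:amount", is_class=True),       # class without capital letter
}
```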
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
Example: defining the “amount type” property
Slide 78
[Figure on both slides: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
4. Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
Slide 79
[Figure: hasCeiling relationship between the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
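In RDFS, domain and range declarations are used to infer the classes of the resources involved rather than to reject data. The sketch below illustrates this reading. The prefixed names are hypothetical, and the direction of `bud:hasCeiling` (from a political category to an amount) is an assumption made for the example only.

```python
RDF_TYPE = "rdf:type"

# Hypothetical declarations: property -> its rdfs:domain and rdfs:range
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}

data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]

def infer_types(triples, schema):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        decl = schema.get(p)
        if decl is None:
            continue  # nothing declared for this property
        inferred.add((s, RDF_TYPE, decl["domain"]))  # subject is in the domain
        inferred.add((o, RDF_TYPE, decl["range"]))   # object is in the range
    return inferred

types = infer_types(data, SCHEMA)
```

A single statement using the property is enough to derive the class of both its subject and its object, which is why domain and range are worth declaring with care.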
5. Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
6. Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases shown on the slide: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine
Slide 84
“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another…” (openrefine.org)
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: Getting started
To begin:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
1. Describe your data in a spreadsheet
Download the tabular data
Slide 88
2. Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
3. Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
3. Clean up the data – prepare RDF
• Create URI representation for the involved object values
o via formula
o via reconciliation
Slide 91
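Creating a URI from a cell value “via formula” amounts to normalising the value and appending it to a base URI. In Open Refine this would be written as an expression inside the tool. The Python sketch below only illustrates the idea, and the base URI `http://example.org/id/` and the function name `mint_uri` are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI for the dataset

def mint_uri(kind: str, label: str) -> str:
    """Build a URI for a cell value: trim, lower-case, hyphenate, then percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{BASE}{kind}/{quote(slug, safe='-')}"

uri_a = mint_uri("country", "  Czech Republic ")
uri_b = mint_uri("country", "Czech republic")
```

Both calls yield the same URI, which is the point of the clean-up step: spelling and spacing variants of one value should end up identifying one resource.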
4. Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
5. Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
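What an export step produces can be sketched by hand. The snippet below writes triples as N-Triples lines, a line-based syntax that is also valid Turtle. It is a simplified illustration and not the exporter of the RDF extension: literals are plain strings without datatype or language tag, and the `example.org` URI is hypothetical.

```python
def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object, object_is_uri) tuples, one per line."""
    lines = []
    for s, p, o, o_is_uri in triples:
        obj = f"<{o}>" if o_is_uri else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

output = to_ntriples([
    ("http://example.org/obs/1", "http://purl.org/dc/terms/title", 'Say "hi"', False),
])
```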
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
…and now YOUR questions?
Slide 97
References
• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Slide 102
• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain": classes with their properties]
Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...
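The difference between the read-only query forms can be sketched in plain Python. This is a toy matcher over a list of tuples, not a SPARQL engine; the `ex:` identifiers and the convention that terms starting with `?` are variables are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return variable bindings if the triple fits the pattern, else None."""
    binding: Binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.get(term, value) != value:
                return None  # same variable would get two different values
            binding[term] = value
        elif term != value:
            return None
    return binding


def select(pattern: Triple, graph: List[Triple]) -> List[Binding]:
    """SELECT-like: one row of bindings per matching triple."""
    return [b for t in graph if (b := match(pattern, t)) is not None]


def ask(pattern: Triple, graph: List[Triple]) -> bool:
    """ASK-like: is there at least one match?"""
    return any(match(pattern, t) is not None for t in graph)


def construct(pattern: Triple, template: Triple, graph: List[Triple]) -> List[Triple]:
    """CONSTRUCT-like: fill a template with each row of bindings."""
    return [tuple(b.get(term, term) for term in template)
            for b in select(pattern, graph)]


# Hypothetical shorthand identifiers, for illustration only.
graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
]

rows = select(("?d", "rdf:type", "dcat:Dataset"), graph)
exists = ask(("?d", "dct:publisher", "?p"), graph)
new_graph = construct(("?d", "dct:title", "?t"), ("?d", "rdfs:label", "?t"), graph)
```

SELECT yields a table of bindings, ASK a single yes/no answer, and CONSTRUCT a new set of triples built from the bindings.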
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Annotations on the slide:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
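How the three triple patterns of this query combine can be sketched in plain Python. The sketch below is not how a real SPARQL engine is implemented; it only shows that a shared variable (here `?publisherURI`) joins the patterns. The `ex:` shorthand identifiers stand in for the elided URIs on the slide and are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def extend(binding: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Try to extend an existing binding so that the pattern matches the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def evaluate(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a list of triple patterns: every pattern must match (a join)."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := extend(binding, pattern, triple)) is not None
        ]
    return solutions


# Shorthand identifiers standing in for the elided URIs of the sample data.
graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

query = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(s["?dataset"], s["?publisher"]) for s in evaluate(query, graph)]
print(rows)  # [('File types Name Authority List', 'Publications Office')]
```

Because `?publisherURI` is already bound by the first pattern, the third pattern can only match the title of the publisher, not the title of the dataset.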
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data:
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust Domain Model
Slide 68
[Domain model diagram]
Classes and attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Step 2: Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
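The point about typed dates can be illustrated with a short sketch. It assumes a consumer written in Python; the helper names and the month table are hypothetical and only cover the one free-text layout quoted on the slide.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_xsd_date(lexical: str) -> date:
    """Parse an xsd:date lexical form without timezone, e.g. 2013-02-21."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Parse the single layout 'DD Month YYYY'; any other layout needs new rules."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))


typed = parse_xsd_date("2013-02-21")
free_text = parse_free_text_date("21 February 2013")
print(typed == free_text)  # True
```

The typed value is handled by a standard library call, whereas every home-grown date layout needs its own parsing rules, which is the extra processing the slide warns about.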
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
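Why a system that only knows the super-property can still interpret the data can be sketched as follows. This is a simplified illustration of rdfs:subPropertyOf inference in plain Python, not a reasoner; the `bud:` and `ex:` prefixes and the sample statement are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def super_properties(prop: str, sub_property_of: Dict[str, str]) -> List[str]:
    """Walk up the sub-property chain, starting from prop."""
    chain: List[str] = []
    seen = {prop}
    while prop in sub_property_of:
        prop = sub_property_of[prop]
        if prop in seen:  # guard against accidental cycles
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def infer(graph: List[Triple], sub_property_of: Dict[str, str]) -> Set[Triple]:
    """Add, for every statement, the same statement with each super-property."""
    inferred = set(graph)
    for s, p, o in graph:
        for parent in super_properties(p, sub_property_of):
            inferred.add((s, parent, o))
    return inferred


# Hypothetical prefixes: bud: for the specific vocabulary, dct: for Dublin Core.
sub_property_of = {"bud:introduction": "dct:description"}
graph = [("ex:nomenclature1", "bud:introduction", "Introductory text")]

closure = infer(graph, sub_property_of)
descriptions = [o for s, p, o in closure if p == "dct:description"]
print(descriptions)  # ['Introductory text']
```

A consumer that only asks for dct:description still finds the value, even though the publisher used the more specific term.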
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
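The mechanical parts of these conventions (capitalisation and camel case) can be checked automatically. The sketch below is an assumption about how such a check might look; the function names are hypothetical, and whether a term is singular, a verb or a noun still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # e.g. Concept
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # e.g. isPrimaryTopicOf


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term: str) -> bool:
    return bool(CLASS_NAME.match(local_name(term)))


def is_property_name(term: str) -> bool:
    return bool(PROPERTY_NAME.match(local_name(term)))


def to_camel_case(words: str, kind: str = "property") -> str:
    """Join words in camel case; classes keep the leading capital."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", words.strip()) if p]
    camel = "".join(p[:1].upper() + p[1:] for p in parts)
    if kind == "class":
        return camel
    return camel[:1].lower() + camel[1:]


print(is_class_name("skos:Concept"))                  # True
print(is_property_name("foaf:isPrimaryTopicOf"))      # True
print(to_camel_case("has nomenclature"))              # hasNomenclature
print(to_camel_case("political category", "class"))   # PoliticalCategory
```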
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used; in RDFS terms, any resource that has the property is inferred to be an instance of those classes.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: hasCeiling relationship linking the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
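What a domain and a range declaration let a machine conclude can be sketched in plain Python. This is a minimal illustration of the RDFS inference, not a reasoner. The `bud:` and `ex:` prefixes are hypothetical, and the direction of hasCeiling (from a political category to an amount) is an assumption, since the diagram does not survive in the transcript.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def infer_types(graph: List[Triple],
                domains: Dict[str, str],
                ranges: Dict[str, str]) -> Set[Triple]:
    """Derive rdf:type statements from domain and range declarations."""
    inferred: Set[Triple] = set()
    for s, p, o in graph:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))  # the subject is in the domain
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))   # the object is in the range
    return inferred


# Assumed declarations, for illustration only.
domains = {"bud:hasCeiling": "bud:PoliticalCategory"}
ranges = {"bud:hasCeiling": "bud:Amount"}

graph = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(graph, domains, ranges)
```

With these declarations a single hasCeiling statement is enough to conclude that `ex:category1` is a political category and `ex:amount1` is an amount, without either type being stated explicitly.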
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
• Start with a robust Domain Model developed following a structured process and methodology.
• Research existing terms and their usage, and maximise reuse of those terms.
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent.
• Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ...” - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data – prepare RDF
• Create URI representation for the involved object values
  o via formula
  o via reconciliation
Slide 91
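Creating a URI from a cell value "via formula" amounts to a small string transformation. The sketch below shows the idea in Python rather than in Open Refine's own expression language; the base URI, the function name and the slug rules are assumptions made for this illustration.

```python
from urllib.parse import quote


def mint_uri(base: str, kind: str, value: str) -> str:
    """Build a URI from a base, a resource kind and a cell value.

    The value is trimmed, lower-cased, joined with hyphens and
    percent-encoded so that the result is a syntactically valid URI.
    """
    slug = "-".join(value.strip().lower().split())
    return f"{base.rstrip('/')}/{quote(kind, safe='')}/{quote(slug, safe='')}"


# Hypothetical base URI and cell values.
BASE = "http://example.org/id/"
uri = mint_uri(BASE, "country", "  United Kingdom ")
print(uri)  # http://example.org/id/country/united-kingdom
```

The same input always gives the same URI, so rows that mention the same country end up pointing at the same resource. Reconciliation goes one step further and looks the value up in an existing dataset instead of minting a new identifier.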
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes "flexibility" and "volume"]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5-star Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the working ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team...
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport; http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Initiatives funded by the European Commission
Slide 28
[Logos: ADMS.SW; Core Vocabulary, Public Service]
Member State initiatives – some examples
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data, using a simple URI for each company in their database.
Slide 29
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Member States Initiatives – UK National Archives
Slide 30
Three considerations for legislation as data:
• Typographic layout
• Versioning: changes over time
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: /data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Member States Initiatives – UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Slide 33
Open & linked data at BBC
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes.”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for Linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions cont’d
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge-creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: Everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features, and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI.
• Predicate – a URI-identified reused specification of the relationship.
• Object – a resource or literal to which the subject is related.
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
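A set of triples forms a graph because the object of one triple can be the subject of another. The sketch below illustrates this in plain Python with tuples; it is not an RDF library. The dataset and title values come from the example above, the publisher statements are taken from the syntax examples later in this module, and the helper name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Triple = Tuple[str, str, str]

DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"

triples: List[Triple] = [
    (DATASET, DCT_TITLE, "File types Name Authority List"),  # object is a literal
    (DATASET, DCT_PUBLISHER, PUBLISHER),                     # object is a resource
    (PUBLISHER, DCT_TITLE, "Publications Office"),
]


def index_by_subject(graph: List[Triple]) -> Dict[str, DefaultDict[str, List[str]]]:
    """Group the triples as subject -> predicate -> list of objects."""
    index: Dict[str, DefaultDict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in graph:
        index[s][p].append(o)
    return index


index = index_by_subject(triples)

# Follow the link from the dataset to its publisher, then read the publisher's title.
publisher_uri = index[DATASET][DCT_PUBLISHER][0]
publisher_name = index[publisher_uri][DCT_TITLE][0]
print(publisher_name)  # Publications Office
```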
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide:
• Definition of prefixes: the xmlns declarations
• Description of data – triples: the elements inside rdf:RDF
• Colour legend: Subject, Predicate, Object, Graph
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Graph diagram; colour legend: Subject, Predicate, Object]
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide:
• Definition of prefixes: the @prefix lines
• Description of data – triples: the statements that follow
• Colour legend: Subject, Predicate, Object, Graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
<head> ... </head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
Colour legend: Subject, Predicate, Object
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
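Structure of a SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
The query has four parts:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the lines between the braces of the WHERE clause
Slide 56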
SELECT - return the name of a dataset with a particular URI
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
Slide 57
SELECT - return the name and publisher of a dataset
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
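The query on this slide works because the variable for the publisher URI appears in two triple patterns, which forces both patterns to agree on its value. The following Python sketch shows that joining step with a deliberately naive matcher. It is an illustration under assumptions, not how a real SPARQL engine is implemented: the "ex:" identifiers stand in for the abbreviated URIs of the sample data, and the function names are hypothetical.

```python
# Sample data as a list of (subject, predicate, object) tuples.
# "ex:file-type" and "ex:publ" are placeholders for the dataset and publisher URIs.
GRAPH = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the triple.
            return None
    return new


def select(variables, patterns, graph):
    """Evaluate the patterns one after another, keeping consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, b)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    GRAPH,
)
```

With the sample data, `rows` holds a single row pairing the dataset title with the publisher name, which mirrors the result table of the slide.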
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Figure: example domain model. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Slide 68
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organisations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organisations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
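The interoperability point about dates can be sketched in a few lines of Python. The example assumes a consumer that receives the lexical value of a date literal as a string; the two function names are hypothetical. The ISO 8601 form used by xsd:date parses with no extra knowledge, while the free-text form only parses if the consumer already knows the exact format and language used by the publisher.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date such as '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text, fmt="%d %B %Y"):
    """Parse a free-text date; the format must be known in advance.

    %B matches a full month name in the current locale, so this call
    is also language-dependent (English month names in the default locale).
    """
    return datetime.strptime(text, fmt).date()


typed = parse_xsd_date("2013-02-21")
free_text = parse_free_text_date("21 February 2013")
```

Both calls yield the same date here, but only because the second one was told the format; a value such as "21/02/13" or "21 février 2013" would need yet another rule.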
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship]
Slide 75
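The benefit of declaring sub-properties can be illustrated with a small Python sketch. A consumer that only knows dct:description and dct:subject can still read data published with the more specific EU Budget terms, provided it follows the sub-property links. The "bud:" prefix, the local names bud:introduction and bud:hasNomenclature, and the sample resource are assumptions made for this example; they are not taken from the published vocabulary.

```python
# Sub-property declarations, as a mapping from property to its super-property.
# The "bud:" names are hypothetical stand-ins for the EU Budget vocabulary terms.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

GRAPH = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the heading"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]


def with_super_properties(prop, hierarchy):
    """Return the property followed by all of its ancestors."""
    chain = []
    while prop is not None and prop not in chain:
        chain.append(prop)
        prop = hierarchy.get(prop)
    return chain


def values(graph, subject, prop, hierarchy):
    """Values of `prop` for `subject`, including values of its sub-properties."""
    return [
        o for s, p, o in graph
        if s == subject and prop in with_super_properties(p, hierarchy)
    ]


descriptions = values(GRAPH, "ex:nomenclature1", "dct:description", SUB_PROPERTY_OF)
subjects = values(GRAPH, "ex:amount1", "dct:subject", SUB_PROPERTY_OF)
```

A generic tool asking for dct:description therefore still finds the introduction text, even though it has never heard of the more specific property.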
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
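The purely syntactic parts of these naming conventions (initial capital for classes, initial lower case for properties, camel case without separators) can be checked mechanically. The Python sketch below is one possible check, with hypothetical function names; it cannot tell whether a term is singular, a verb or a noun, which still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(qname):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return qname.split(":", 1)[-1]


def looks_like_class_name(qname):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(qname)))


def looks_like_property_name(qname):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(qname)))
```

For instance, skos:Concept passes the class check and foaf:isPrimaryTopicOf passes the property check, while a name such as ex:has_site fails both because of the underscore.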
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use the established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states the classes on whose instances a given property can be used.
[Figure: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
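Declaring a domain and a range lets software draw conclusions about resources that carry no explicit type. The Python sketch below shows this for the hasCeiling relationship. It assumes, for illustration only, that the domain is the Political category class and the range is the Amount class; the slide does not state the direction, and the "bud:" and "ex:" names are hypothetical.

```python
# Assumed declarations: a political category has a ceiling, which is an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

GRAPH = [
    ("ex:category1", "bud:hasCeiling", "ex:amount1"),
]


def infer_types(graph, domain, rng):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in graph:
        if p in domain:
            # The subject of the property is an instance of the domain class.
            inferred.add((s, "rdf:type", domain[p]))
        if p in rng:
            # The value of the property is an instance of the range class.
            inferred.add((o, "rdf:type", rng[p]))
    return inferred


types = infer_types(GRAPH, DOMAIN, RANGE)
```

From a single hasCeiling statement, the sketch concludes that its subject is a political category and its value is an amount. This is also why an overly broad or incorrect domain or range leads to wrong conclusions about the data.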
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Follow good practices for the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format (design rules) and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
The six steps fall into three phases: Analyse, Model, Publish.
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of the RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: Getting started
First:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data - prepare RDF
• Create a URI representation for the object values involved
  • via formula
  • via reconciliation
Slide 91
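Creating a URI "via formula" means deriving the identifier from a cell value with a deterministic rule. In Open Refine this would be written as an expression on the column; the Python sketch below only illustrates the idea. The base URI, the normalisation rule and the function name are assumptions chosen for the example.

```python
from urllib.parse import quote


def mint_uri(base, value):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(value.strip().lower().split())
    return base + quote(slug, safe="-")


# Hypothetical base URI for country resources.
uri = mint_uri("http://example.org/id/country/", " United  Kingdom ")
```

Because the rule is deterministic, the same cell value always yields the same URI, so repeated exports of the spreadsheet keep their identifiers stable. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.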
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
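To give an idea of what a Turtle export amounts to, the following Python sketch turns spreadsheet-like rows into Turtle text using a column-to-property mapping. It is a simplified illustration, not the RDF extension's exporter: all values are written as plain string literals, the key column is assumed to be URI-safe, and the prefix, properties and function names are hypothetical.

```python
def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_turtle(rows, base, prefixes, column_map, key):
    """Serialise rows as Turtle; `column_map` maps column names to properties."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    lines.append("")
    for row in rows:
        # The key column is assumed to contain only URI-safe characters.
        subject = f"<{base}{row[key]}>"
        pairs = [
            f"{prop} {turtle_literal(row[col])}"
            for col, prop in column_map.items()
        ]
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines)


turtle = rows_to_turtle(
    rows=[{"id": "obs1", "country": "BE", "value": "42"}],
    base="http://example.org/obs/",
    prefixes={"ex": "http://example.org/def#"},
    column_map={"country": "ex:country", "value": "ex:value"},
    key="id",
)
```

The column-to-property mapping plays the role of the RDF skeleton: it is defined once and then applied to every row of the table.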
Production pipelines
From desk to automated pipeline
[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
Further reading
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
Be part of our team
Find us on, join us on, follow us, contact us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
DATASUPPORTOPEN
Member State initiatives ndash some examples
DE ndash Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria Berlin and Brandenburg
IT ndash Agenzia per lrsquoItalia digitiale
Three datasets published as linked data the Index of Public Administration the SPC contracts for web services and conduction systems and the Classifications for the data in Public Administration
NL ndash Building and address register
The Dutch Address and Buildings base register published as linked data
UK ndash Ordnance Survey
Three OS Open Data products published as linked data the 150 000 Scale Gazetteer Code-Point Open and the administrative geography taken from Boundary Line
UK ndash Companies House
Publishing basic company details as linked data using a simple URI for each company in their database
Slide 29
See alsoISA Study on Business Models for LOGD httpsjoinupeceuropaeucommunitysemicdocumentstudy-business-
models-linked-open-government-data-bm4logd
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 30
Three considerations for legislation as databull Typographic layoutbull Versioning changes over timebull Semantics
Semantic representation using RDF and Linked Databull URIs for things amp RDF data model
Requires granular URIs to name thingsbull Identifier httpwwwlegislationgovukidtypeyearnumbersectionnumberbull Representation data[xml | xht | pdf | rdf | feed]
Source httpswwwnationalarchivesgovukdocumentsinformation-managementopen-and-linked-data-johnsheridanppt
See alsoThe European Legal Identifier (ELI)httpeur-lexeuropaeulegal-contentENTXTuri=URISERVjl0068
DATASUPPORTOPEN
Member States Initiatives ndash UK National Archives
Slide 31
Versioning of legislation in RDF
httpwwwlegislationgovukidukpga201032section124datardf
DATASUPPORTOPEN
Open amp linked data at BBC
bull BBC Things the open data website of BBC allows anyone to access the data
that BBC stores about data on the places people and organisations that appear
in BBC programmes and online content
bull This data already powers large parts of the BBC website including BBC News and
Sport
bull BBC Things is part of the BBC Linked Data Platform which provides public
access to data stored in the BBC platform and provides a public reference for all of
the things that the BBC creates content about
Slide 32
Further reading
httpwwwbbccoukthingssearchq=juncker
httpwwwforbescomsitesbernardmarr20150601how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN Slide 33
Open amp linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source Soumlren Auer SEMIC Conference 2015 Rigahttpsjoinupeceuropaeusitesdefaultfilesisa_field_pathpresentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_datapdf
DATASUPPORTOPEN
1 Link databases
ldquoIrsquod like to know last monthrsquos production volume total for all fields in which GDF Suez EampP Norge is a licenseerdquo
bull Data spread across multiple databases GDF Suez data + Norwegian Government data + Wikipedia data
bull Need to uniquely identify resources
2 Add meaning
ldquoIrsquod like to know all fields operated by Statoil Petroleum AS with their production volumesrdquo
bull Need for adding semantics in order to allow machine reasoning
For example
bull Kristin is a field
bull Aringsgard is an oil platform
bull Statoil Petroleum AS is a company
Linked Data in the oil and gas industry
Slide 35
Further reading httpwwwtopquadrantcom
resourcessolutionsdocsSe
mantic-data-oil-and-gaspdf
DATASUPPORTOPEN
Conclusions
bull Linked data is a set of design principles for sharing machine-readable data on the Web
bull URIs RDF and SPARQL form the foundational layer for Linked data
bull Linked data offers a number of advantages such as
o Data integration with small impact on legacy systems
o Enables for semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions contrsquod
bull Linked data offers a number of advantages such as
o Enables easy updates adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-
creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF amp SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
bull How to writeread RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource Everything that can have a unique identifier (URI) eg pages places people organisations products
Description attributes features and relations of the resources
Framework model languages and syntaxes for these descriptions
bull Published as a W3C recommendation in 1999
bull RDF was originally introduced as a data model for metadata
bull RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG
Slide 43
ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt
ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt
ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt
ltorgOrganizationgt
ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt
ltlocnAddressgt
ltrdfRDFgt
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo
Subject Predicate Object
Example name of a dataset
DATASUPPORTOPEN
RDF SyntaxRDFXML
Slide 46
ltrdfRDF
xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo
ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt
ltdcatDatasetgt
ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt
ltdctPublishergt
ltrdfRDFgt
Subject
Predicate
Object
Gra
ph
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms
lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt
lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo
Gra
ph
See alsohttpwwww3org200912rdf-wspapersws11
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
SELECT – return the name and publisher of a dataset
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
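The query above joins three triple patterns through the shared variable ?publisherURI. The following Python sketch shows that join mechanism with a deliberately naive matcher; it is an illustration under simplifying assumptions (prefixed names kept as plain strings, full URIs taken from the RDF syntax slides), not how a real SPARQL engine is implemented.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None.

    Pattern terms starting with '?' are variables.
    """
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, triples):
    """Evaluate the patterns one by one, carrying the bindings along."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new_binding = match(pattern, triple, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        bindings = extended
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, "dct:publisher", "?publisherURI"),
        (FILE_TYPE, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The third pattern only matches the publisher's title because ?publisherURI was already bound by the first pattern.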
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Diagram: domain model. Classes and their attributes:
Amount (Currency, Figure, Type, Year);
Nomenclature (Type, Heading, Introduction, Remark, Conditions);
EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status);
Political category (Code, Description);
Corporate body (Code, Type, Location).
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Slide 68
Step 2: Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
(Step 2: Reuse existing terms and vocabularies)
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Advantages of reuse
(Step 2: Reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data.
  For example, a dcterms:created value given as a typed date, such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
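The "further processing" mentioned above can be illustrated with a short Python sketch. It parses the two date forms from the example: the xsd:date lexical form needs no format knowledge, whereas the free-text form needs a format string and assumes English month names. The function names are made up for this illustration.

```python
from datetime import date, datetime


def parse_xsd_date(lexical):
    """Parse the lexical form of an xsd:date without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    """Parse a date such as '21 February 2013'.

    Assumes English month names (the default C locale); every other
    convention would need its own format string.
    """
    return datetime.strptime(text, "%d %B %Y").date()


print(parse_xsd_date("2013-02-21"))
print(parse_free_text_date("21 February 2013"))
```

Both calls produce the same date, but only the first works without agreeing on a custom format beforehand.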
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
(Step 2: Reuse existing terms and vocabularies)
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
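The benefit of sub-properties can be sketched in a few lines of Python: a consumer that only knows dct:description still finds values published with the more specific property. The "bud:" prefix, the local names and the sample triple are assumptions made for this illustration; only the two sub-property relationships come from the slides.

```python
# Sub-property declarations (the bud: prefix and local names are hypothetical).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

TRIPLES = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
]


def is_same_or_sub_property(prop, target):
    """Walk up the sub-property chain from `prop` looking for `target`."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def values(triples, subject, prop):
    """Objects stated for `subject` with `prop` or any of its sub-properties."""
    return [
        o for s, p, o in triples
        if s == subject and is_same_or_sub_property(p, prop)
    ]


print(values(TRIPLES, "ex:nomenclature1", "dct:description"))
```

A system that has never heard of the specific property can therefore still treat its values as descriptions.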
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
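The capitalisation and camel-case conventions above can be checked mechanically. The Python sketch below does so with two regular expressions; the function names are made up, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re

# Local names only: letters and digits, no underscores, hyphens or spaces.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")      # UpperCamelCase
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")   # lowerCamelCase


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def is_valid_class_name(term):
    return CLASS_NAME.fullmatch(local_name(term)) is not None


def is_valid_property_name(term):
    return PROPERTY_NAME.fullmatch(local_name(term)) is not None


print(is_valid_class_name("skos:Concept"))
print(is_valid_property_name("foaf:isPrimaryTopicOf"))
print(is_valid_class_name("ex:political_category"))
```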
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
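As a rough idea of what such definitions look like as data, the Python sketch below builds the RDFS statements for a class and a property and prints them as N-Triples. The namespace http://example.org/budget# and the local names are placeholders, not the real EU Budget vocabulary; only the rdf: and rdfs: terms are standard.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BUD = "http://example.org/budget#"  # hypothetical namespace


class Literal(str):
    """Marks a plain string literal, as opposed to a URI."""


def define_class(uri, label):
    return [
        (uri, RDF + "type", RDFS + "Class"),
        (uri, RDFS + "label", Literal(label)),
    ]


def define_property(uri, label):
    return [
        (uri, RDF + "type", RDF + "Property"),
        (uri, RDFS + "label", Literal(label)),
    ]


def to_ntriples(triples):
    """Serialise triples as N-Triples, one statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


vocabulary = (
    define_class(BUD + "Amount", "Amount")
    + define_property(BUD + "amountType", "amount type")
)
print(to_ntriples(vocabulary))
```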
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: "hasCeiling" relationship between the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
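In RDFS, domain and range declarations let a consumer infer the types of the resources a property connects. The Python sketch below shows that inference for hasCeiling. The direction of the relationship (Political category to Amount), the "bud:" prefix and the instance names are assumptions, since the slide only shows the diagram.

```python
# Assumed declarations for the hasCeiling property (hypothetical prefix).
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(data)):
    print(triple)
```

Because these statements drive inference rather than validation, a domain or range that is too narrow will silently assign wrong types, which is a good reason to define them with care.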
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Phases: Analyse, Model, Publish
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another …” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: Getting started
Preparation:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data – prepare RDF
• Create a URI representation for the involved object values
  o via formula
  o via reconciliation
Slide 91
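Creating URIs "via formula" means deriving each URI from a cell value by a fixed rule. The Python sketch below shows one such rule outside Open Refine; the base URI http://example.org/id/ and the function name are placeholders, and in Open Refine itself the same idea would be written as a cell transformation expression.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind, value):
    """Derive a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = quote(value.strip().lower().replace(" ", "-"), safe="-")
    return f"{BASE}{kind}/{slug}"


print(mint_uri("country", " Belgium "))
print(mint_uri("indicator", "Broadband coverage"))
```

The same input always yields the same URI, so every row that mentions a value points to the same resource. Reconciliation, by contrast, looks up an existing URI in an external source.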
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
• You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
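For readers without Open Refine at hand, the mapping and export steps can be approximated in plain Python. The sketch below reads a small CSV table and writes one statement per cell as N-Triples, which is also valid Turtle. The base URI, the property URIs and the sample figures are invented for this illustration and do not follow the RDF Data Cube Vocabulary.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical base URI
XSD = "http://www.w3.org/2001/XMLSchema#"

# Made-up sample data standing in for the cleaned spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,0.98\nLU,2013,0.99\n"


def row_to_triples(row):
    """Apply a fixed 'skeleton' to one row of the table."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{BASE}observation/{country}-{year}>"
    return [
        f"{obs} <{BASE}def/country> <{BASE}country/{country}> .",
        f'{obs} <{BASE}def/year> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{BASE}def/value> "{value}"^^<{XSD}decimal> .',
    ]


def export(csv_text):
    """Turn every row of the CSV text into N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(line for row in reader for line in row_to_triples(row))


print(export(CSV_TEXT))
```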
Production pipelines
From desk to automated pipeline
[Diagram: tools positioned along the axes "flexibility" and "volume": OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
… and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
http://www.opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Member States Initiatives – UK National Archives
Three considerations for legislation as data:
• Typographic layout
• Versioning (changes over time)
• Semantics
Semantic representation using RDF and Linked Data:
• URIs for things & RDF data model
Requires granular URIs to name things:
• Identifier: http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
• Representation: …/data.[xml | xht | pdf | rdf | feed]
Source: https://www.nationalarchives.gov.uk/documents/information-management/open-and-linked-data-johnsheridan.ppt
See also: The European Legal Identifier (ELI), http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=URISERV:jl0068
Slide 30
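The granular identifier pattern above is regular enough to be assembled by a small helper. The Python sketch below does this; the function names are made up, and the way the pieces are joined is inferred from the example URI http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf shown on the next slide, not from an official specification.

```python
FORMATS = ("xml", "xht", "pdf", "rdf", "feed")


def legislation_id(doc_type, year, number, section=None):
    """Build the identifier URI for an item of legislation or one of its sections."""
    uri = f"http://www.legislation.gov.uk/id/{doc_type}/{year}/{number}"
    if section is not None:
        uri += f"/section/{section}"
    return uri


def representation(uri, fmt):
    """Build the URI of one representation (document format) of the identified thing."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{uri}/data.{fmt}"


section_uri = legislation_id("ukpga", 2010, 32, section=124)
print(section_uri)
print(representation(section_uri, "rdf"))
```

The identifier names the thing itself, while each data.* URI names one document about it.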
Member States Initiatives – UK National Archives
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
Slide 31
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content.
• This data already powers large parts of the BBC website, including BBC News and Sport.
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and provides a public reference for all of the things that the BBC creates content about.
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
Slide 32
Open & linked data at BBC
Slide 33
Data Value Chains using Linked Data at Volkswagen
Source: Sören Auer, SEMIC Conference 2015, Riga. https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Slide 34
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS, with their production volumes.”
• Need for adding semantics in order to allow machine reasoning. For example:
  o Kristin is a field
  o Åsgard is an oil platform
  o Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Slide 35
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
  o Easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
Slide 43
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
Slide 45
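As a minimal illustration in Python, the triple from the example can be held in a three-field record. The class name is made up for this sketch, and the predicate "has a title" is written with the Dublin Core term used in the later syntax examples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, object is a URI or literal."""
    subject: str
    predicate: str
    object: str


title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    object="File types Name Authority List",
)

print(title_triple.subject)
print(title_triple.predicate)
print(title_triple.object)
```

A set of such records forms an RDF graph: subjects and objects are the nodes, predicates are the labelled edges.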
RDF Syntax: RDF/XML
Definition of prefixes:
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
Description of data – triples (together they form the graph):
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
[Slide annotations mark the subject (rdf:about), the predicate (element name) and the object (element content or rdf:resource) of each triple.]
Slide 46
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
[Diagram: graph with subject and object nodes connected by predicate arrows]
Slide 47
RDF Syntax: Turtle
Definition of prefixes:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data – triples (together they form the graph):
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
[Slide annotations mark the subject, predicate and object of each triple.]
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide 48
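Prefix definitions are only a shorthand: a prefixed name such as dct:title stands for the prefix's namespace URI followed by the local name. The Python sketch below performs that expansion; the function name is made up, and the two namespaces are the DCAT namespace (http://www.w3.org/ns/dcat#) and the Dublin Core terms namespace.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dct:title' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"undeclared prefix: {prefix}")
    return PREFIXES[prefix] + local


print(expand("dcat:Dataset"))
print(expand("dct:title"))
```

This is why a Turtle file and an RDF/XML file with different prefix labels can still describe exactly the same triples.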
RDF Syntax: RDFa
Embedding RDF data in HTML:
<html>
<head> … </head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
[Slide annotations mark the subject (resource), the predicate (property) and the object (element content).]
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
Slide 49
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[UML class diagram, package “Healthcare Domain”. Classes and their properties:]
Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
[Figure: domain model of the EU budget (step 1). Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
DATASUPPORTOPEN
Well-known vocabularies (step 2: reuse existing terms and vocabularies)
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
DATASUPPORTOPEN
Advantages of reuse (step 2: reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
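The "further processing" mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names to_xsd_date and typed_literal are hypothetical, and the input format is assumed to be day, English month name, year.

```python
from datetime import datetime

# Datatype URI for xsd:date
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date (assumed English month names) into ISO 8601 form."""
    return datetime.strptime(text, fmt).date().isoformat()


def typed_literal(value, datatype=XSD_DATE):
    """Render a typed literal with its full datatype URI."""
    return '"{}"^^<{}>'.format(value, datatype)


# A publisher used a free-text date; normalise it before publishing
iso = to_xsd_date("21 February 2013")
literal = typed_literal(iso)
print(literal)
```

Every consumer of the free-text form would need a step like this one, which is the cost that reusing dcterms:created with xsd:date avoids.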
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) and the Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the has Nomenclature relationship]
Slide 75
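To see why sub-properties help generic systems, here is a minimal Python sketch of how a consumer that only knows Dublin Core could still read EU Budget data. The namespace ending and the local names introduction and hasNomenclature are assumptions made for illustration; the real vocabulary may spell them differently.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://data.europa.eu/bud/"  # assumed form of the budget namespace

# Sub-property declarations, as described on the two slides above
SUB_PROPERTY_OF = {
    BUD + "introduction": DCT + "description",
    BUD + "hasNomenclature": DCT + "subject",
}


def with_super_properties(triples, sub_property_of):
    """Add, for every triple, the same statement expressed with each super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}
        parent = sub_property_of.get(predicate)
        while parent is not None and parent not in seen:  # guard against cycles
            inferred.add((subject, parent, obj))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


# Hypothetical data using the specific term only
data = {("http://example.org/nomenclature/1", BUD + "introduction", "Introductory text")}
expanded = with_super_properties(data, SUB_PROPERTY_OF)
for triple in sorted(expanded):
    print(triple)
```

After expansion the same statement is also available under dct:description, so a tool that has never seen the budget vocabulary can still display the text.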
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
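The capitalisation and camel case rules above are mechanical enough to check automatically. The following Python sketch shows one possible check; the function names are hypothetical, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re

# Upper camel case for classes, lower camel case for properties
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term):
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:political_category"]:
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```

A check like this could run before a vocabulary is published, so that naming slips are caught while they are still cheap to fix.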
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
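The slide shows the class definition as an image. As a rough illustration of what an RDFS definition of the Amount class could look like, the Python sketch below prints a small Turtle document. The prefix bud, the namespace ending and the comment text are assumptions, not taken from the EU Budget vocabulary itself, and the helper does not escape quotes in its inputs.

```python
def class_definition(prefix, namespace, name, label, comment):
    """Build a Turtle document that declares one RDFS class."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix {}: <{}> .".format(prefix, namespace),
        "",
        "{}:{} a rdfs:Class ;".format(prefix, name),
        '    rdfs:label "{}"@en ;'.format(label),
        '    rdfs:comment "{}"@en .'.format(comment),
    ]
    return "\n".join(lines)


# Hypothetical namespace and description, for illustration only
turtle = class_definition(
    prefix="bud",
    namespace="http://data.europa.eu/bud/",
    name="Amount",
    label="Amount",
    comment="A monetary amount in the budget.",
)
print(turtle)
```

An OWL definition would follow the same shape, with owl:Class in place of rdfs:Class.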
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used, that is, any resource that has the property is an instance of those classes
[Figure: the hasCeiling relationship, shown with the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
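Domain and range declarations let a system infer the type of a resource from the properties it is used with. The Python sketch below illustrates that idea on a single triple. The direction of hasCeiling (from a political category to an amount) and all identifiers are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"

# Assumed schema: a political category has a ceiling that is an amount
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        rule = schema.get(predicate)
        if rule is None:
            continue
        if "domain" in rule:
            inferred.add((subject, RDF_TYPE, rule["domain"]))
        if "range" in rule:
            inferred.add((obj, RDF_TYPE, rule["range"]))
    return inferred


# Hypothetical data: no explicit types are given
data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(data, SCHEMA)
for triple in sorted(types):
    print(triple)
```

This also shows why domain and range should be chosen with care: a declaration that is too narrow makes a reasoner draw wrong conclusions about data that uses the property more broadly.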
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices for the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet (step 1)
Download the tabular data
Slide 88
DATASUPPORTOPEN
Create a project and upload it in Open Refine (step 2)
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
DATASUPPORTOPEN
Clean up the data - table harmonisation (step 3)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
DATASUPPORTOPEN
Clean up the data - prepare RDF (step 3)
• Create a URI representation for the involved object values
  o via formula
  o via reconciliation
Slide 91
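Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In Open Refine this is done with an expression inside the tool; the Python sketch below only illustrates the same idea outside the tool. The base URI and the function names slugify and mint_uri are hypothetical.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI for the dataset


def slugify(value):
    """Turn a cell value into a lower-case, ASCII-only, hyphen-separated token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value, base=BASE):
    """Build a URI for a value of a given kind, e.g. a country name."""
    return "{}{}/{}".format(base, kind, slugify(value))


uri = mint_uri("country", "Czech Republic")
print(uri)
```

The same input always yields the same URI, which is what makes the formula approach repeatable. Reconciliation, in contrast, looks the value up in an existing reference dataset and reuses the URI found there.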
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
[Figure: export of the data in Turtle]
Slide 95
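The mapping and export steps can be pictured as applying one template (the skeleton) to every row of the table. The Python sketch below illustrates this for RDF Data Cube observations. It is not how Open Refine works internally; the sample rows, the ex: namespace and the dimension properties ex:refArea and ex:refPeriod are made up for the example, while qb:Observation, qb:dataSet and sdmx-measure:obsValue come from the W3C Data Cube and SDMX vocabularies.

```python
import csv
import io

# Made-up sample rows standing in for the cleaned spreadsheet
SAMPLE = """country,year,value
BE,2013,78.5
LU,2013,93.1
"""

PREFIXES = [
    "@prefix qb: <http://purl.org/linked-data/cube#> .",
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix ex: <http://example.org/scoreboard/> .",  # hypothetical namespace
]


def row_to_turtle(row):
    """Apply the skeleton to one row: one observation per row."""
    subject = "ex:obs-{}-{}".format(row["country"], row["year"])
    return "\n".join([
        "{} a qb:Observation ;".format(subject),
        "    qb:dataSet ex:dataset ;",
        "    ex:refArea ex:country-{} ;".format(row["country"]),
        '    ex:refPeriod "{}"^^xsd:gYear ;'.format(row["year"]),
        '    sdmx-measure:obsValue "{}"^^xsd:decimal .'.format(row["value"]),
    ])


def csv_to_turtle(text):
    """Convert CSV text into a Turtle document with one block per row."""
    rows = csv.DictReader(io.StringIO(text))
    blocks = [row_to_turtle(row) for row in rows]
    return "\n".join(PREFIXES) + "\n\n" + "\n\n".join(blocks) + "\n"


turtle = csv_to_turtle(SAMPLE)
print(turtle)
```

Keeping the skeleton in one function mirrors the tool's design: the mapping is defined once and the export simply repeats it for each row.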
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on, join us on, follow us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Member States Initiatives - UK National Archives
Slide 31
Versioning of legislation in RDF
http://www.legislation.gov.uk/id/ukpga/2010/32/section/124/data.rdf
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce
DATASUPPORTOPEN
Open & linked data at BBC
Slide 33
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for Linked data
• Linked data offers a number of advantages, such as:
  o Data integration with small impact on legacy systems
  o Semantic interoperability
  o Easier browsing through complex data
  o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions cont'd
• Linked data offers a number of advantages, such as:
  o Enables easy updates, adaptations and extensions of data models
  o Cost reduction from the reuse of LOGD in e-Government applications
  o Enables creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
  Subject: http://publications.europa.eu/resource/authority/file-type
  Predicate: has a title
  Object: "File types Name Authority List"
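For readers who think in code, a triple is simply a record with three fields. The Python sketch below models the example above and prints it as one N-Triples line. It assumes the predicate "has a title" is dct:title, as in the syntax examples of this module, and it uses a simple heuristic (anything starting with http is a URI) that a real RDF library would not rely on.

```python
from collections import namedtuple

# A triple is a record with exactly three parts
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def is_uri(term):
    """Heuristic for this sketch only: treat http(s) strings as URIs."""
    return term.startswith("http://") or term.startswith("https://")


def to_ntriples(triple):
    """Write a triple as one N-Triples line; non-URI objects become plain literals."""
    if is_uri(triple.object):
        obj = "<{}>".format(triple.object)
    else:
        obj = '"{}"'.format(triple.object)
    return "<{}> <{}> {} .".format(triple.subject, triple.predicate, obj)


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    object="File types Name Authority List",
)
line = to_ntriples(title)
print(line)
```

A set of such records is a graph: whenever the object of one triple is the subject of another, the two statements connect.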
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
• Definition of prefixes: the xmlns attributes on the rdf:RDF element
• Description of data - triples: the nested elements, each giving a subject (rdf:about), a predicate (the property element) and an object (its content or rdf:resource); together the triples form a graph
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph of the example, with subject, predicate and object labelled]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
• Definition of prefixes: the @prefix lines
• Description of data - triples: each block gives a subject followed by its predicate and object pairs; together the triples form a graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head> </head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type"
     typeof="http://www.w3.org/ns/dcat#Dataset">
  <p>
    <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
    Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
  </p>
</div>
</body>
</html>
The resource attribute gives the subject, each property attribute gives a predicate, and the content of the span gives the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain". Classes with their properties, and the relationships between them:]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (part of SPARQL 1.1 Update)
• DELETE: delete triples from the RDF graph (part of SPARQL 1.1 Update)
• ASK: are there any X, Y, etc. satisfying the following conditions
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
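Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the contents of the WHERE block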
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
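What the query engine does with a single triple pattern can be imitated in a few lines of Python. This is a teaching sketch, not a SPARQL implementation: the triples are plain tuples, the short names such as ex:file-type stand in for the abbreviated URIs of the sample data, and the function names are hypothetical.

```python
def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple):
    """Return the variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            if bindings.get(pattern_term, triple_term) != triple_term:
                return None
            bindings[pattern_term] = triple_term
        elif pattern_term != triple_term:
            return None
    return bindings


def select(variables, pattern, triples):
    """Return one result row per triple that matches the pattern."""
    rows = []
    for triple in triples:
        bindings = match(pattern, triple)
        if bindings is not None:
            rows.append(tuple(bindings[v] for v in variables))
    return rows


# The sample data of the slide, with shortened identifiers
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

rows = select(["?dataset"], ("ex:file-type", "dct:title", "?dataset"), DATA)
print(rows)
```

Only the second triple has both the required subject and the required predicate, so the result table has a single row, just like on the slide.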
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
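A query with several triple patterns is answered by joining the matches on their shared variables, here ?publisherURI. The Python sketch below illustrates that join on the sample data. It is a simplified illustration rather than a SPARQL engine; the shortened identifiers and the function names are assumptions made for the example.

```python
def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            if bindings.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return bindings


def substitute(pattern, bindings):
    """Replace variables that already have a value by that value."""
    return tuple(bindings.get(term, term) for term in pattern)


def solve(patterns, triples):
    """Evaluate the patterns one after another, extending each partial solution."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            bound = substitute(pattern, solution)
            for triple in triples:
                extra = match(bound, triple)
                if extra is not None:
                    merged = dict(solution)
                    merged.update(extra)
                    extended.append(merged)
        solutions = extended
    return solutions


# The sample data of the slide, with shortened identifiers
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

PATTERNS = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(s["?dataset"], s["?publisher"]) for s in solve(PATTERNS, DATA)]
print(rows)
```

The third pattern only produces a match once ?publisherURI has been bound by the first one, which is how the title of the publisher, and not the title of the dataset, ends up in the second column.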
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
• Reuse greatly aids interoperability of your data
A reused term such as dcterms:created, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
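The interoperability point above can be illustrated with a minimal Python sketch. It assumes the consumer only needs the calendar date: a value in the xsd:date lexical form parses directly with the standard library, whereas a home-grown format needs its own conversion step. The function names and the month table are illustrative and not part of any vocabulary.

```python
from datetime import date

# Lookup table needed only for the non-standard, free-text format.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)

def parse_free_text_date(text: str) -> date:
    """Extra processing required for a format such as '21 February 2013'."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name], int(day))

if __name__ == "__main__":
    # Both values denote the same day, but only the first needs no custom code.
    print(parse_xsd_date("2013-02-21") == parse_free_text_date("21 February 2013"))
```

Every consumer of the free-text format would have to write and maintain a converter like `parse_free_text_date`, which is the "further processing" the slide warns about.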
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
http://joinup.ec.europa.eu and http://lov.okfn.org
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
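A minimal Python sketch of the second bullet: a consumer that only knows dct:description can still read values published with a more specific sub-property, as long as the rdfs:subPropertyOf statement is available. The prefix `bud:` and the `ex:` identifiers are hypothetical placeholders; triples are modelled as plain tuples rather than with an RDF library.

```python
# rdfs:subPropertyOf statements; "bud:" is a hypothetical prefix used for illustration.
SUB_PROPERTY_OF = {"bud:introduction": "dct:description"}

def property_chain(prop, hierarchy):
    """Return prop followed by its super-properties, most specific first."""
    chain = []
    while prop is not None and prop not in chain:
        chain.append(prop)
        prop = hierarchy.get(prop)
    return chain

def values_for(subject, wanted, triples, hierarchy=SUB_PROPERTY_OF):
    """Collect objects whose predicate is `wanted` or one of its sub-properties."""
    return [o for s, p, o in triples
            if s == subject and wanted in property_chain(p, hierarchy)]

TRIPLES = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:nomenclature1", "dct:title", "A heading"),
]

if __name__ == "__main__":
    # The generic question still finds the specifically typed value.
    print(values_for("ex:nomenclature1", "dct:description", TRIPLES))
```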
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Diagram: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the relationship "has Nomenclature"]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
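The letter-case conventions above are mechanical enough to check automatically. The sketch below is a minimal illustration, assuming terms are written as prefixed names; it only checks capitalisation and camel case, since whether a class is singular or a property is a verb or noun needs human judgement. The function names are illustrative.

```python
import re

# Classes: UpperCamelCase; properties: lowerCamelCase (letters and digits only).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")

def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]

def follows_convention(term: str, kind: str) -> bool:
    """Check the case convention for a term; kind is 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))

if __name__ == "__main__":
    for term, kind in [("skos:Concept", "class"), ("rdfs:label", "property"),
                       ("ex:amount_type", "property")]:
        print(term, kind, follows_convention(term, kind))
```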
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
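In RDFS, domain and range statements are typically used to infer the types of the resources a property connects. The following is a minimal Python sketch of that inference over tuple-based triples. The `ex:` names, and the choice of which class is the domain and which the range of hasCeiling, are assumptions made for illustration only.

```python
# Hypothetical rdfs:domain / rdfs:range statements; direction is assumed.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}

def infer_types(triples, domain=DOMAIN, range_=RANGE):
    """Derive rdf:type triples for subjects (domain) and objects (range)."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))
    return inferred

if __name__ == "__main__":
    data = [("ex:category1", "ex:hasCeiling", "ex:amount1")]
    for triple in sorted(infer_types(data)):
        print(triple)
```

Because a domain or range leads to such inferences, declaring one that is too narrow can cause resources to be classified in unintended ways.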
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model, developed following a structured process and methodology
Research existing terms and their usage, and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another …” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is Open Refine RDF extension
The Open Refine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (.xls and .xlsx)
JSON
XML and
RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
3
• Create URI representation for the involved object values
• via formula
• via reconciliation
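Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this would be written as an expression inside the tool; the sketch below shows the same idea in Python, as an assumption-laden illustration. The base URI `http://example.com/id`, the `country` path segment and the function name are all hypothetical.

```python
from urllib.parse import quote

def mint_uri(base: str, kind: str, value: str) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = value.strip().lower().replace(" ", "-")
    return f"{base.rstrip('/')}/{kind}/{quote(slug, safe='-')}"

if __name__ == "__main__":
    for cell in [" Czech Republic ", "España"]:
        print(mint_uri("http://example.com/id/", "country", cell))
```

Reconciliation, the second option on the slide, instead looks the value up in an existing dataset and reuses the URI found there, which links your data to resources that others already use.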
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data: http://5stardata.info
• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C: http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata/
• Open Refine: https://github.com/OpenRefine
• RDF Extension: http://refine.deri.ie
• Resource Description Framework, W3C: http://www.w3.org/RDF/
• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Open & linked data at BBC
• BBC Things, the open data website of the BBC, allows anyone to access the data that the BBC stores about the places, people and organisations that appear in BBC programmes and online content
• This data already powers large parts of the BBC website, including BBC News and Sport
• BBC Things is part of the BBC Linked Data Platform, which provides public access to data stored in the BBC platform and a public reference for all of the things that the BBC creates content about
Slide 32
Further reading:
http://www.bbc.co.uk/things/search?q=juncker
http://www.forbes.com/sites/bernardmarr/2015/06/01/how-big-data-drives-success-at-rolls-royce/
DATASUPPORTOPEN
Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
1. Link databases
“I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee”
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
“I'd like to know all fields operated by Statoil Petroleum AS, with their production volumes”
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Linked Data in the oil and gas industry
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features, and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
<http://publications.europa.eu/resource/authority/file-type> has a title “File types Name Authority List”
Subject | Predicate | Object
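To make the subject, predicate and object roles concrete, here is a minimal Python sketch that holds the example triple in a small data structure and prints it in an N-Triples-like line. It assumes the "has a title" predicate corresponds to http://purl.org/dc/terms/title, as used in the syntax examples of this module. The `Triple` type and `to_ntriples` function are illustrative, and literal escaping is deliberately omitted.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str             # always a URI
    predicate: str           # always a URI
    obj: str                 # a URI or a literal value
    object_is_literal: bool  # distinguishes the two kinds of object

def to_ntriples(t: Triple) -> str:
    """Serialise one triple; literals are quoted, URIs go in angle brackets."""
    obj = f'"{t.obj}"' if t.object_is_literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."

EXAMPLE = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    object_is_literal=True,
)

if __name__ == "__main__":
    print(to_ntriples(EXAMPLE))
```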
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide: Definition of prefixes (the xmlns declarations); Description of data – triples (the elements that follow), each with a Subject, Predicate and Object; together the triples form the Graph
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide: Definition of prefixes (the @prefix lines); Description of data – triples (the statements that follow), each with a Subject, Predicate and Object; together the triples form the Graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
Annotations on the slide: Subject (the resource attribute), Predicate (the property attributes), Object (the element content)
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain"; classes and their properties as listed below]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships shown between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Diagram legend: Class, Properties, Relationships
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
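PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Structure of a SPARQL Query
Slide 56
Definition of prefixes: the PREFIX lines
Type of query: SELECT
Variables, i.e. what to search for: ?title
RDF triple patterns, i.e. the conditions that have to be met: the patterns inside WHERE { }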
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
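How a SELECT query finds its results can be sketched in a few lines of Python: each triple pattern is matched against every triple in the sample data, and variable bindings are carried from one pattern to the next. This is a simplified illustration of basic graph pattern matching, not how a real SPARQL engine is implemented. The shortened identifiers `ex:file-type` and `ex:publ` are hypothetical stand-ins for the URIs in the sample data.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def select(variables, patterns, data):
    """Evaluate the triple patterns in order and project the chosen variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in data
                     if (b2 := match(pattern, t, b)) is not None]
    return [tuple(s[v] for v in variables) for s in solutions]

if __name__ == "__main__":
    query = [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ]
    print(select(["?dataset", "?publisher"], query, DATA))
```

The shared variable ?publisherURI is what joins the dataset's triples to the publisher's triples, exactly as in the second query above.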
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
[Diagram: example domain model]
Classes and attributes:
  Amount: Currency, Figure, Type, Year
  Nomenclature: Type, Heading, Introduction, Remark, Conditions
  EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
  Political category: Code, Description
  Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: the hasCeiling property linking the classes Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
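To make domain and range concrete, here is a minimal Python sketch that stores a small schema as triples and applies the two RDFS rules an RDFS-aware system would use: the subject of a statement is an instance of the property's domain, and the object is an instance of its range. The "ex:" prefix, the resource names and the direction of hasCeiling (from Political category to Amount) are assumptions made for illustration, not taken from the EU Budget vocabulary itself.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Hypothetical schema: class and property definitions under an assumed "ex:" prefix.
schema = {
    ("ex:Amount", RDF_TYPE, "rdfs:Class"),
    ("ex:PoliticalCategory", RDF_TYPE, "rdfs:Class"),
    ("ex:hasCeiling", RDF_TYPE, "rdf:Property"),
    ("ex:hasCeiling", RDFS_DOMAIN, "ex:PoliticalCategory"),  # assumed direction
    ("ex:hasCeiling", RDFS_RANGE, "ex:Amount"),
}


def infer_types(data, schema):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    domains = [(s, o) for s, p, o in schema if p == RDFS_DOMAIN]
    ranges = [(s, o) for s, p, o in schema if p == RDFS_RANGE]
    inferred = set()
    for s, p, o in data:
        for prop, cls in domains:
            if prop == p:
                inferred.add((s, RDF_TYPE, cls))  # subject belongs to the domain
        for prop, cls in ranges:
            if prop == p:
                inferred.add((o, RDF_TYPE, cls))  # object belongs to the range
    return inferred


data = {("ex:heading1", "ex:hasCeiling", "ex:amount2014")}
for triple in sorted(infer_types(data, schema)):
    print(triple)
```

Because RDFS treats domain and range as inference rules rather than validation constraints, a carelessly broad or wrong declaration produces wrong type statements instead of an error, which is a reason to define them with care.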
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
Slide 80
5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model, developed following a structured process and methodology
Research existing terms and their usage, and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org
See also: Open Refine website
http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (.xls and .xlsx)
JSON
XML and
RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar – Open Refine http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
3
• Create URI representation for the involved object values
o via formula
o via reconciliation
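Creating a URI "via formula" means deriving it deterministically from a cell value. In Open Refine this would be written as an expression on a column; the Python sketch below only illustrates the idea. The base URI http://example.org/id/ and the function name mint_uri are assumptions made for this example.

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # assumed base URI, replace with your own stable namespace


def mint_uri(kind: str, label: str, base: str = BASE) -> str:
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(label.strip().lower().split())
    return f"{base}{kind}/{quote(slug, safe='-')}"


print(mint_uri("country", " United Kingdom "))
print(mint_uri("indicator", "Broadband take-up (%)"))
```

The same input always yields the same URI, so the transformation can be re-run without creating duplicates. Reconciliation is the alternative when an authoritative URI for the value already exists elsewhere.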
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
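A skeleton is a template that is applied to every row of the table. As a rough sketch of what such a template does, the Python code below turns each row of a small CSV into the triples of one RDF Data Cube observation. The qb: and sdmx-measure: namespaces are those of the published vocabularies; the base URI, the dimension properties and the sample figures are made up for illustration and do not come from the Digital Agenda Scoreboard.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/scoreboard/"  # assumed base URI

# Made-up sample rows, standing in for the cleaned spreadsheet.
SAMPLE = "country,year,value\nBE,2013,78.5\nLU,2013,91.0\n"


def rows_to_triples(csv_text):
    """Apply one fixed template (the skeleton) to every row of the table."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        obs = f"{BASE}obs/{row['country']}-{row['year']}"
        triples.append((obs, RDF_TYPE, QB + "Observation"))
        triples.append((obs, QB + "dataSet", BASE + "dataset"))
        # Hypothetical dimension properties defined under the assumed base URI.
        triples.append((obs, BASE + "dim/country", BASE + "country/" + row["country"]))
        triples.append((obs, BASE + "dim/year", row["year"]))
        triples.append((obs, SDMX_MEASURE + "obsValue", float(row["value"])))
    return triples


for triple in rows_to_triples(SAMPLE):
    print(triple)
```

Each row yields five triples that share the observation URI as subject, which is the repeating pattern the graphical skeleton editor lets you draw once.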
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
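To show what an export step produces, the following minimal Python sketch serialises triples as N-Triples, a line-based format that is also valid Turtle. It is not how Open Refine implements its export; the helper names term and to_ntriples are assumptions, and every non-URI value is written as a plain string literal without datatype or language tag.

```python
def term(value):
    """Format one value: URIs in angle brackets, everything else as a quoted literal."""
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Write one 'subject predicate object .' statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


print(to_ntriples([
    ("http://publications.europa.eu/resource/authority/file-type",
     "http://purl.org/dc/terms/title",
     "File types Name Authority List"),
]), end="")
```

Turtle adds prefixes and grouping on top of this form to make the output shorter and easier to read.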
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN Slide 33
Open & linked data at BBC
DATASUPPORTOPEN
Data Value Chains using Linked Data at Volkswagen
Slide 34
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
DATASUPPORTOPEN
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee"
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes"
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
DATASUPPORTOPEN
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for Linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Enables easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
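For readers who think in code, a triple is simply a three-part record. The Python sketch below models the example as a named tuple; the class name Triple and the choice of dct:title as the URI behind "has a title" are assumptions for illustration (the later syntax examples in this module use dct:title for the same statement).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


name_of_dataset = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",  # assumed URI for "has a title"
    obj="File types Name Authority List",
)

# A triple unpacks into its three parts, in order.
s, p, o = name_of_dataset
print(s, p, o, sep="\n")
```

A set of such triples that share subjects and objects forms a graph, which is what the syntaxes in the following slides write down.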
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide: the xmlns attributes are the definition of prefixes; the nested elements are the description of data as triples (subject, predicate, object), which together form the graph.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with subject, predicate and object labelled]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide: the @prefix lines are the definition of prefixes; the statements are the description of data as triples (subject, predicate, object), which together form the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Annotations on the slide: the resource attribute gives the subject, each property attribute a predicate, and the element content the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
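The difference between the two kinds of property shows up in the object of each triple: a data type property points to a literal value, an object type property points to another resource. The Python sketch below sorts the predicates of a few triples on that basis. The function names and the rule "an object starting with http:// or https:// is a resource" are simplifying assumptions for illustration.

```python
def is_uri(value: str) -> bool:
    # Simplification assumed for this sketch: resources are http(s) URIs.
    return value.startswith(("http://", "https://"))


def split_properties(triples):
    """Return (data type properties, object type properties) seen in the data."""
    datatype_props, object_props = set(), set()
    for _subject, predicate, obj in triples:
        (object_props if is_uri(obj) else datatype_props).add(predicate)
    return datatype_props, object_props


DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

triples = [
    (DATASET, "dct:title", "File types Name Authority List"),  # literal value
    (DATASET, "dct:publisher", PUBLISHER),                     # link to a resource
]

print(split_properties(triples))
```

In a real vocabulary this distinction is declared in the schema rather than guessed from the data, for instance with OWL's owl:DatatypeProperty and owl:ObjectProperty.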
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: The Core Person Vocabulary in UML
Slide 52
[UML class diagram, Healthcare Domain, reconstructed as text]
Class Identifier (Core Vocabularies), properties:
dateOfIssue: dateTime [0..1]
identifier: string [1..1]
identifierType: string [0..1]
issuingAuthority: string [0..1]
issuingAuthorityUri: URI [0..1]
Class Person (Core Vocabularies), properties:
alternativeName: string
birthName: string
dateOfBirth: dateTime
dateOfDeath: dateTime
familyName: string
fullName: string
gender: code
givenName: string
patronymicName: string
Class Location (Core Vocabularies), properties:
geographicIdentifier: URI
geographicName: string
Class Address (Core Vocabularies), properties:
addressArea: string
addressID: string
adminUnitL1: string
adminUnitL2: string
fullAddress: string
locatorDesignator: string
locatorName: string
poBox: string
postCode: string
postName: string
thoroughfare: string
Class Geometry (Core Vocabularies), properties:
lat: string
long: string
wkt: string
xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions ...
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
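Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Annotations on the slide: definition of prefixes (the PREFIX lines); type of query (SELECT); variables, i.e. what to search for (?title); RDF triple patterns, i.e. the conditions that have to be met (the WHERE block).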
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
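To see why the second query returns exactly one row, it helps to evaluate the triple patterns by hand. The Python sketch below is a toy matcher, not a SPARQL engine: it binds variables (strings starting with "?") pattern by pattern and keeps only the bindings that satisfy every pattern. The function names match and select are assumptions, and predicates are written as prefixed names for brevity.

```python
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"
PUBL = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data from the slide, as (subject, predicate, object) tuples.
DATA = [
    (FILE_TYPE, "rdf:type", "dcat:Dataset"),
    (FILE_TYPE, "dct:title", "File types Name Authority List"),
    (FILE_TYPE, "dct:publisher", PUBL),
    (PUBL, "rdf:type", "dct:Agent"),
    (PUBL, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend a binding so that the pattern equals the triple, or return None."""
    new = dict(binding)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(part, value) != value:
                return None
        elif part != value:
            return None
    return new


def select(variables, patterns, data):
    """Evaluate the patterns in order and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [nb for b in bindings for t in data
                    if (nb := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [(FILE_TYPE, "dct:publisher", "?publisherURI"),
     (FILE_TYPE, "dct:title", "?dataset"),
     ("?publisherURI", "dct:title", "?publisher")],
    DATA,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern can only match triples whose subject equals the value bound by the first pattern.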
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
[Domain model diagram, reconstructed as text]
Classes and their attributes:
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Slide 69
2
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
Reuse existing terms and vocabularies
2
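The "further processing" mentioned for the date example can be made concrete. In the Python sketch below, the xsd:date value is parsed by a standard ISO 8601 function with no extra knowledge, while the free-text form needs a hand-written format string that the consumer has to know in advance. The function names are assumptions for illustration, and the month-name parsing relies on an English (or default "C") locale.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """xsd:date uses the ISO 8601 form YYYY-MM-DD, so a standard parser is enough."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """A custom format needs a custom rule; %B depends on the active locale."""
    return datetime.strptime(text, "%d %B %Y").date()


print(parse_xsd_date("2013-02-21"))
print(parse_free_text_date("21 February 2013"))
```

Every consumer of the non-standard form has to write and maintain a rule like the second function, and a publisher who writes "Feb 21, 2013" instead breaks it.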
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
Slide 72
http://joinup.ec.europa.eu
http://lov.okfn.org
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
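The second point can be illustrated with a small Python sketch of rdfs:subPropertyOf reasoning: every statement made with a specific property is also restated with its super-property, so a system that only knows Dublin Core can still read it. The prefixed names bud:introduction and bud:hasNomenclature and the sample resource are hypothetical labels for the EU Budget vocabulary properties described in this module, not its actual identifiers.

```python
# Hypothetical sub-property declarations (child -> parent).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the same statement using every super-property."""
    result = set(triples)
    for s, p, o in triples:
        seen = {p}
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:  # guard against cycles
            result.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return result


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
for triple in sorted(with_super_properties(data, SUB_PROPERTY_OF)):
    print(triple)
```

The original, more specific statement is kept, so systems that do understand the specialised term lose nothing.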
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Diagram: the Nomenclature class with the attributes Type, Heading, Introduction, Remark, Conditions]
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative
2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice
Data Value Chains using Linked Data at Volkswagen
Source: Sören Auer, SEMIC Conference 2015, Riga, https://joinup.ec.europa.eu/sites/default/files/isa_field_path/presentation_by_soren_auer_-_creating_data_value_chains_by_linking_enterprise_data.pdf
Slide 34
Linked Data in the oil and gas industry
1. Link databases
"I'd like to know last month's production volume total for all fields in which GDF Suez E&P Norge is a licensee."
• Data spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Need to uniquely identify resources
2. Add meaning
"I'd like to know all fields operated by Statoil Petroleum AS with their production volumes."
• Need for adding semantics in order to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Slide 35
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web
• URIs, RDF and SPARQL form the foundational layer for linked data
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont'd)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Find more on training.opendatasupport.eu
Slide 39
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
Slide 43
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
Slide 45
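As a minimal sketch of the triple on the slide above, the example can be written as a three-field record in Python. The `Triple` type and the `describe` helper are hypothetical names introduced here for illustration, and mapping "has a title" to the Dublin Core `dct:title` property is an assumption based on the syntax examples later in this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the relationship
    obj: str        # URI of another resource, or a literal value


# Assumption: "has a title" corresponds to dct:title.
DCT_TITLE = "http://purl.org/dc/terms/title"

title_triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj="File types Name Authority List",
)


def describe(t: Triple) -> str:
    """Render the triple as one statement with a literal object."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'
```

Calling `describe(title_triple)` gives a single line that reads subject, predicate, object from left to right, which is the same order every RDF syntax in this module follows.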
RDF syntax: RDF/XML
Definition of prefixes (the xmlns declarations), followed by the description of the data as triples (subject, predicate, object), which together form a graph:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide 46
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
[Figure: RDF graph in which nodes and arcs are labelled Subject, Predicate and Object]
Slide 47
RDF syntax: Turtle
Definition of prefixes, followed by the description of the data as triples (subject, predicate, object), which together form a graph:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide 48
RDF syntax: RDFa
RDFa embeds RDF data (subject, predicate, object) in HTML:
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
Slide 49
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[Figure: UML class diagram "Healthcare Domain"; callouts mark the classes, their properties and the relationships between them]
Class Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string
Class Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string
Class Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string
Class Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: add triples to the RDF graph
• DELETE: delete triples from the RDF graph
• ASK: are there any X, Y, etc. satisfying the following conditions ...
See also: http://www.euclid-project.eu/modules/chapter2
See also: https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
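To make the difference between the query types above more concrete, here is a rough Python sketch that imitates SELECT, ASK and CONSTRUCT over an in-memory list of triples. It is not a SPARQL engine: the `match`, `select`, `ask` and `construct` helpers, the toy graph and the `ex:` name are all hypothetical, and only a single triple pattern is supported.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy graph; "ex:file-type" is a placeholder, not a real identifier.
GRAPH: List[Triple] = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
]


def match(pattern: Triple, graph: List[Triple]) -> List[Binding]:
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding: Optional[Binding] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    binding = None  # same variable bound to two values
                    break
                binding[term] = value
            elif term != value:
                binding = None      # fixed term does not match
                break
        if binding is not None:
            results.append(binding)
    return results


def select(pattern: Triple, graph: List[Triple]) -> List[Binding]:
    """SELECT: return a table (list of rows) of variable bindings."""
    return match(pattern, graph)


def ask(pattern: Triple, graph: List[Triple]) -> bool:
    """ASK: is there at least one solution?"""
    return bool(match(pattern, graph))


def construct(pattern: Triple, template: Triple, graph: List[Triple]) -> List[Triple]:
    """CONSTRUCT: substitute each solution into a template triple."""
    return [tuple(b.get(t, t) for t in template) for b in match(pattern, graph)]
```

The three functions share the same matching step and differ only in what they return: rows, a yes/no answer, or new triples.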
Structure of a SPARQL query
Definition of prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for:
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met:
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide 56
SELECT - return the name of a dataset with a particular URI
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
Slide 57
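The query above has one triple pattern with a fixed subject and predicate and a variable in the object position. As a hedged illustration of what the query engine does, the same lookup can be written in plain Python over a list of tuples. The full URIs are taken from the syntax examples earlier in this module (the slide abbreviates them), and the variable names are chosen here for illustration.

```python
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

# Sample data as (subject, predicate, object) tuples.
triples = [
    (DATASET, DCT_TITLE, "File types Name Authority List"),
    (DATASET, DCT_PUBLISHER, PUBLISHER),
    (PUBLISHER, DCT_TITLE, "Publications Office"),
]

# WHERE { <DATASET> dct:title ?dataset }
# Subject and predicate are fixed; the object is the variable to bind.
result = [o for (s, p, o) in triples if s == DATASET and p == DCT_TITLE]
```

Only the first tuple satisfies both conditions, so `result` holds the dataset title and nothing else; the publisher's title is skipped because its subject differs.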
SELECT - return the name and publisher of a dataset
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
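The second query joins three triple patterns through the shared variable `?publisherURI`. The sketch below spells that join out as nested loops in Python; it is only meant to show the logic, not how a real SPARQL engine evaluates queries. The function name is hypothetical and the full URIs are assumed from the earlier syntax examples.

```python
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

triples = [
    (DATASET, DCT_TITLE, "File types Name Authority List"),
    (DATASET, DCT_PUBLISHER, PUBLISHER),
    (PUBLISHER, DCT_TITLE, "Publications Office"),
]


def select_title_and_publisher(dataset_uri, graph):
    """Return (dataset title, publisher name) rows for one dataset URI."""
    rows = []
    for s1, p1, publisher_uri in graph:
        # Pattern 1: <dataset> dct:publisher ?publisherURI
        if s1 != dataset_uri or p1 != DCT_PUBLISHER:
            continue
        for s2, p2, title in graph:
            # Pattern 2: <dataset> dct:title ?dataset
            if s2 != dataset_uri or p2 != DCT_TITLE:
                continue
            for s3, p3, name in graph:
                # Pattern 3: ?publisherURI dct:title ?publisher
                if s3 == publisher_uri and p3 == DCT_TITLE:
                    rows.append((title, name))
    return rows
```

The value bound to `publisher_uri` in the first pattern is what restricts the third pattern, which is how the publisher's name ends up next to the dataset's title in one row.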
SPARQL example - EU ODP (1)
Slide 59
SPARQL example - EU ODP (2)
Slide 60
SPARQL example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Figure: domain model of the EU budget]
Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Slide 68
Step 2: Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
Step 2: Reuse existing terms and vocabularies
DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Advantages of reuse
Step 2: Reuse existing terms and vocabularies
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
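The date example on the slide above can be illustrated with Python's standard library. This is only a sketch of the extra processing step the slide mentions; the variable names are chosen here for illustration.

```python
from datetime import date, datetime

# A typed literal such as "2013-02-21"^^xsd:date uses the ISO 8601 form,
# which the standard library parses without any extra information.
iso_value = date.fromisoformat("2013-02-21")

# A free-text date needs a format string agreed out of band (and English
# month names) before it can be compared with anyone else's data.
custom_value = datetime.strptime("21 February 2013", "%d %B %Y").date()
```

Both values end up as the same date, but only the first could be read by a consumer who knows nothing about the publisher's conventions.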
You can find reusable RDF vocabularies on
Step 2: Reuse existing terms and vocabularies
http://joinup.ec.europa.eu/
http://lov.okfn.org/
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
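The benefit of sub-properties described in the slides above can be sketched in a few lines of Python: a consumer that only knows dct:description can still use data published with the more specific property, provided it applies the rdfs:subPropertyOf declaration. The `example.org` URIs below are hypothetical stand-ins for the EU Budget vocabulary terms, and `infer_super_properties` is a name introduced here, not part of any library.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
# Hypothetical URI standing in for the EU Budget "introduction" property.
BUD_INTRODUCTION = "http://example.org/bud#introduction"

schema = [(BUD_INTRODUCTION, RDFS_SUBPROPERTY_OF, DCT_DESCRIPTION)]
data = [("http://example.org/nomenclature/1", BUD_INTRODUCTION, "Introductory text")]


def infer_super_properties(data, schema):
    """Add (s, super, o) for every (s, p, o) where p is a sub-property of super."""
    supers = {}
    for sub, pred, sup in schema:
        if pred == RDFS_SUBPROPERTY_OF:
            supers.setdefault(sub, set()).add(sup)
    inferred = set(data)
    frontier = list(data)
    while frontier:
        s, p, o = frontier.pop()
        for sup in supers.get(p, ()):
            new_triple = (s, sup, o)
            if new_triple not in inferred:
                inferred.add(new_triple)
                frontier.append(new_triple)  # follow chains of sub-properties
    return inferred
```

After inference the graph contains the original statement plus the same statement expressed with dct:description, so generic tools can display the text without knowing the specific term.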
Step 4: Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
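Some of the naming conventions above are mechanical enough to check automatically. The following is a small sketch, with hypothetical helper names, that checks only the capitalisation and camel-case rules; whether a class name is singular, or a property is a verb or a noun, still needs human review.

```python
import re

CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")  # lowerCamelCase


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class(term: str) -> bool:
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_NAME.match(local_name(term)))


def is_valid_property(term: str) -> bool:
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_NAME.match(local_name(term)))
```

For example, `is_valid_property("foaf:isPrimaryTopicOf")` passes, while a term written with underscores or starting with the wrong case is flagged.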
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Figure: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Figure: class Amount with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Figure: property hasCeiling between class Political category (Code, Description) and class Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
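Under RDFS semantics, domain and range declarations let a consumer infer the class of a resource from the properties used with it. The sketch below illustrates this for the hasCeiling example. It assumes that Political category is the domain and Amount the range, which the transcript of the figure does not state explicitly, and the `bud:` and `ex:` names are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Assumed declarations: bud:hasCeiling rdfs:domain bud:PoliticalCategory ;
#                                      rdfs:range  bud:Amount .
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # subject is in the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # object is in the range class
    return inferred
```

Given a single statement that one resource has another as its ceiling, the function returns the class of both ends, even though neither was typed explicitly in the data.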
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
See also: http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Slide 82
Example
Using OpenRefine for RDF to publish tabular data as Linked Data
Slide 83
What is OpenRefine?
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org
See also: OpenRefine website, http://openrefine.org
Slide 84
What is the OpenRefine RDF extension?
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (xls and xlsx)
JSON
XML
RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using OpenRefine to model and publish open data: getting started
1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in OpenRefine
Upload the spreadsheet
Select relevant tabs
Create the project
Slide 89
Step 3: Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data - prepare RDF
• Create a URI representation for the involved object values
o via formula
o via reconciliation
Slide 91
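Creating URIs "via formula" means deriving each URI from a cell value by a fixed rule. In OpenRefine this is done with a column transformation; the Python below is only a sketch of the logic, not the OpenRefine expression itself. The base URI and the `slugify` and `mint_uri` helpers are hypothetical.

```python
import re
import unicodedata

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Lower-case, strip accents and replace runs of other characters with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind: str, value: str) -> str:
    """Build a URI for a cell value, e.g. a country name in a spreadsheet column."""
    return f"{BASE_URI}{kind}/{slugify(value)}"
```

Because the rule is deterministic, the same cell value always yields the same URI, which keeps rows that mention the same thing pointing at one resource. Reconciliation, the second option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.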
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Screenshot: graphical interface to copy/paste an existing RDF skeleton]
[Screenshot: graphical interface to edit an RDF skeleton]
Slide 94
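An RDF skeleton is essentially a template that is applied to every row of the table. The following Python sketch shows that idea outside OpenRefine, using the RDF Data Cube class qb:Observation. The base URI, the column names, the property URIs and the sample values are all hypothetical placeholders, not the actual Digital Agenda Scoreboard data or the extension's own skeleton format.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up rows standing in for the cleaned spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,0.75\nLU,2013,0.80\n"

# The "skeleton": one subject template plus one (predicate, column) pair
# for each property to be generated per row.
SUBJECT_TEMPLATE = BASE + "obs/{country}-{year}"
PROPERTY_COLUMNS = [
    (BASE + "dimension/country", "country"),
    (BASE + "dimension/year", "year"),
    (BASE + "measure/value", "value"),
]


def rows_to_triples(text: str):
    """Apply the skeleton to every row and return the resulting triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = SUBJECT_TEMPLATE.format(**row)
        triples.append((subject, RDF_TYPE, QB_OBSERVATION))
        for predicate, column in PROPERTY_COLUMNS:
            triples.append((subject, predicate, row[column]))
    return triples
```

Each row becomes one observation resource with a type statement and one statement per mapped column, so two rows with three mapped columns produce eight triples.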
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
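To give an idea of what the export step produces, here is a minimal Python sketch that writes triples in the simplest Turtle form, one statement per line with full URIs. It is not the exporter used by the RDF extension. Treating any object that starts with "http://" or "https://" as a URI is a simplifying assumption, and datatypes, language tags and prefixes are left out.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def term(value: str) -> str:
    """Render an object as a URI reference or as a plain string literal."""
    if value.startswith(("http://", "https://")):  # heuristic, see lead-in
        return f"<{value}>"
    return f'"{escape_literal(value)}"'


def to_turtle(triples) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    lines = [f"<{s}> <{p}> {term(o)} ." for s, p, o in triples]
    return "\n".join(lines) + "\n"
```

A production export would normally declare prefixes and group statements by subject, as in the Turtle example of Module 2, but the one-line-per-triple form is also valid Turtle and is easy to inspect and compare.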
Production pipelines
From desk to automated pipeline
[Figure: the tools OpenRefine, UnifiedViews and Cellar placed along the axes "flexibility" and "volume"]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL. Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook. W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data. Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team...
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
http://www.opendatasupport.eu
Slide 102
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Linked Data in the oil and gas industry
1. Link databases
“I’d like to know last month’s production volume total for all fields in which GDF Suez E&P Norge is a licensee.”
• Data is spread across multiple databases: GDF Suez data + Norwegian Government data + Wikipedia data
• Resources need to be uniquely identified
2. Add meaning
“I’d like to know all fields operated by Statoil Petroleum AS with their production volumes.”
• Semantics must be added to allow machine reasoning
For example:
• Kristin is a field
• Åsgard is an oil platform
• Statoil Petroleum AS is a company
Slide 35
Further reading: http://www.topquadrant.com/resources/solutions/docs/Semantic-data-oil-and-gas.pdf
Conclusions
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• URIs, RDF and SPARQL form the foundational layer for linked data.
• Linked data offers a number of advantages, such as:
o Data integration with small impact on legacy systems
o Semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
Conclusions (cont’d)
• Linked data offers a number of advantages, such as:
o Easy updates, adaptations and extensions of data models
o Cost reduction from the reuse of Linked Open Government Data (LOGD) in e-Government applications
o Creativity and innovation through context and knowledge creation
Slide 37
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL, showing how you can query and manipulate data in RDF
Slide 39
Find more on training.opendatasupport.eu
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write and read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
Slide 45
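As a rough illustration of the triple structure, the example can be held in a three-field record. This is only a sketch in Python: the `Triple` type and the `to_ntriples` helper are hypothetical names introduced here, and the predicate "has a title" is assumed to map to the Dublin Core `dct:title` property used in the syntax examples of this module.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property (the relationship)
    obj: str        # URI of another resource, or a quoted literal


# Assumption: "has a title" corresponds to the Dublin Core title property.
DCT_TITLE = "http://purl.org/dc/terms/title"
FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type"

title_triple = Triple(FILE_TYPE, DCT_TITLE, '"File types Name Authority List"')


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line; literals stay as quoted strings."""
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(to_ntriples(title_triple))
```

The sketch deliberately ignores language tags, datatypes and escaping, which a real RDF library would handle.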
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide annotations: the xmlns attributes are the definition of prefixes; the element content is the description of data as triples (subject, predicate, object), which together form the graph.
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph whose nodes and arcs are labelled Subject, Predicate and Object]
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Slide annotations: the @prefix lines are the definition of prefixes; the statements that follow are the description of data as triples (subject, predicate, object), which together form the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF Syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Slide annotations: the resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties (their values are literals).
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties (their values are resources).
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram “Healthcare Domain”; callouts on the slide mark the classes, their properties and the relationships between them]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: the Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C Recommendation in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: add triples to the RDF graph (a SPARQL 1.1 Update operation)
• DELETE: delete triples from the RDF graph (a SPARQL 1.1 Update operation)
• ASK: are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide annotations:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://publications.europa.eu/resource/authority/file-type> rdf:type dcat:Dataset .
<http://publications.europa.eu/resource/authority/file-type> dct:title "File types Name Authority List" .
<http://publications.europa.eu/resource/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> rdf:type dct:Agent .
<http://open-data.europa.eu/en/data/publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://publications.europa.eu/resource/authority/file-type> dct:publisher ?publisherURI .
  <http://publications.europa.eu/resource/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
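To make the matching of triple patterns more tangible, the two SELECT examples can be imitated with a few lines of plain Python. This is a simplified sketch, not how a SPARQL engine is implemented: the `match` and `select` functions are hypothetical helpers, prefixed names are kept as plain strings, and only basic graph patterns are supported.

```python
# Sample data from the slides as (subject, predicate, object) tuples.
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"
TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`; return None if impossible."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, patterns, triples):
    """Return one row per solution that satisfies every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.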
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
1. Start with a robust Domain Model
Slide 68
[Domain model diagram]
Classes and their attributes:
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPolitical category, hasCorporate body, has Nomenclature, has EU Programme
2. Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
2. Reuse existing terms and vocabularies
Well-known vocabularies
DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
Slide 70
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
2. Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
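The "further processing" mentioned for non-standard dates could look roughly like the sketch below, which turns the free-text date from the slide into an xsd:date typed literal. The function name `to_xsd_date_literal` and the default input format are assumptions made for illustration; month-name parsing relies on Python's default (English) locale.

```python
from datetime import datetime

# Namespace URI of the XML Schema date datatype (xsd:date).
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date_literal(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as an RDF typed literal.

    `fmt` describes the publisher's own date format; "%d %B %Y" matches
    strings such as "21 February 2013" in the default (English) locale.
    """
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'


print(to_xsd_date_literal("21 February 2013"))
```

Every consumer of ex:date values would need a step like this, whereas data published with dcterms:created and xsd:date needs none.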
2. Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu/
• Linked Open Vocabularies: http://lov.okfn.org/
Slide 72
3. Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
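The second point can be sketched in a few lines: a system that only knows dct:description can still use data published with a more specific sub-property, provided the sub-property relationship is declared. The prefixed name `bud:introduction` and the `ex:` identifiers are hypothetical, and the sketch assumes each property has at most one direct super-property.

```python
# rdfs:subPropertyOf declarations, child -> parent.
# "bud:introduction" is a hypothetical prefixed name used for illustration.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
}


def super_properties(prop):
    """Yield every ancestor of `prop` along the rdfs:subPropertyOf chain."""
    seen = set()
    while prop in SUB_PROPERTY_OF and prop not in seen:
        seen.add(prop)  # guards against accidental cycles
        prop = SUB_PROPERTY_OF[prop]
        yield prop


def infer(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        for parent in super_properties(p):
            inferred.add((s, parent, o))
    return inferred


data = [("ex:nomenclature1", "bud:introduction", "Introductory text")]
for triple in sorted(infer(data)):
    print(triple)
```

A generic client that asks for dct:description now finds the introductory text, even though it has never heard of the more specific term.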
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) linked by has Nomenclature to Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
4. Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
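The purely mechanical parts of these conventions (initial capitalisation and camel case) can be checked automatically; whether a class name is singular, or a property is a verb or a noun, still needs human review. The sketch below uses a hypothetical `check_term` helper and two regular expressions chosen for illustration.

```python
import re

# UpperCamelCase for classes (e.g. Concept), lowerCamelCase for properties (e.g. hasSite).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def check_term(term, kind):
    """Return True if the term's local name follows the casing convention."""
    name = local_name(term)
    if kind == "class":
        return bool(CLASS_NAME.match(name))
    if kind == "property":
        return bool(PROPERTY_NAME.match(name))
    raise ValueError(f"unknown kind: {kind!r}")


for term, kind in [
    ("skos:Concept", "class"),
    ("org:hasSite", "property"),
    ("foaf:isPrimaryTopicOf", "property"),
    ("ex:political_category", "class"),  # hypothetical term that breaks the convention
]:
    print(term, kind, check_term(term, kind))
```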
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
4. Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Diagram: the hasCeiling property linking Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
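Under RDFS semantics, domain and range declarations let a system infer the class of a resource from the properties it is used with. The sketch below illustrates this for hasCeiling. It assumes, for illustration only, that the property points from a political category to an amount; the `bud:` and `ex:` names are hypothetical.

```python
# Assumed direction (not confirmed by the slide): a political category has a ceiling amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}  # rdfs:domain
RANGE = {"bud:hasCeiling": "bud:Amount"}              # rdfs:range


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))  # the subject is in the domain
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))   # the object is in the range
    return types


data = [("ex:heading1", "bud:hasCeiling", "ex:amount2014")]
for triple in sorted(infer_types(data)):
    print(triple)
```

Because these declarations drive inference rather than validation, an overly narrow domain or range can lead to unintended conclusions about resources, which is one reason to define them with care.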
5. Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
6. Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” – openrefine.org
Slide 84
See also: Open Refine website, http://openrefine.org/
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
First:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie/
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Data source: Digital Agenda Scoreboard
Slide 87
1. Describe your data in a spreadsheet
Download the tabular data
Slide 88
2. Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
3. Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
3. Clean up the data – prepare RDF
• Create a URI representation for the involved object values
o via formula
o via reconciliation
Slide 91
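Creating URIs "via formula" means deriving them deterministically from the cell values. Outside Open Refine, the same idea could be sketched in Python as below. The base URI, the `slugify` and `mint_uri` helpers and the sample values are all hypothetical and only illustrate the principle.

```python
import re
import unicodedata
from urllib.parse import quote

BASE_URI = "http://example.com/id/"  # hypothetical base URI for the dataset


def slugify(value):
    """Lower-case the value, drop accents and replace other characters with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value, base=BASE_URI):
    """Build a URI of the form <base><kind>/<slug of the cell value>."""
    return f"{base}{quote(kind)}/{slugify(value)}"


print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", "Households with broadband access (%)"))
```

Reconciliation, by contrast, looks each value up in an existing dataset and reuses the URI found there, which links your data to data published by others.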
4. Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
4. Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
4. Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
5. Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
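What the skeleton and the export step do can be approximated by a small script that applies one template to every row of a table. This is a sketch under stated assumptions, not the Open Refine implementation: the column names, the sample values, the `ex:` properties and the base URI are invented for illustration, and only `qb:Observation` comes from the RDF Data Cube Vocabulary.

```python
import csv
import io

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.com/def/> .\n"  # hypothetical vocabulary namespace
)

# Invented sample rows standing in for the cleaned-up spreadsheet.
SAMPLE_CSV = "country,year,value\nBE,2013,0.82\nLU,2013,0.94\n"


def row_to_turtle(row, base="http://example.com/obs/"):
    """Apply a fixed template (the 'skeleton') to one spreadsheet row."""
    subject = f"<{base}{row['country']}-{row['year']}>"
    return (
        f"{subject} a qb:Observation ;\n"
        f"    ex:country \"{row['country']}\" ;\n"
        f"    ex:year {int(row['year'])} ;\n"
        f"    ex:value {float(row['value'])} .\n"
    )


def export(csv_text):
    """Read tabular data and return a Turtle document with one observation per row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(r) for r in rows)


print(export(SAMPLE_CSV))
```

A full Data Cube export would also describe the dataset and its data structure definition, which this sketch leaves out.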
Production pipelines
From desk to automated pipeline
[Figure: tools positioned along flexibility and volume axes: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data, http://5stardata.info/
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 – Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie/
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID – Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID – Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com/
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org/
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu
Slide 102
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Conclusions
bull Linked data is a set of design principles for sharing machine-readable data on the Web
bull URIs RDF and SPARQL form the foundational layer for Linked data
bull Linked data offers a number of advantages such as
o Data integration with small impact on legacy systems
o Enables for semantic interoperability
o Easier browsing through complex data
o Increased data quality
Slide 36
DATASUPPORTOPEN
Conclusions contrsquod
bull Linked data offers a number of advantages such as
o Enables easy updates adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-
creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF amp SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
bull How to writeread RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource Everything that can have a unique identifier (URI) eg pages places people organisations products
Description attributes features and relations of the resources
Framework model languages and syntaxes for these descriptions
bull Published as a W3C recommendation in 1999
bull RDF was originally introduced as a data model for metadata
bull RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
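The triple on this slide can also be pictured as a plain three-field record. The following is only a minimal sketch in Python, not part of the original training material: the `Triple` type and the `is_literal` helper are illustrative names, and the predicate "has a title" is assumed to be the Dublin Core term `dct:title` that the later slides use.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumption: "has a title" is expressed with the Dublin Core title property.
DCT_TITLE = "http://purl.org/dc/terms/title"

dataset_title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj='"File types Name Authority List"',  # quoted, so it is a literal
)


def is_literal(term: str) -> bool:
    """In this sketch, literals are written with surrounding double quotes."""
    return term.startswith('"')


# A graph is simply a set of triples.
graph = {dataset_title}
```

Subject and predicate are URIs, while the object here is a literal; a set of such triples forms an RDF graph.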
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide annotations: the xmlns attributes are the definition of prefixes; the elements inside rdf:RDF are the description of the data (triples) and together form the graph. In the first description the rdf:about URI is the subject, dct:title is a predicate and the title text is its object.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Slide annotations: the @prefix lines are the definition of prefixes; the remaining statements are the description of the data (triples) and together form the graph. Each statement consists of a subject, a predicate and an object.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Slide annotations: the resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram. Classes and their properties:]
Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
Class Location: geographicIdentifier: URI; geographicName: string
Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
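Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide annotations:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met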
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
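To make the evaluation of such a query more tangible, the sketch below matches triple patterns against the sample data in plain Python. It is a toy illustration of how variables are bound and joined across patterns, not how a SPARQL engine is actually implemented. The function names are hypothetical, prefixed names are kept as plain strings, and the dataset URI is assumed to be the full one used on the earlier RDF syntax slides.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Assumption: the full dataset URI from the earlier slides stands in for
# the abbreviated URI shown in the sample data.
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

DATA: List[Triple] = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", '"File types Name Authority List"'),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", '"Publications Office"'),
]


def match_term(pattern: str, value: str, binding: Binding) -> Optional[Binding]:
    """Match one pattern term against one value, extending the binding."""
    if pattern.startswith("?"):  # a variable
        bound = binding.get(pattern)
        if bound is None:
            return {**binding, pattern: value}
        return binding if bound == value else None
    return binding if pattern == value else None  # a constant


def match_pattern(pattern: Triple, data: Sequence[Triple], binding: Binding) -> List[Binding]:
    """Return every extension of `binding` that makes `pattern` match a triple."""
    results = []
    for triple in data:
        current: Optional[Binding] = binding
        for p, v in zip(pattern, triple):
            current = match_term(p, v, current)
            if current is None:
                break
        if current is not None:
            results.append(current)
    return results


def select(variables: Sequence[str], patterns: Sequence[Triple], data: Sequence[Triple]):
    """Evaluate the patterns in order, joining on shared variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(pattern, data, b)]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    DATA,
)
```

The shared variable `?publisherURI` is what joins the first and third patterns: it is bound by the dataset's publisher triple and then reused as the subject when looking up the publisher's title.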
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model (step 1)
Slide 68
[Domain model diagram. Classes and their attributes:]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Reuse existing terms and vocabularies (step 2)
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Advantages of reuse
Reuse existing terms and vocabularies (step 2)
Slide 71
• Reuse greatly aids interoperability of your data.
A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that it has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
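The "further processing" mentioned in the date example can be sketched as follows. This is only an illustration in Python of turning the free-text form "21 February 2013" into the typed literal expected for dcterms:created; the function name `to_xsd_date` is hypothetical, and the sketch assumes English month names in a fixed "day month year" layout.

```python
from datetime import date

# English month names, mapped to their month numbers (assumption: the
# free-text dates always use this spelling).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def to_xsd_date(text: str) -> str:
    """Convert e.g. '21 February 2013' into a typed xsd:date literal."""
    day, month_name, year = text.split()
    parsed = date(int(year), MONTHS[month_name], int(day))
    return f'"{parsed.isoformat()}"^^xsd:date'


literal = to_xsd_date("21 February 2013")
```

Every consumer of non-standard data would need a conversion step of this kind, which is exactly the effort that reusing dcterms:created with xsd:date avoids.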
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
Reuse existing terms and vocabularies (step 2)
Slide 72
• http://joinup.ec.europa.eu
• http://lov.okfn.org
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
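The benefit of declaring sub-properties can be sketched in a few lines of Python. The example below applies the standard RDFS sub-property rule: whatever is stated with a sub-property also holds for its super-property. The prefixed names `bud:introduction`, `bud:hasNomenclature` and the `ex:` resources are hypothetical stand-ins for the EU Budget vocabulary terms named on the slides; the real URIs are not given in this presentation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical names for the two sub-property declarations on the slides.
SUB_PROPERTY_OF: Dict[str, str] = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop: str) -> List[str]:
    """Follow rdfs:subPropertyOf links upwards, guarding against cycles."""
    chain: List[str] = []
    seen = {prop}
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        if prop in seen:
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Add, for each triple, the same statement with every super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in super_properties(p):
            inferred.add((s, sup, o))
    return inferred


data = {
    ("ex:nomenclature1", "bud:introduction", '"Introductory text"'),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}
expanded = infer(data)
```

A system that only knows Dublin Core can still find the description and subject in `expanded`, even though it has never seen the more specific budget terms.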
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
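Some of these conventions are mechanical enough to check automatically. The sketch below, with hypothetical helper names, tests only the letter-case rules (initial capital for classes, initial lower case for properties, camel case without separators). Whether a class name is singular, or a property name is a verb or a noun, still needs human review.

```python
import re

# Letters and digits only: separators such as "_" or "-" would break camel case.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(prefixed_name: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return prefixed_name.split(":", 1)[-1]


def looks_like_class(prefixed_name: str) -> bool:
    """True if the local name starts with a capital letter and is camel case."""
    return bool(CLASS_NAME.match(local_name(prefixed_name)))


def looks_like_property(prefixed_name: str) -> bool:
    """True if the local name starts with a lower case letter and is camel case."""
    return bool(PROPERTY_NAME.match(local_name(prefixed_name)))
```

Applied to the examples on the slide, `looks_like_class("skos:Concept")` and `looks_like_property("foaf:isPrimaryTopicOf")` both hold, while a name such as `ex:has_site` (a made-up counter-example) fails the property check because of the underscore.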
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: hasCeiling relationship between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
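In RDFS, domain and range declarations are used for inference rather than validation: using a property lets a reasoner conclude the types of its subject and object. The following Python sketch illustrates this for hasCeiling. It assumes, as a reading of the diagram that the slide text does not confirm, that the property points from a political category to an amount; all `bud:` and `ex:` names are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed direction: a political category "has ceiling" an amount.
DOMAIN: Dict[str, str] = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE: Dict[str, str] = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Apply the RDFS domain and range rules to derive rdf:type statements."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))   # object is in the range
    return inferred


data = {("ex:category1", "bud:hasCeiling", "ex:amount1")}
typed = infer_types(data)
```

Because the declarations produce new type statements instead of rejecting data, an overly narrow domain or range can lead to unintended conclusions, which is one reason to define them with care.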
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format and of design rules and management.
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet (step 1)
Download the tabular data
Slide 88
DATASUPPORTOPEN
Create a project and upload it in Open Refine (step 2)
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation (step 3)
Slide 90
• Star and remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF (step 3)
Slide 91
• Create a URI representation for the object values involved
o via formula
o via reconciliation
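Creating a URI "via formula" means deriving it from the cell value with a deterministic rule. The sketch below shows one such rule in Python, as an illustration only: it is not Open Refine's own expression language, and the base URI and function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI; in practice use a stable namespace you control.
BASE_URI = "http://example.org/id/country/"


def slugify(label: str) -> str:
    """Lower-case the label, drop accents and replace other characters by '-'."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_label = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uri(label: str, base: str = BASE_URI) -> str:
    """Build a URI for a cell value; refuse values that leave nothing usable."""
    slug = slugify(label)
    if not slug:
        raise ValueError(f"cannot derive a URI from {label!r}")
    return base + slug


uri = mint_uri(" Czech Republic ")
```

A formula of this kind is quick but only yields locally minted identifiers. Reconciliation, the other option on the slide, instead matches the cell values against identifiers that already exist elsewhere.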
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
Slide 94
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Screenshot: export of the data in Turtle]
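What an export step produces can be sketched with a small serialiser. The Python example below writes rows as N-Triples, a line-based format that is also valid Turtle. It is a simplified illustration rather than the export code of Open Refine: the function names are hypothetical, objects are either URIs or plain string literals, and only the most common escape sequences are handled.

```python
from typing import Iterable, Tuple

# (subject URI, predicate URI, object value, object is a literal?)
Row = Tuple[str, str, str, bool]


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def term(value: str, is_literal: bool = False) -> str:
    """Write a literal in double quotes and a URI in angle brackets."""
    return f'"{escape_literal(value)}"' if is_literal else f"<{value}>"


def to_ntriples(rows: Iterable[Row]) -> str:
    """Serialise the rows, one triple per line, each terminated by ' .'."""
    lines = [
        f"{term(s)} {term(p)} {term(o, o_is_literal)} ."
        for s, p, o, o_is_literal in rows
    ]
    return "\n".join(lines) + "\n"


output = to_ntriples([
    ("http://example.org/obs/1", "http://purl.org/dc/terms/title", 'Say "hi"', True),
    ("http://example.org/obs/1", "http://purl.org/dc/terms/publisher",
     "http://open-data.europa.eu/en/data/publisher/publ", False),
])
```

Turtle output from a real tool is usually more compact, since it adds prefixes and groups statements by subject, but it carries the same triples.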
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5★ Open Data: http://5stardata.info
• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C: http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
Slide 98
• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata/
• Open Refine: https://github.com/OpenRefine
• RDF Extension: http://refine.deri.ie
• Resource Description Framework, W3C: http://www.w3.org/RDF/
• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Conclusions contrsquod
bull Linked data offers a number of advantages such as
o Enables easy updates adaptations and extensions of data models
o Cost reduction from the reuse of LOGD in e-Government applications
o Enables creativity and innovation through context and knowledge-
creation
Slide 37
DATASUPPORTOPEN
Learning Module 2
Introduction to RDF amp SPARQL
Slide 38
DATASUPPORTOPEN
Introduction to RDF and SPARQL
This module contains
bull An introduction to the Resource Description Framework (RDF) for describing your data
bull An introduction to SPARQL on how you can query and manipulate data in RDF
Slide 39
Find more on trainingopendatasupporteu
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of
bull The Resource Description Framework (RDF)
bull How to writeread RDF
bull How you can describe your data with RDF
bull What SPARQL is
bull How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource Everything that can have a unique identifier (URI) eg pages places people organisations products
Description attributes features and relations of the resources
Framework model languages and syntaxes for these descriptions
bull Published as a W3C recommendation in 1999
bull RDF was originally introduced as a data model for metadata
bull RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG
Slide 43
ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt
ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt
ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt
ltorgOrganizationgt
ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt
ltlocnAddressgt
ltrdfRDFgt
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo
Subject Predicate Object
Example name of a dataset
DATASUPPORTOPEN
RDF SyntaxRDFXML
Slide 46
ltrdfRDF
xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo
ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt
ltdcatDatasetgt
ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt
ltdctPublishergt
ltrdfRDFgt
Subject
Predicate
Object
Gra
ph
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms
lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt
lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo
Gra
ph
See alsohttpwwww3org200912rdf-wspapersws11
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
Slide 54
SPARQL is the standard language for querying graph data represented as RDF triples.
• The name stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Types of SPARQL queries
Slide 55
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph (SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...?
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
The query has three parts:
• Definition of prefixes: the PREFIX lines
• Type of query and variables, i.e. what to search for: SELECT ?title
• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
SELECT - return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
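As a rough illustration of how the triple patterns of a WHERE clause are matched against the sample data, here is a minimal Python sketch. It is not how a real SPARQL engine is implemented. It assumes the full URIs from the RDF syntax slides in place of the abbreviated ones, uses http://www.w3.org/ns/dcat# as the DCAT namespace, and treats every term starting with "?" as a variable; the names match_pattern and select are made up for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

# The sample data of the slide as (subject, predicate, object) tuples.
GRAPH: List[Triple] = [
    (DATASET, RDF_TYPE, DCAT + "Dataset"),
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, RDF_TYPE, DCT + "Agent"),
    (PUBLISHER, DCT + "title", "Publications Office"),
]


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Match one triple pattern against one triple, extending the bindings."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables: Sequence[str], patterns: Sequence[Triple], graph: List[Triple]) -> List[Tuple[str, ...]]:
    """Return one row per combination of triples that satisfies all patterns."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in graph:
                matched = match_pattern(pattern, triple, bindings)
                if matched is not None:
                    extended.append(matched)
        solutions = extended
    return [tuple(solution[v] for v in variables) for solution in solutions]


ROWS = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    GRAPH,
)
print(ROWS)
```

The third pattern only matches the title of the resource that the first pattern bound to ?publisherURI, which is why the dataset title is not returned a second time as a publisher.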
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
Slide 62
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
Slide 64
This module is about:
• Creating an RDF vocabulary for modelling your data
- How to reuse existing vocabularies to model your data
- How to create new classes and properties in RDF
- How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Learning objectives
Slide 65
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
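Start with a robust Domain Model (step 1)
Slide 68
[Domain model diagram. Classes, attributes and relationships recoverable from the slide:]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Further attributes whose class cannot be recovered from the transcript: Acronym, Legal base period, Legal base type, Legal base status
• Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme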
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Reuse existing terms and vocabularies (step 2): well-known vocabularies
Slide 70
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies (step 2): advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data.
Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
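The "further processing" mentioned for the date example can be made concrete with a small Python sketch. It assumes the non-standard values always follow the pattern "day, English month name, year" used on the slide; the function name to_xsd_date_literal is invented for this illustration.

```python
from datetime import date

# English month names, so the conversion does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def to_xsd_date_literal(text: str) -> str:
    """Convert a value such as '21 February 2013' into a typed xsd:date literal."""
    day, month_name, year = text.split()
    value = date(int(year), MONTHS[month_name], int(day))
    return f'"{value.isoformat()}"^^xsd:date'


print(to_xsd_date_literal("21 February 2013"))
```

Every consumer of the non-standard format would need a conversion of this kind, and would have to guess the language and ordering of the date parts. Publishing dcterms:created with an xsd:date value avoids that step altogether.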
Reuse existing terms and vocabularies (step 2)
Slide 72
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Creation of sub-classes and sub-properties (step 3)
Slide 73
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), linked by the relationship "has Nomenclature"]
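The benefit of declaring a sub-property can be sketched in a few lines of Python. The example below applies the RDFS rule "if P is a sub-property of Q and (s, P, o) holds, then (s, Q, o) holds" to a tiny set of triples. The namespace http://example.org/budget# and the resource nomenclature1 are placeholders, not the actual URIs of the EU Budget vocabulary.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
EX = "http://example.org/budget#"  # hypothetical namespace for illustration

TRIPLES: Set[Triple] = {
    # Vocabulary statement: ex:introduction is a sub-property of dct:description.
    (EX + "introduction", RDFS_SUBPROPERTY_OF, DCT_DESCRIPTION),
    # Data statement that uses the more specific property.
    (EX + "nomenclature1", EX + "introduction", "Introductory text"),
}


def infer_super_properties(triples: Set[Triple]) -> Set[Triple]:
    """Add (s, Q, o) for every (s, P, o) where P is declared a sub-property of Q."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        declarations = {(s, o) for s, p, o in inferred if p == RDFS_SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in declarations:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


INFERRED = infer_super_properties(TRIPLES)
for triple in sorted(INFERRED):
    print(triple)
```

A consumer that only knows dct:description can now find the introductory text, even though the data was published with the more specific term.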
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 76
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
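The capitalisation and camel case conventions above are easy to check automatically; the "singular", "verb" and "noun" rules are not and still need human review. The following Python sketch covers only the mechanical part, and the function names are invented for this example.

```python
import re


def local_name(qname: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return qname.split(":", 1)[1]


def is_class_name(name: str) -> bool:
    """Classes: capital first letter, no spaces, underscores or hyphens."""
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", name) is not None


def is_property_name(name: str) -> bool:
    """Properties: lower case first letter, camel case for further words."""
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", name) is not None


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    name = local_name(term)
    kind = "class" if is_class_name(name) else "property" if is_property_name(name) else "invalid"
    print(term, "->", kind)
```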
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 77
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4, continued)
Slide 78
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Where new terms are required, create them following commonly agreed best practices (step 4, continued)
Slide 79
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: the hasCeiling relationship between the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
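To show what class and property definitions with a domain and a range might look like, the following Python sketch prints a small RDFS vocabulary in Turtle. Several things are assumptions made for illustration: the namespace http://example.org/budget#, the class name PoliticalCategory, the labels, and the direction of hasCeiling (from political category to amount), which the transcript does not state.

```python
EX_PREFIX = "ex"                            # hypothetical prefix
EX_NAMESPACE = "http://example.org/budget#"  # hypothetical namespace

HEADER = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix {EX_PREFIX}: <{EX_NAMESPACE}> .\n"
)


def class_definition(name: str, label: str) -> str:
    """Turtle for a new RDFS class with an English label."""
    return (
        f"{EX_PREFIX}:{name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en .\n'
    )


def property_definition(name: str, label: str, domain: str, range_: str) -> str:
    """Turtle for a new property: domain = where it is used, range = its values."""
    return (
        f"{EX_PREFIX}:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {EX_PREFIX}:{domain} ;\n"
        f"    rdfs:range {EX_PREFIX}:{range_} .\n"
    )


VOCABULARY = "\n".join([
    HEADER,
    class_definition("Amount", "Amount"),
    class_definition("PoliticalCategory", "Political category"),
    property_definition("hasCeiling", "has ceiling", "PoliticalCategory", "Amount"),
])
print(VOCABULARY)
```

In practice the definitions would usually be written directly in Turtle or produced by a modelling tool; generating them from code is only a convenient way to show the pattern once for classes and once for properties.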
Publish within a highly stable environment designed to be persistent (step 5)
Slide 80
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Slide 81
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
The slide groups these steps into three phases: Analyse, Model, Publish.
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
Slide 85
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
Slide 86
First:
• Install Open Refine from https://github.com/OpenRefine
• Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
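For readers who want to see the same five steps without a graphical tool, here is a minimal Python sketch using only the standard library. It is not what Open Refine does internally. The CSV content, the base URI http://example.org/scoreboard/ and the property URIs under it are invented, and the mapping is a heavily simplified use of the RDF Data Cube class qb:Observation.

```python
import csv
import io
from typing import List
from urllib.parse import quote

BASE = "http://example.org/scoreboard/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Steps 1 and 2: the tabular data (invented values), loaded as text.
CSV_TEXT = """country,year,value
 BE ,2013,0.82
LU,2013,0.94
,2013,
"""


def rows_to_ntriples(csv_text: str) -> List[str]:
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Step 3: clean up (trim whitespace, drop incomplete rows).
        country = row["country"].strip()
        year = row["year"].strip()
        value = row["value"].strip()
        if not country or not value:
            continue
        # Step 4: map each row to a URI, a class and properties.
        subject = f"<{BASE}obs/{quote(country)}-{quote(year)}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{QB_OBSERVATION}> .")
        lines.append(f"{subject} <{BASE}def/country> <{BASE}country/{quote(country)}> .")
        lines.append(f'{subject} <{BASE}def/value> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


# Step 5: export, here as N-Triples, one statement per line.
NTRIPLES = rows_to_ntriples(CSV_TEXT)
print("\n".join(NTRIPLES))
```

The sketch hard-codes the mapping that Open Refine lets you draw as a template graph; changing the target vocabulary means editing the three lines of step 4.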
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Slide 87
Dataset: Digital Agenda Scoreboard
Step 1: Describe your data in a spreadsheet
Slide 88
Download the tabular data.
Step 2: Create a project and upload it in Open Refine
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Step 3: Clean up the data - prepare RDF
Slide 91
• Create URI representations for the involved object values
- via formula
- via reconciliation
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF.
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 94
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Screenshot: export of the data in Turtle]
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: OpenRefine, UnifiedViews and Cellar placed along two axes, flexibility and volume]
Thank you for your attention
... and now YOUR questions?
Slide 97
References
Slide 98
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
Slide 100
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Further reading
Slide 101
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Be part of our team
Slide 102
Find us, join us, follow us and contact us:
• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that he or she has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Learning Module 2
Introduction to RDF & SPARQL
Slide 38
Introduction to RDF and SPARQL
Slide 39
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data
• An introduction to SPARQL on how you can query and manipulate data in RDF
Find more on training.opendatasupport.eu
Learning objectives
Slide 40
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Slide 42
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
• Description: attributes, features and relations of the resources
• Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Example: RDF description of an organisation
Slide 43
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:org="http://www.w3.org/ns/org#"
  xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI
• Predicate - a URI-identified, reused specification of the relationship
• Object - a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
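The example triple can also be written down as a small data structure. The Python sketch below stores one triple and prints it in the line-based N-Triples syntax. It assumes that "has a title" stands for the Dublin Core property http://purl.org/dc/terms/title, as used on the syntax slides; the class and method names are made up for this illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                     # URI of the resource being described
    predicate: str                   # URI of the property
    obj: str                         # URI or literal value
    object_is_literal: bool = False  # True when obj is a plain string value

    def to_ntriples(self) -> str:
        """Serialise the triple as one N-Triples line."""
        if self.object_is_literal:
            escaped = self.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{self.obj}>"
        return f"<{self.subject}> <{self.predicate}> {obj} ."


TITLE_TRIPLE = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    object_is_literal=True,
)
print(TITLE_TRIPLE.to_ntriples())
```

The flag is needed because an object can be either another resource or a literal value, while subjects and predicates in this example are always URIs.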
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
The slide labels the xmlns attributes as the definition of prefixes and the nested elements as the description of data (triples), which together form the graph. It also marks one subject, one predicate and one object within that graph.
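To see how the nested RDF/XML elements correspond to triples, the following Python sketch reads a document of this simple shape with the standard library XML parser. It is not a general RDF/XML parser: it assumes every top-level element is a typed resource with rdf:about and every child is either a literal or an rdf:resource link. The DCAT namespace http://www.w3.org/ns/dcat# and the helper names are assumptions of this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def simple_triples(rdf_xml: str) -> List[Tuple[str, str, str]]:
    triples = []
    for node in ET.fromstring(rdf_xml):
        subject = node.attrib[f"{{{RDF_NS}}}about"]
        # The element name of a typed node gives its rdf:type.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            resource = prop.attrib.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), obj))
    return triples


TRIPLES = simple_triples(RDF_XML)
for triple in TRIPLES:
    print(triple)
```

The output is the same five statements that the Turtle syntax spells out one by one, which is the point of the slide: different syntaxes, one graph.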
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Graph diagram with one subject, predicate and object highlighted]
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
The slide labels the @prefix lines as the definition of prefixes and the remaining statements as the description of data (triples), which together form the graph. It also marks one subject, one predicate and one object.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF Syntax: RDFa
Slide 49
RDFa embeds RDF data in HTML:
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher:
      <span property="http://purl.org/dc/terms/publisher" resource="http://open-data.europa.eu/en/data/publisher/publ" typeof="http://purl.org/dc/terms/Agent">
        <span property="http://purl.org/dc/terms/title">Publications Office</span>
      </span>
    </p>
  </div>
</body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
Slide 51
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
Step 2: Reuse existing terms and vocabularies
• General-purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: vocabulary for describing datasets in Europe.
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth.
• DOAP: vocabulary for describing projects.
• ADMS: vocabulary for describing interoperability assets.
• Dublin Core: defines general metadata attributes.
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register.
• Organization Ontology: ontology for describing the structure of organizations.
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location.
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration.
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft.
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
Using dcterms:created, for example, with a typed date value such as "2013-02-21"^^xsd:date, makes the data immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
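The "further processing" mentioned on the slide above can be pictured with a small Python sketch. It is only an illustration, assuming the free-text dates always follow the English "day month-name year" pattern of the slide's example; the function name to_xsd_date_literal is invented for this sketch and is not part of any vocabulary or tool.

```python
from datetime import date

# English month names, spelled as in the slide's "21 February 2013" example.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def to_xsd_date_literal(text):
    """Convert a 'day month-name year' string into a typed xsd:date literal."""
    day, month_name, year = text.split()
    parsed = date(int(year), MONTHS[month_name], int(day))
    return '"{}"^^xsd:date'.format(parsed.isoformat())


print(to_xsd_date_literal("21 February 2013"))  # "2013-02-21"^^xsd:date
```

Every consumer of the non-standard format would need a conversion like this one, which is the cost that reusing dcterms:created with xsd:date avoids.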
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions.]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship.]
Slide 75
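Why sub-properties help generic systems can be sketched in a few lines of Python. This is an illustrative sketch rather than an RDFS reasoner: the prefixed names bud:introduction and bud:hasNomenclature are assumed spellings for the two EU Budget properties mentioned above, the ex: subject is made up, and each property is assumed to have at most one super-property.

```python
# Hypothetical sub-property declarations; the "bud:" names are illustrative.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def property_chain(prop):
    """Return prop followed by each of its super-properties, most specific first."""
    chain = []
    while prop is not None and prop not in chain:
        chain.append(prop)
        prop = SUB_PROPERTY_OF.get(prop)
    return chain


def values_for(triples, subject, wanted_property):
    """Collect objects whose predicate is wanted_property or one of its sub-properties."""
    return [
        obj
        for subj, pred, obj in triples
        if subj == subject and wanted_property in property_chain(pred)
    ]


triples = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:nomenclature1", "dct:title", "Some heading"),
]
# A client that only knows dct:description still finds the introduction.
print(values_for(triples, "ex:nomenclature1", "dct:description"))
```

A client that has never heard of the budget-specific property can still retrieve its values by asking for the generic Dublin Core property.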
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
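The capitalisation and camel-case rules above lend themselves to an automated check. The following Python sketch is one possible way to do it; the function names are invented here, and the check deliberately covers only letter case, because "singular", "verb" and "noun" cannot be verified with a regular expression.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check the letter-case convention for a term; kind is 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return pattern.fullmatch(local_name(term)) is not None


print(follows_convention("skos:Concept", "class"))
print(follows_convention("foaf:isPrimaryTopicOf", "property"))
print(follows_convention("ex:political_category", "class"))
```

The first two calls use examples from the slide; the third uses a made-up term with an underscore and a lower-case initial, which the check rejects.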
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
• RDF Schema (RDFS)
• Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
• RDF Schema (RDFS)
• Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
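The two definitions shown on these slides did not survive in the transcript. As a rough idea of what an RDFS declaration looks like, the Python sketch below prints Turtle for a class and a property. The term names bud:Amount and bud:amountType, the labels and the comments are assumptions made for this sketch, not the published EU Budget vocabulary, and the helper does not escape quotation marks inside labels.

```python
def declare_term(term, rdf_type, label, comment):
    """Return a Turtle snippet declaring one vocabulary term with RDFS annotations."""
    return (
        "{term} a {rdf_type} ;\n"
        '    rdfs:label "{label}"@en ;\n'
        '    rdfs:comment "{comment}"@en .'
    ).format(term=term, rdf_type=rdf_type, label=label, comment=comment)


# Hypothetical terms, loosely modelled on the "Amount" class of the domain model.
print(declare_term("bud:Amount", "rdfs:Class", "Amount", "An amount of money in the budget."))
print(declare_term("bud:amountType", "rdf:Property", "amount type", "The type of an amount."))
```

The same pattern applies in OWL, where owl:Class, owl:ObjectProperty and owl:DatatypeProperty would take the place of the RDFS types.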
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Figure: the hasCeiling property linking the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description).]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
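In RDFS, domain and range are used to infer the types of the resources involved rather than to reject data. The Python sketch below illustrates that inference for hasCeiling. The direction of the property (from a political category to an amount) and all prefixed names are assumptions made for the illustration, since the transcript does not record them.

```python
# Assumed declarations: hasCeiling goes from a political category to an amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from the domain and range of each predicate used."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred


data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
for statement in sorted(infer_types(data)):
    print(statement)
```

A single hasCeiling statement is enough for a consumer to learn the class of both its subject and its object, which is why declaring domain and range is worth considering.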
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases shown on the slide: Analyse, Model, Publish.
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: getting started
Install the tools:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet.
2. Create a project and upload it in Open Refine.
3. Clean up the data.
4. Map your data to appropriate RDF classes & properties.
5. Export the data in RDF.
Slide 86
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data - prepare RDF
• Create URI representations for the involved object values:
o via formula
o via reconciliation
Slide 91
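Creating URIs "via formula" means deriving each URI from the cell value with a fixed rule. The Python sketch below shows one such rule outside Open Refine; the base URI under example.com and the slug rule (lower case, ASCII only, hyphens between words) are assumptions chosen for the illustration, not the formula used in the workshop.

```python
import re
import unicodedata


def mint_uri(base_uri, cell_value):
    """Build a URI from a cell value: strip accents, lower-case, hyphenate."""
    ascii_text = (
        unicodedata.normalize("NFKD", cell_value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return base_uri.rstrip("/") + "/" + slug


print(mint_uri("http://example.com/id/country/", " United Kingdom "))
print(mint_uri("http://example.com/id/country", "Côte d'Ivoire"))
```

A rule like this is only safe when the resulting slugs are unique within the column; where the values should instead point at identifiers that already exist elsewhere, reconciliation is the better route.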
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
• You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle.]
Slide 95
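What a skeleton does during export can be approximated in plain Python: every row becomes one subject, and every mapped column becomes one triple. This is a sketch under assumptions, not the RDF extension's implementation. The column names, the example.com URIs and the choice of N-Triples as output are made up for the illustration; only the qb:Observation class and the sdmx-measure:obsValue property come from the RDF Data Cube Vocabulary and the SDMX measure vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"

# Hypothetical skeleton: column name -> (predicate URI, "uri" or "literal").
SKELETON = {
    "country": ("http://example.com/def/country", "uri"),
    "year": ("http://example.com/def/year", "literal"),
    "value": ("http://purl.org/linked-data/sdmx/2009/measure#obsValue", "literal"),
}


def row_to_ntriples(base_uri, row_number, row):
    """Turn one spreadsheet row into N-Triples lines, following SKELETON."""
    subject = "<{}observation/{}>".format(base_uri, row_number)
    lines = ["{} <{}> <{}> .".format(subject, RDF_TYPE, QB_OBSERVATION)]
    for column, (predicate, kind) in SKELETON.items():
        cell = row[column]
        if kind == "uri":
            obj = "<" + cell + ">"
        else:
            obj = '"' + cell.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("{} <{}> {} .".format(subject, predicate, obj))
    return lines


row = {"country": "http://example.com/country/be", "year": "2013", "value": "0.75"}
for line in row_to_ntriples("http://example.com/data/", 1, row):
    print(line)
```

The base URI plays the same role as the base URI set in the graphical interface: it determines where the generated observation URIs live.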
Production pipelines
From desk to automated pipeline
[Figure: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, "flexibility" and "volume".]
Slide 96
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
• Find us on SlideShare: Open Data Support, http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Join us: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Introduction to RDF and SPARQL
This module contains:
• An introduction to the Resource Description Framework (RDF) for describing your data.
• An introduction to SPARQL and how you can use it to query and manipulate data in RDF.
Slide 39
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF).
• How to write/read RDF.
• How you can describe your data with RDF.
• What SPARQL is.
• How to understand and write a SPARQL SELECT query.
Slide 40
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
• Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products.
• Description: attributes, features and relations of the resources.
• Framework: model, languages and syntaxes for these descriptions.
• Published as a W3C Recommendation in 1999.
• RDF was originally introduced as a data model for metadata.
• RDF was generalised to cover knowledge of all kinds.
Slide 42
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
Slide 43
RDF structure
Triples graphs and syntaxes
Slide 44
What is a triple?
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI.
• Predicate - a URI-identified, reused specification of the relationship.
• Object - a resource or literal to which the subject is related.
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: “File types Name Authority List”
Slide 45
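The example triple above can be written down as a simple data structure. The Python sketch below is only an illustration of the subject, predicate and object roles; the class and function names are invented for it, and the "has a title" predicate is assumed to be dct:title (http://purl.org/dc/terms/title), as in the syntax examples of this module. The output line follows the N-Triples notation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # URI of the resource being described
    predicate: str        # URI of the property
    obj: str              # URI or literal value
    obj_is_literal: bool  # True when obj is a literal rather than a URI


def to_ntriples(triple):
    """Serialise one triple as a line of N-Triples."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    else:
        obj = "<" + triple.obj + ">"
    return "<{}> <{}> {} .".format(triple.subject, triple.predicate, obj)


title = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate="http://purl.org/dc/terms/title",
    obj="File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```

A set of such triples forms a graph: whenever the object of one triple is the subject of another, the two statements connect.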
RDF syntax: RDF/XML
Callouts on the slide: subject, predicate, object, graph.

<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Description of data: triples -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide 46
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
[Figure: RDF graph with subject, predicate and object callouts.]
Slide 47
RDF syntax: Turtle
Callouts on the slide: subject, predicate, object, graph.

# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data: triples
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide 48
RDF syntax: RDFa (embedding RDF data in HTML)
Callouts on the slide: subject, predicate, object.

<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>

See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
Slide 49
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[UML class diagram "Healthcare Domain", with callouts marking classes, properties and relationships.]
Classes and their properties (Core Vocabularies):
• Identifier: dateOfIssue (dateTime) [0..1], identifier (string) [1..1], identifierType (string) [0..1], issuingAuthority (string) [0..1], issuingAuthorityUri (URI) [0..1]
• Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)
• Location: geographicIdentifier (URI), geographicName (string)
• Address: addressArea (string), addressID (string), adminUnitL1 (string), adminUnitL2 (string), fullAddress (string), locatorDesignator (string), locatorName (string), poBox (string), postCode (string), postName (string), thoroughfare (string)
• Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language.
• It is one of the three core standards of the Semantic Web, along with RDF and OWL.
• It became a W3C standard in January 2008.
• SPARQL 1.1 has been a W3C Recommendation since March 2013.
Slide 54
Types of SPARQL queries
• SELECT: return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: add triples to the RDF graph.
• DELETE: delete triples from the RDF graph.
• ASK: are there any X, Y, etc. satisfying the following conditions?
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
Structure of a SPARQL query

# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Type of query, and variables (i.e. what to search for)
SELECT ?title

# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide 56
SELECT - return the name of a dataset with a particular URI

Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}

Result:
dataset
"File types Name Authority List"
Slide 57
SELECT - return the name and publisher of a dataset

Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .

Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
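How a SELECT query arrives at its result can be mimicked with a few lines of Python: each triple pattern is matched against every triple, and variables (terms starting with "?") must keep the same value across patterns. This is a teaching sketch, not a SPARQL engine. The function names are invented, the prefixed names are kept as plain strings, and the full URIs are assumed to be the ones used in the syntax examples of this module, since the sample data abbreviates them.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples):
    """Return one tuple of values per solution, in the order of variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset's triples to the publisher's triples, which is why the publisher's title, and not the dataset's, ends up in the second column.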
SPARQL example - EU ODP (1)
Slide 59
SPARQL example - EU ODP (2)
Slide 60
SPARQL example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data:
o How to reuse existing vocabularies to model your data.
o How to create new classes and properties in RDF.
o How and where to publish your RDF vocabulary so that it can be reused by others.
• An example of how tabular data can be published as Linked Open Data using Open Refine.
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Diagram: the six steps grouped into three phases: Analyse, Model, Publish]
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …" - openrefine.org
See also: Open Refine website
http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML
You then determine the intended structure of the RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar - Open Refine: http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data: table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data: prepare RDF
Slide 91
3
• Create a URI representation for the object values involved
  • via formula
  • via reconciliation
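The "via formula" route amounts to deriving a URI from a cell value with a deterministic rule. The sketch below shows one such rule in Python. It is not the expression language used inside Open Refine, and the base URI and path layout are assumptions chosen for illustration.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI


def slugify(value):
    """Lower-case the value, strip accents and replace other character runs with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, value):
    """Build a URI for a cell value, e.g. a country or indicator name."""
    return f"{BASE}{kind}/{slugify(value)}"


country_uri = mint_uri("country", "Côte d'Ivoire")
indicator_uri = mint_uri("indicator", "  Broadband Take-up (%) ")
```

A formula like this is only safe when the cell values are already unique and stable; reconciliation against an existing authority list is the better option when such a list exists.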
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
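The mapping and export steps can be pictured as turning each spreadsheet row into one observation of the RDF Data Cube Vocabulary. The following sketch does this with the Python standard library and writes N-Triples rather than Turtle. The column names, the base URI and the figures are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up sample rows standing in for the cleaned spreadsheet.
SAMPLE_CSV = """country,year,value
BE,2013,0.82
LU,2013,0.94
"""


def rows_to_ntriples(csv_text):
    """Emit five N-Triples lines (one qb:Observation) per CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        country, year, value = row["country"], row["year"], row["value"]
        obs = f"<{BASE}obs/{country}-{year}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> <{BASE}dataset> .")
        lines.append(f"{obs} <{SDMX_DIM}refArea> <{BASE}country/{country}> .")
        lines.append(f'{obs} <{SDMX_DIM}refPeriod> "{year}"^^<{XSD}gYear> .')
        lines.append(f'{obs} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD}decimal> .')
    return lines


triples = rows_to_ntriples(SAMPLE_CSV)
print("\n".join(triples))
```

A complete Data Cube publication would also describe the dataset and its data structure definition; those parts are left out of this sketch.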
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar plotted against two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID, Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID, Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us, join us, follow us and contact us:
Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
Website: http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
@OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have a clear understanding of:
• The Resource Description Framework (RDF)
• How to write/read RDF
• How you can describe your data with RDF
• What SPARQL is
• How to understand and write a SPARQL SELECT query
Slide 40
DATASUPPORTOPEN
Resource Description Framework
An introduction to RDF
Slide 41
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example: RDF description of an organisation
Publications Office, 2 rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
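To see which statements the organisation example contains, the sketch below reads a flat RDF/XML document of this shape with Python's standard XML parser and lists its triples. It is deliberately limited: it assumes typed top-level elements carrying rdf:about, with child elements that hold either rdf:resource or plain text. It is not a general RDF/XML parser, and it does not distinguish literals from URIs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2 rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    return tag[1:].replace("}", "")


def simple_triples(xml_text):
    """List (subject, predicate, object) tuples for the restricted shape described above."""
    triples = []
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF_NS}}}about")
        triples.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            target = prop.get(f"{{{RDF_NS}}}resource")
            value = target if target is not None else (prop.text or "").strip()
            triples.append((subject, uri(prop.tag), value))
    return triples


triples = simple_triples(RDF_XML)
for triple in triples:
    print(triple)
```

The five triples it prints are the same five statements a graph drawing of the example would show: two type statements, a label, a site link and a full address.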
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject: a resource, which is identified with a URI
• Predicate: a URI-identified, reused specification of the relationship
• Object: a resource or literal to which the subject is related
Example: name of a dataset
http://publications.europa.eu/resource/authority/file-type (subject) has a title (predicate) "File types Name Authority List" (object)
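In code, a triple is simply a three-part record and a graph is a set of such records. A minimal sketch follows, assuming that "has a title" corresponds to the Dublin Core title property used in the later slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the relationship
    obj: str        # URI of another resource, or a literal value


DCT_TITLE = "http://purl.org/dc/terms/title"  # assumed mapping for "has a title"

triple = Triple(
    subject="http://publications.europa.eu/resource/authority/file-type",
    predicate=DCT_TITLE,
    obj='"File types Name Authority List"',
)

# A graph is a set of triples: adding the same statement twice changes nothing.
graph = {triple}
graph.add(triple)
```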
DATASUPPORTOPEN
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Above: definition of prefixes. Below: description of data (triples); together the triples form the graph -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
In each statement, the element with rdf:about is the subject, the nested element is the predicate, and its content or rdf:resource value is the object.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
[Figure: RDF graph of the triples, with nodes and arcs labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF syntax: Turtle
Slide 48
# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data (triples); together the triples form the graph
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
In each statement, the first term is the subject, the second the predicate and the third the object.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF syntax: RDFa (embedding RDF data in HTML)
Slide 49
<html>
<head></head>
<body>
  <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
    <p>
      <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
      Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
    </p>
  </div>
</body>
</html>
The resource attribute gives the subject, each property attribute a predicate, and the element content the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: a construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom"
• Property: a characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties
• Relationship: a link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram "Healthcare Domain". Classes and their properties:]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: Unified Modelling Language (UML) class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph (a SPARQL 1.1 Update operation)
• DELETE: Delete triples from the RDF graph (a SPARQL 1.1 Update operation)
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
Structure of a SPARQL Query
Slide 56
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
# Type of query and variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
DATASUPPORTOPEN
SELECT: return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT: return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
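The way the three triple patterns combine can be sketched in a few lines of Python: each pattern is matched against every triple, and variables shared between patterns (here ?publisherURI) force the matches to join. This is a naive illustration of the idea, not how a SPARQL engine is implemented. The short names ex:file-type and ex:publ are stand-ins for the URIs that the slide abbreviates.

```python
def match(pattern, triple, binding):
    """Try to extend a variable binding so that the pattern equals the triple."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier, if any.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def select(variables, patterns, graph):
    """Evaluate the triple patterns as a join and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


# Sample data from the slide, with stand-in names for the abbreviated URIs.
graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", '"File types Name Authority List"'),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", '"Publications Office"'),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
print(rows)
```

The third pattern matches two title triples on its own, but only the publisher's title survives because ?publisherURI was already bound by the first pattern.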
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
[Domain model diagram. Classes and attributes:]
Amount: Currency, Figure, Type, Year
Nomenclature: Type, Heading, Introduction, Remark, Conditions
EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
Political category: Code, Description
Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Slide 69
2
General-purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Advantages of reuse
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Reuse existing terms and vocabularies
2
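The date example can be made concrete with a small Python comparison. The lexical form of xsd:date is an ISO 8601 date, which standard libraries parse directly, whereas the free-text form needs a hand-written format string and only works if month names are in the expected language. The sketch assumes the default (English) locale.

```python
from datetime import date, datetime

# Typed value as it would appear with dcterms:created: "2013-02-21"^^xsd:date
typed = date.fromisoformat("2013-02-21")

# Free-text value as in ex:date "21 February 2013": needs a custom format,
# and %B depends on the month name being in the current locale's language.
free_text = datetime.strptime("21 February 2013", "%d %B %Y").date()

same_day = typed == free_text
```

Every consumer of the free-text form has to write and maintain its own version of that second line, which is the extra processing the slide refers to.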
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, you allow systems that understand the super-property or super-class to interpret the data even if the more specific terms are unknown to them
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Diagram: class Nomenclature with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject
[Diagram: class Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to class Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
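The benefit of declaring sub-properties can be sketched as follows: a consumer that only knows dct:description or dct:subject can still use the data once the sub-property statements are applied. The prefixed names bud:introduction and bud:hasNomenclature, and the instance data, are assumptions made for illustration; the slides name the properties but not their URIs.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Schema statements as described on the slides (local names are assumed).
schema = [
    ("bud:introduction", SUB_PROPERTY_OF, "dct:description"),
    ("bud:hasNomenclature", SUB_PROPERTY_OF, "dct:subject"),
]


def super_properties(prop, schema):
    """Return prop plus every property it is, directly or transitively, a sub-property of."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == SUB_PROPERTY_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def expand(data, schema):
    """Add, for each statement, the same statement with every super-property."""
    return {(s, sup, o) for s, p, o in data for sup in super_properties(p, schema)}


# Invented instance data using the more specific property.
data = {("ex:line1", "bud:introduction", '"Some introductory text"')}
expanded = expand(data, schema)
```

After expansion, a generic tool looking for dct:description finds the introductory text without knowing anything about the budget vocabulary.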
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
4
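The capitalisation and camel-case conventions above are mechanical enough to check automatically. The sketch below does so with two regular expressions. It is one possible reading of the conventions, and it cannot check the singular, verb or noun rules, which need human judgement.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_NAME = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
PROPERTY_NAME = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def local_name(term):
    """Drop the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


checks = {
    "skos:Concept": looks_like_class("skos:Concept"),
    "org:hasSite": looks_like_property("org:hasSite"),
    "foaf:isPrimaryTopicOf": looks_like_property("foaf:isPrimaryTopicOf"),
    "ex:has_site": looks_like_property("ex:has_site"),  # underscore breaks camel case
}
```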
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/
Slide 101
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Resource Description Framework
An introduction to RDF
Slide 41
RDF in the stack of Semantic Web technologies
Resource: everything that can have a unique identifier (URI), e.g. pages, places, people, organisations, products
Description: attributes, features and relations of the resources
Framework: model, languages and syntaxes for these descriptions
• Published as a W3C Recommendation in 1999
• RDF was originally introduced as a data model for metadata
• RDF was generalised to cover knowledge of all kinds
Slide 42
Example: RDF description of an organisation
Publications Office, 2, rue Mercier, 2985 Luxembourg, LUXEMBOURG
Slide 43
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:org="http://www.w3.org/ns/org#"
    xmlns:locn="http://www.w3.org/ns/locn#">
  <org:Organization rdf:about="http://publications.europa.eu/resource/authority/corporate-body/PUBL">
    <rdfs:label>Publications Office</rdfs:label>
    <org:hasSite rdf:resource="http://example.com/site/1234"/>
  </org:Organization>
  <locn:Address rdf:about="http://example.com/site/1234">
    <locn:fullAddress>2, rue Mercier, 2985 Luxembourg, LUXEMBOURG</locn:fullAddress>
  </locn:Address>
</rdf:RDF>
RDF structure
Triples, graphs and syntaxes
Slide 44
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI
• Predicate – a URI-identified, reused specification of the relationship
• Object – a resource or literal to which the subject is related
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
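As an illustration only, the dataset-name triple can be held in a small Python structure and written out as one N-Triples line. This is a minimal sketch, not part of the training material: the Triple class and the to_ntriples helper are hypothetical names, and the predicate "has a title" is assumed to be dct:title (http://purl.org/dc/terms/title).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialise a triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",  # assumed URI for "has a title"
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```

The subject and predicate are always URIs; only the object may be either a URI or a literal, which is why the sketch carries a flag for the object alone.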
RDF syntax: RDF/XML
Slide 46
<!-- Definition of prefixes -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <!-- Description of data: triples (subject, predicate, object) forming a graph -->
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph whose nodes and arcs are labelled subject, predicate and object]
RDF syntax: Turtle
Slide 48
# Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Description of data: triples (subject, predicate, object) forming a graph
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
RDF syntax: RDFa
Embedding RDF data in HTML
Slide 49
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher:
        <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram: Healthcare Domain]
Class Core Vocabularies::Identifier
    dateOfIssue: dateTime [0..1]
    identifier: string [1..1]
    identifierType: string [0..1]
    issuingAuthority: string [0..1]
    issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
    alternativeName: string
    birthName: string
    dateOfBirth: dateTime
    dateOfDeath: dateTime
    familyName: string
    fullName: string
    gender: code
    givenName: string
    patronymicName: string
Class Core Vocabularies::Location
    geographicIdentifier: URI
    geographicName: string
Class Core Vocabularies::Address
    addressArea: string
    addressID: string
    adminUnitL1: string
    adminUnitL2: string
    fullAddress: string
    locatorDesignator: string
    locatorName: string
    poBox: string
    postCode: string
    postName: string
    thoroughfare: string
Class Core Vocabularies::Geometry
    lat: string
    long: string
    wkt: string
    xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies, such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Type of query and variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}

Result
dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .

Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}

Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
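To make the idea of triple patterns concrete, the following is a rough Python sketch of how a SELECT with several patterns can be evaluated over a list of triples: each pattern is matched in turn and shared variables join the results. It is an illustration under stated assumptions, not how a SPARQL engine is actually implemented. The match and select functions are hypothetical, and the example.org URIs are placeholders standing in for the shortened URIs of the sample data.

```python
FILE_TYPE = "http://example.org/authority/file-type"  # placeholder URI
PUBL = "http://example.org/publisher/publ"            # placeholder URI
DCT = "http://purl.org/dc/terms/"

DATA = [
    (FILE_TYPE, DCT + "title", "File types Name Authority List"),
    (FILE_TYPE, DCT + "publisher", PUBL),
    (PUBL, DCT + "title", "Publications Office"),
]


def match(pattern, bindings, triples):
    """Yield extended bindings for every triple that matches one pattern.

    Pattern terms starting with '?' are variables; all other terms must
    be equal to the corresponding term of the triple.
    """
    for triple in triples:
        new = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def select(variables, patterns, triples):
    """Evaluate the patterns in order (a join) and project the variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s, triples)]
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        (FILE_TYPE, DCT + "publisher", "?publisherURI"),
        (FILE_TYPE, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    DATA,
)
print(rows)
```

The variable ?publisherURI appears in two patterns, so the publisher's title is only returned for the resource that the dataset actually points to.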
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust Domain Model
Slide 68
[Figure: domain model of the EU Budget example]
Classes and attributes:
- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type
- Political category: Code, Description
- Corporate body: Code, Type, Location
Other attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Step 2: Reuse existing terms and vocabularies
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Step 2: Reuse existing terms and vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: Ontology for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Step 2: Reuse existing terms and vocabularies
Slide 71
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
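The interoperability point about dates can be shown with a few lines of Python. This is only a sketch: parse_xsd_date is a hypothetical helper, and it assumes the xsd:date value carries no timezone suffix. The typed value "2013-02-21" parses directly with the standard library, whereas the free-text form "21 February 2013" does not and would need extra, format-specific processing.

```python
from datetime import date


def parse_xsd_date(lexical: str) -> date:
    """Parse the lexical form of an xsd:date without a timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


# A value published as "2013-02-21"^^xsd:date is processable as-is
typed = parse_xsd_date("2013-02-21")
print(typed.year, typed.month, typed.day)

# A free-text date is rejected and would need a custom parser
try:
    parse_xsd_date("21 February 2013")
    free_text_ok = True
except ValueError:
    free_text_ok = False
print(free_text_ok)
```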
You can find reusable RDF vocabularies on:
Step 2: Reuse existing terms and vocabularies
Slide 72
http://joinup.ec.europa.eu/
http://lov.okfn.org/
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
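The benefit of declaring sub-properties can be sketched in Python: a system that only knows dct:description can still read a value published with the more specific "introduction" property, provided it applies the rdfs:subPropertyOf statement. The sketch below is illustrative and makes assumptions: the example.org URIs for the introduction property and for the nomenclature resource are hypothetical, and infer_super_properties is not a library function.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"
EX_INTRODUCTION = "http://example.org/budget#introduction"  # hypothetical term URI
EX_ITEM = "http://example.org/nomenclature/1"               # hypothetical resource

triples = {
    (EX_INTRODUCTION, RDFS_SUBPROPERTY_OF, DCT_DESCRIPTION),
    (EX_ITEM, EX_INTRODUCTION, "Introductory text"),
}


def infer_super_properties(graph):
    """Return the graph plus the statements implied by rdfs:subPropertyOf."""
    result = set(graph)
    changed = True
    while changed:  # repeat until no new statement is added (handles chains)
        changed = False
        sub_of = [(s, o) for s, p, o in result if p == RDFS_SUBPROPERTY_OF]
        for s, p, o in list(result):
            for sub, sup in sub_of:
                if p == sub and (s, sup, o) not in result:
                    result.add((s, sup, o))
                    changed = True
    return result


inferred = infer_super_properties(triples)
print((EX_ITEM, DCT_DESCRIPTION, "Introductory text") in inferred)
```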
Step 4: Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept.
Properties begin with a lower case letter, e.g. rdfs:label.
Object properties should be verbs, e.g. org:hasSite.
Data type properties should be nouns, e.g. dcterms:description.
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
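Some of these naming conventions can be checked mechanically. The following Python sketch covers only the capitalisation and camel-case rules; whether a term is singular, a verb or a noun still needs human review. The function names and the regular expressions are assumptions made for illustration.

```python
import re

# Classes: initial capital, then letters and digits only (no '_' or '-')
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
# Properties: initial lower-case letter, then letters and digits only
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term: str, kind: str) -> bool:
    """Check capitalisation and camel case for a 'class' or 'property' term."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))


print(follows_convention("skos:Concept", "class"))
print(follows_convention("foaf:isPrimaryTopicOf", "property"))
print(follows_convention("ex:legal_name", "property"))
```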
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: Amount class with attributes Currency, Figure, Type, Year]
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
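In RDFS, domain and range are read as statements from which types can be inferred, rather than as validation rules. The Python sketch below illustrates that reading for the hasCeiling example. It rests on assumptions: the ex: terms are hypothetical, and the direction of the property (a political category has an amount as its ceiling) is a guess, since it cannot be read from the diagram text.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Hypothetical schema: the direction of hasCeiling is an assumption
schema = [
    ("ex:hasCeiling", RDFS_DOMAIN, "ex:PoliticalCategory"),
    ("ex:hasCeiling", RDFS_RANGE, "ex:Amount"),
]
data = [("ex:category1", "ex:hasCeiling", "ex:amount1")]


def infer_types(schema, data):
    """Infer rdf:type statements from rdfs:domain and rdfs:range declarations."""
    domains, ranges = {}, {}
    for prop, kind, cls in schema:
        if kind == RDFS_DOMAIN:
            domains.setdefault(prop, []).append(cls)
        elif kind == RDFS_RANGE:
            ranges.setdefault(prop, []).append(cls)
    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, []):
            inferred.add((s, RDF_TYPE, cls))  # the subject is in the domain
        for cls in ranges.get(p, []):
            inferred.add((o, RDF_TYPE, cls))  # the object is in the range
    return inferred


types = infer_types(schema, data)
print(sorted(types))
```

A consequence of this reading is that overly narrow domains and ranges produce unintended type inferences when the property is reused elsewhere, which is one reason to define them with care.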
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another …" - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie/
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
Slide 89
- Upload the spreadsheet
- Select relevant tabs
- Create the project
Step 3: Clean up the data – table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Step 3: Clean up the data – prepare RDF
Slide 91
• Create URI representations for the involved object values
  - via formula
  - via reconciliation
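Creating a URI "via formula" means deriving it from a cell value with a fixed rule. In Open Refine this would be written as an expression inside the tool; the Python sketch below only illustrates the kind of rule involved. The mint_uri function and the example.org base URI are hypothetical, and the slug rule (lower-case ASCII words joined by hyphens) is one possible choice among many.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, label: str, base: str = BASE) -> str:
    """Build a URI from a cell value: lower-case ASCII words joined by hyphens."""
    # Strip accents by decomposing characters and dropping non-ASCII marks
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of other characters with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a URI from {label!r}")
    return f"{base}{kind}/{slug}"


print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", " Broadband  coverage (%) "))
```

A formula of this kind gives the same URI for the same label every time, which matters for persistence. Reconciliation instead looks the value up in an existing dataset and reuses the URI found there.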
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
Slide 94
[Figure: graphical interface to copy/paste an existing RDF skeleton]
[Figure: graphical interface to edit an RDF skeleton]
Step 5: Export your data to RDF/XML or Turtle
Slide 95
[Figure: export of the data in Turtle]
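What the mapping and export steps do can be approximated outside the tool with a short Python sketch: each spreadsheet row becomes a subject URI and each column becomes a triple. This is a simplified illustration, not the output of the RDF extension. The column names, the sample rows, the example.org namespaces and the rows_to_ntriples function are all hypothetical, the output is N-Triples rather than Turtle, and the RDF Data Cube terms are left out for brevity.

```python
import csv
import io

BASE = "http://example.org/observation/"  # hypothetical base URI for the rows
EX = "http://example.org/def#"            # hypothetical vocabulary namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Made-up sample rows standing in for the cleaned spreadsheet
SAMPLE_CSV = "id,country,value\n1,BE,78.5\n2,LU,91.2\n"


def rows_to_ntriples(text: str) -> list[str]:
    """Turn each CSV row into N-Triples lines (values are assumed to need no escaping)."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        country = row["country"]
        value = row["value"]
        lines.append(f'{subject} <{EX}country> "{country}" .')
        lines.append(f'{subject} <{EX}value> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


ntriples = rows_to_ntriples(SAMPLE_CSV)
print("\n".join(ntriples))
```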
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: tools positioned along two axes, flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Thank you for your attention
... and now YOUR questions?
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative
2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice
DATASUPPORTOPEN
RDF in the stack of Semantic Web technologies
Resource Everything that can have a unique identifier (URI) eg pages places people organisations products
Description attributes features and relations of the resources
Framework model languages and syntaxes for these descriptions
bull Published as a W3C recommendation in 1999
bull RDF was originally introduced as a data model for metadata
bull RDF was generalised to cover knowledge of all kinds
Slide 42
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG
Slide 43
ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt
ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt
ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt
ltorgOrganizationgt
ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt
ltlocnAddressgt
ltrdfRDFgt
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo
Subject Predicate Object
Example name of a dataset
DATASUPPORTOPEN
RDF SyntaxRDFXML
Slide 46
ltrdfRDF
xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo
ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt
ltdcatDatasetgt
ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt
ltdctPublishergt
ltrdfRDFgt
Subject
Predicate
Object
Gra
ph
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms
lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt
lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo
Gra
ph
See alsohttpwwww3org200912rdf-wspapersws11
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[Figure: UML class diagram with callouts marking the classes, their properties and the relationships between them]
Classes and their properties:
• Identifier: dateOfIssue (dateTime) [0..1], identifier (string) [1..1], identifierType (string) [0..1], issuingAuthority (string) [0..1], issuingAuthorityUri (URI) [0..1]
• Person: alternativeName (string), birthName (string), dateOfBirth (dateTime), dateOfDeath (dateTime), familyName (string), fullName (string), gender (code), givenName (string), patronymicName (string)
• Location: geographicIdentifier (URI), geographicName (string)
• Address: addressArea (string), addressID (string), adminUnitL1 (string), adminUnitL2 (string), fullAddress (string), locatorDesignator (string), locatorName (string), poBox (string), postCode (string), postName (string), thoroughfare (string)
• Geometry: lat (string), long (string), wkt (string), xmlGeometry (XML)
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (an update operation, defined in SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, defined in SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
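The difference between the read-only query forms can be sketched in plain Python over a list of triples. This is a simplified illustration, not how a SPARQL engine is implemented: the `match` helper, the `ex:` identifiers and the use of prefixed names as plain strings are all assumptions of this sketch.

```python
triples = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, data):
    """Yield one dict of bindings per triple that fits the pattern.

    Terms starting with '?' are variables; everything else must be equal.
    """
    for triple in data:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                if binding.get(want, have) != have:
                    break
                binding[want] = have
            elif want != have:
                break
        else:
            yield binding


pattern = ("?x", "dct:title", "?title")

# SELECT: a table of the values bound to the chosen variables
select_rows = [(b["?x"], b["?title"]) for b in match(pattern, triples)]

# ASK: is there at least one solution?
ask_result = any(True for _ in match(("?x", "rdf:type", "dcat:Catalog"), triples))

# CONSTRUCT: substitute each solution into a template to build new triples
constructed = [(b["?x"], "rdfs:label", b["?title"]) for b in match(pattern, triples)]
```

All three forms share the same pattern matching step and differ only in what they do with the solutions.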
Structure of a SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Callouts:
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block
Slide 56
SELECT – return the name of a dataset with a particular URI
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
Slide 57
SELECT – return the name and publisher of a dataset
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
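The query with three triple patterns works because the variable ?publisherURI must take the same value in every pattern where it appears. A minimal Python sketch of that joining step follows. It is an illustration only, with invented helper names (`extend`, `solve`), prefixed names kept as plain strings, and full URIs used in place of the abbreviated ones so that the publisher URI is identical in both places.

```python
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

data = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def extend(binding, pattern, triple):
    """Return binding extended to fit the triple, or None if it cannot fit."""
    result = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # A variable keeps the value it already has, if any.
            if result.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return result


def solve(patterns, triples):
    """Evaluate the patterns one after another, keeping compatible bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


where = [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]
rows = [(s["?dataset"], s["?publisher"]) for s in solve(where, data)]
```

Only ?dataset and ?publisher are returned, mirroring the SELECT clause; ?publisherURI is used solely to connect the dataset to its publisher.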
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
- How to reuse existing vocabularies to model your data
- How to create new classes and properties in RDF
- How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
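Step 1: Start with a robust Domain Model
[Figure: domain model diagram for EU budget data]
Classes and attributes shown in the diagram:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
Further attributes shown in the diagram (the owning class is not recoverable from this transcript): Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Slide 68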
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
A dcterms:created statement, for example, whose value should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
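The "further processing" mentioned in the date example can be made concrete with a short Python sketch. It is an illustration only: the function name is invented, and the parsing assumes English month names (the default locale).

```python
from datetime import datetime


def to_xsd_date(text: str) -> str:
    """Turn a date such as '21 February 2013' into a typed xsd:date literal.

    Assumes English month names, i.e. the default C/English locale.
    """
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


literal = to_xsd_date("21 February 2013")
print(literal)
```

Every consumer of a non-standard date format needs a conversion step like this one, which is the cost that reusing dcterms:created with xsd:date avoids.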
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked by “has Nomenclature” to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
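How a system that only understands the super-property can still interpret the data may be sketched as follows. This Python fragment is an illustration, not part of the EU Budget vocabulary: the `bud:` prefix, the local names `introduction` and `hasNomenclature`, the `ex:` resources and the literal text are all hypothetical.

```python
# Hypothetical names: the slides give only the labels "introduction"
# and "has nomenclature", not the actual property URIs.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

data = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}


def with_super_properties(triples, sub_property_of):
    """Add, for each triple, the same statement using the super-property."""
    inferred = set(triples)
    todo = list(triples)
    while todo:
        s, p, o = todo.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            todo.append((s, parent, o))  # the parent may have a parent too
    return inferred


closure = with_super_properties(data, SUB_PROPERTY_OF)
```

A client that knows only Dublin Core can now find the description and subject statements without knowing the more specific terms.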
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Datatype properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
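The letter-case conventions above can be checked mechanically. The following Python sketch is an illustration with invented function names; it checks only capitalisation and camel case, since whether a term is singular, a verb or a noun cannot be decided by a pattern.

```python
import re

CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term: str) -> bool:
    """Upper-case first letter, no separators such as '_' or '-'."""
    return CLASS_NAME.fullmatch(local_name(term)) is not None


def is_property_name(term: str) -> bool:
    """Lower-case first letter, no separators such as '_' or '-'."""
    return PROPERTY_NAME.fullmatch(local_name(term)) is not None


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf"]:
    print(term, is_class_name(term), is_property_name(term))
```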
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Figure: the hasCeiling property between the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
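In RDF Schema, domain and range are used for inference: a statement that uses the property lets a system conclude the classes of its subject and object. The Python sketch below illustrates this. The `bud:` and `ex:` names are hypothetical, and the direction of hasCeiling (from a political category to an amount) is an assumption, since the transcript does not show which way the arrow points.

```python
# Assumed direction: a political category has an amount as its ceiling.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]


def infer_types(triples, domain, range_):
    """Derive rdf:type statements from the domain and range of properties."""
    types = set()
    for s, p, o in triples:
        if p in domain:
            types.add((s, "rdf:type", domain[p]))  # subject is in the domain
        if p in range_:
            types.add((o, "rdf:type", range_[p]))  # object is in the range
    return types


inferred = infer_types(data, DOMAIN, RANGE)
```

Because of this inference, an overly narrow domain or range can assign unintended classes to resources, which is one reason to define them with care.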
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
- http://www.w3.org/ns/adms
- http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
The six steps fall into three phases: Analyse, Model, Publish.
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” – openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: getting started
First:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
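To make the idea behind steps 1, 4 and 5 concrete, here is a minimal Python sketch of turning tabular data into triples. It does not use Open Refine or its RDF extension; the sample rows, the base URI and the property URIs derived from column names are invented for illustration. Only qb:Observation is taken from the RDF Data Cube Vocabulary.

```python
import csv
import io

# Invented sample data standing in for a downloaded spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,0.82\nLU,2013,0.94\n"
BASE = "http://example.com/scoreboard/"  # hypothetical base URI


def rows_to_triples(csv_text, base):
    """Map each row to one observation and each cell to one triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        observation = f"{base}obs/{number}"
        triples.append((observation, "rdf:type", "qb:Observation"))
        for column, value in row.items():
            triples.append((observation, f"{base}prop/{column}", value.strip()))
    return triples


triples = rows_to_triples(CSV_TEXT, BASE)
for triple in triples:
    print(triple)
```

In Open Refine the same mapping is defined graphically as an RDF skeleton rather than in code.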
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data – prepare RDF
• Create a URI representation for the involved object values
- via formula
- via reconciliation
Slide 91
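Creating a URI "via formula" means deriving it from the cell value by a fixed rule. The Python sketch below shows one such rule outside Open Refine; the base URI, the function name and the slug rule are assumptions of this sketch, not the formula used in the workshop.

```python
import re

BASE_URI = "http://example.com/id/"  # hypothetical base URI


def mint_uri(kind: str, label: str) -> str:
    """Build a URI from a label: lower-case, runs of other characters become '-'.

    Note: characters outside a-z and 0-9, including accented letters,
    are replaced, so distinct labels can collide.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{BASE_URI}{kind}/{slug}"


uri = mint_uri("country", "Czech Republic")
print(uri)
```

Reconciliation, the second option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.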
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one
• You can set the base URI for the data
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
Production pipelines
From desk to automated pipeline
[Figure: chart with the axes “flexibility” and “volume”, positioning OpenRefine, UnifiedViews and Cellar]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Find us, join us, follow us and contact us:
• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• @OpenDataSupport
• contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
DATASUPPORTOPEN
Example RDF description of an organisation
Publications Office 2 rue Mercier 2985 Luxembourg LUXEMBOURG
Slide 43
ltrdfRDFxmlnsrdfs=ldquohttpwwww3org200001rdf-schemardquoxmlnsorg=ldquohttpwwww3orgnsorgrdquoxmlnslocn=ldquohttpwwww3orgnslocnrdquo gt
ltorgOrganization rdfabout=ldquohttppublicationseuropaeuresourceauthoritycorporate-bodyPUBLrdquogt
ltrdfslabelgt ldquoPublications Officerdquolt rdfslabelgtltorghasSite rdfresource=ldquohttpexamplecomsite1234rdquogt
ltorgOrganizationgt
ltlocnAddress rdfabout=ldquohttpexamplecomsite1234rdquogtltlocnfullAddressgtrdquo2 rue Mercier 2985 Luxembourg LUXEMBOURGrdquoltlocnfullAddressgt
ltlocnAddressgt
ltrdfRDFgt
DATASUPPORTOPEN
RDF structure
Triples graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple
Slide 45
Every piece of information expressed in RDF is represented as a triple
bull Subject ndash a resource which is identified with a URI
bull Predicate ndash a URI-identified reused specification of the relationship
bull Object ndash a resource or literal to which the subject is related
httppublicationseuropaeuresourceauthorityfile-type has a title ldquoFile types Name Authority Listrdquo
Subject Predicate Object
Example name of a dataset
DATASUPPORTOPEN
RDF SyntaxRDFXML
Slide 46
ltrdfRDF
xmlnsdcat=ldquohttpwwww3orgTRvocab-dcatldquoxmlnsdct=ldquohttppurlorgdctermsrdquo
ltdcatDataset rdfabout=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquogtltdcttitlegt ldquoFile types Named Authority Listrdquolt dcttitlegtltdctpublisher rdfresource=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogt
ltdcatDatasetgt
ltdctAgent rdfabout=ldquohttpopen-dataeuropaeuendatapublisherpublrdquogtltdcttitlegtrdquoPublications Officerdquoltdcttitlegt
ltdctPublishergt
ltrdfRDFgt
Subject
Predicate
Object
Gra
ph
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms
lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt
lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo
Gra
ph
See alsohttpwwww3org200912rdf-wspapersws11
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
DATASUPPORTOPEN
Step 1: Start with a robust Domain Model
[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Slide 68
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
  A dcterms:created value, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
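The interoperability point can be illustrated with a minimal Python sketch (not part of the original slides). It assumes a consumer that understands the ISO 8601 lexical form used by xsd:date; the function names and the free-text format string are illustrative assumptions.

```python
from datetime import date, datetime


def parse_xsd_date(lexical: str) -> date:
    """Parse the ISO 8601 lexical form used by xsd:date (YYYY-MM-DD)."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text: str) -> date:
    """Publisher-specific processing needed for a value like '21 February 2013'."""
    # Assumption: day, English month name, four-digit year (default C locale).
    return datetime.strptime(text, "%d %B %Y").date()


standard = parse_xsd_date("2013-02-21")
custom = parse_free_text_date("21 February 2013")
print(standard == custom)
```

Both values denote the same day, but only the first can be read without knowing the publisher's private convention.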
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
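How a consumer that only knows the super-property can still read the data may be sketched as follows. This is a simplified Python illustration, not the EU Budget vocabulary itself: the bud: and ex: names and the sample values are hypothetical, and triples are modelled as plain tuples.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Hypothetical data: the bud: and ex: terms are invented for illustration.
triples = {
    ("bud:introduction", RDFS_SUBPROPERTY_OF, "dct:description"),
    ("ex:heading1", "bud:introduction", "Introductory text"),
    ("ex:heading2", "dct:description", "Plain description"),
}


def sub_properties(graph, prop):
    """Return prop plus every property declared (transitively) as its sub-property."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            if p == RDFS_SUBPROPERTY_OF and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def values(graph, prop):
    """All (subject, value) pairs stated with prop or one of its sub-properties."""
    props = sub_properties(graph, prop)
    return sorted((s, o) for s, p, o in graph if p in props)


print(values(triples, "dct:description"))
```

A query for dct:description returns both values, although one of them was published with the more specific term.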
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
[Figure: Amount class (Currency, Figure, Type, Year) linked by “has Nomenclature” to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
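The letter-case conventions above lend themselves to an automated check. The following Python sketch is one possible way to do it; the helper names are assumptions, and it only tests capitalisation and camel case, not whether a term is singular, a verb or a noun.

```python
import re

# Camel case without separators: UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term: str) -> str:
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ("skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:political_category"):
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```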
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
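The definitions shown on the two example slides did not survive in this transcript. As a stand-in, the Python sketch below prints what an RDFS definition of a class and a property could look like in Turtle. It is an illustration under stated assumptions, not the actual EU Budget vocabulary: the namespace form with a trailing "#", the local names Amount and amountType, the labels and the comment text are all hypothetical.

```python
# Sketch only: the bud: terms and the '#' namespace form are assumptions.
BUD_NS = "http://data.europa.eu/bud#"

PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix bud: <{BUD_NS}> .\n"
)


def define_class(local_name, label, comment):
    """Return a Turtle description of an RDFS class (labels must not contain quotes)."""
    return (
        f"bud:{local_name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


def define_property(local_name, label, domain, range_):
    """Return a Turtle description of an RDF property with a domain and a range."""
    return (
        f"bud:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain bud:{domain} ;\n"
        f"    rdfs:range {range_} .\n"
    )


vocabulary = "\n".join([
    PREFIXES,
    define_class("Amount", "Amount", "A monetary amount in the budget."),
    define_property("amountType", "amount type", "Amount", "rdfs:Literal"),
])
print(vocabulary)
```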
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used, i.e. any resource that has the property is an instance of those classes.
[Figure: hasCeiling relationship linking the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
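What a domain and a range let a consumer conclude can be sketched in a few lines of Python. The example assumes that hasCeiling points from a political category to an amount; the direction, the bud: and ex: names and the sample triple are assumptions made for illustration.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Assumed schema: direction of hasCeiling and the bud: names are hypothetical.
schema = {
    ("bud:hasCeiling", DOMAIN, "bud:PoliticalCategory"),
    ("bud:hasCeiling", RANGE, "bud:Amount"),
}
data = {("ex:heading1", "bud:hasCeiling", "ex:amount42")}


def infer_types(schema, data):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations."""
    domains, ranges = {}, {}
    for prop, kind, cls in schema:
        if kind == DOMAIN:
            domains.setdefault(prop, set()).add(cls)
        elif kind == RANGE:
            ranges.setdefault(prop, set()).add(cls)
    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))  # the subject is in the domain class
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))  # the object is in the range class
    return inferred


print(sorted(infer_types(schema, data)))
```

Note that in RDFS a domain or range does not reject data; it allows a reasoner to infer the class of the subject or object.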
DATASUPPORTOPEN
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
DATASUPPORTOPEN
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Slide 82
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; …” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
Prerequisites:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
DATASUPPORTOPEN
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
DATASUPPORTOPEN
Step 3: Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
DATASUPPORTOPEN
Step 3: Clean up the data - prepare RDF
• Create a URI representation for the involved object values:
  - via formula
  - via reconciliation
Slide 91
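Creating URIs "via formula" means deriving each identifier from a cell value with a fixed rule. In Open Refine this would be written as a column transformation; the Python sketch below only illustrates the idea. The base URI, the function names and the sample values are assumptions.

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI


def slugify(value: str) -> str:
    """Strip accents, lower-case, and turn runs of other characters into single hyphens."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind: str, value: str) -> str:
    """Build a URI for a cell value, e.g. a country name in a 'country' column."""
    return f"{BASE}{kind}/{slugify(value)}"


print(mint_uri("country", "Czech Republic"))
print(mint_uri("region", "  Île-de-France "))
```

Reconciliation, the second option, instead looks the value up in an existing dataset and reuses the URI found there.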
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
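A skeleton is a template that is applied to every row of the table. The Python sketch below mimics that idea outside Open Refine: each row becomes one qb:Observation in Turtle. The sample rows, the base URI and the ex: dimension and measure properties are made up for illustration; only qb:Observation and qb:dataSet come from the RDF Data Cube Vocabulary.

```python
import csv
import io

# Made-up sample rows standing in for the cleaned spreadsheet.
CSV_TEXT = "country,year,value\nBE,2013,80.5\nFR,2013,78.0\n"

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    f"@prefix ex: <{BASE}def/> .\n"
)


def row_to_turtle(row):
    """Apply the skeleton to one spreadsheet row, giving one qb:Observation."""
    return (
        f"<{BASE}obs/{row['country']}-{row['year']}> a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset> ;\n"
        f"    ex:refArea <{BASE}country/{row['country']}> ;\n"
        f"    ex:refPeriod \"{row['year']}\"^^xsd:gYear ;\n"
        f"    ex:value \"{row['value']}\"^^xsd:decimal .\n"
    )


def to_turtle(csv_text):
    """Read the CSV text and serialise every row with the same skeleton."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(row) for row in rows)


turtle = to_turtle(CSV_TEXT)
print(turtle)
```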
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
DATASUPPORTOPEN
Step 5: Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data: http://5stardata.info
• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C: http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata/
• Open Refine: https://github.com/OpenRefine
• RDF Extension: http://refine.deri.ie
• Resource Description Framework, W3C: http://www.w3.org/RDF/
• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query/
Slide 98
DATASUPPORTOPEN
Further reading
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
DATASUPPORTOPEN
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
DATASUPPORTOPEN
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
DATASUPPORTOPEN
RDF structure
Triples, graphs and syntaxes
Slide 44
DATASUPPORTOPEN
What is a triple?
Every piece of information expressed in RDF is represented as a triple:
• Subject - a resource, which is identified with a URI.
• Predicate - a URI-identified, reused specification of the relationship.
• Object - a resource or literal to which the subject is related.
Example: name of a dataset
Subject: http://publications.europa.eu/resource/authority/file-type
Predicate: has a title
Object: "File types Name Authority List"
Slide 45
DATASUPPORTOPEN
RDF Syntax: RDF/XML
The xmlns attributes define the prefixes; the nested elements describe the data as triples (subject, predicate, object), which together form a graph.
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Slide 46
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
[Figure: RDF graph with nodes and arcs labelled Subject, Predicate and Object]
Slide 47
DATASUPPORTOPEN
RDF Syntax: Turtle
The @prefix lines define the prefixes; the statements that follow describe the data as triples (subject, predicate, object), which together form a graph.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide 48
DATASUPPORTOPEN
RDF Syntax: RDFa - embedding RDF data in HTML
The resource attribute gives the subject, the property attributes give the predicates, and the element content gives the objects.
<html>
  <head>…</head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
Slide 49
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[Figure: UML class diagram "Healthcare Domain", with classes, their properties and the relationships between them]
• Class Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
• Class Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
• Class Location: geographicIdentifier: URI; geographicName: string
• Class Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
• Class Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
• Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language.
• It is one of the three core standards of the Semantic Web, along with RDF and OWL.
• It became a W3C standard in January 2008.
• SPARQL 1.1 has been a W3C Recommendation since March 2013.
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s), identified by name or description.
• ASK: Are there any X, Y, etc. satisfying the following conditions?
• INSERT: Add triples to the RDF graph (SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (SPARQL 1.1 Update).
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
DATASUPPORTOPEN
Structure of a SPARQL Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the WHERE block
Slide 56
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
Slide 57
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
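How a SELECT query arrives at this result can be sketched in Python: each triple pattern is matched against the data, and variable bindings are carried from one pattern to the next. This is a teaching sketch, not a SPARQL engine; the short ex: identifiers stand in for the abbreviated URIs on the slide, and the function names are assumptions.

```python
DATASET = "ex:file-type"  # stands in for the dataset URI on the slide
PUBLISHER = "ex:publ"     # stands in for the publisher URI on the slide

triples = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple; return None on a conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the earlier binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:       # a constant: must be identical
            return None
    return new


def select(variables, patterns, graph):
    """Evaluate the patterns in order and project the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    triples,
)
print(rows)
```

The shared variable ?publisherURI is what joins the dataset's triples to the publisher's title.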
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
Conclusions
Slide 82
1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” (openrefine.org)
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
Slide 85
The Open Refine RDF extension lets you import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of the RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
Slide 86
Before you start:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Slide 88 (step 1)
Download the tabular data
Create a project and upload it in Open Refine
Slide 89 (step 2)
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Clean up the data – table harmonisation
Slide 90 (step 3)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Clean up the data – prepare RDF
Slide 91 (step 3)
• Create URI representations for the object values involved
  o via formula
  o via reconciliation
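Creating a URI "via formula" usually means deriving it from a cell value. The sketch below shows one way to do that outside Open Refine. The base URI, the function name `mint_uri` and the slug rules (trim, lower-case, hyphenate, percent-encode) are assumptions for illustration, not the expression used in the workshop.

```python
import re
from urllib.parse import quote

BASE_URI = "http://example.org/id/country/"  # hypothetical base URI


def mint_uri(value, base=BASE_URI):
    """Turn a cell value into a URI: trim, lower-case, hyphenate, percent-encode."""
    slug = re.sub(r"\s+", "-", value.strip().lower())
    return base + quote(slug, safe="-")


print(mint_uri("  Czech Republic "))
```

Deriving the URI deterministically from the value means the same cell content always yields the same identifier, which is what allows rows to link to each other.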
Map your data to appropriate RDF classes & properties (model your data)
Slide 92 (step 4)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data)
Slide 93 (step 4)
Define a skeleton to transform your spreadsheet data to RDF
Map your data to appropriate RDF classes & properties (model your data)
Slide 94 (step 4)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Export your data to RDF/XML or Turtle
Slide 95 (step 5)
[Screenshot: export of the data in Turtle]
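For comparison with the graphical export, the same kind of mapping can be sketched in a few lines of code. The example below turns the rows of a small table into RDF Data Cube observations written in Turtle. The sample rows, the `ex:` namespace and the property names `ex:country`, `ex:year` and `ex:value` are invented for illustration; only `qb:Observation` comes from the RDF Data Cube Vocabulary.

```python
import csv
import io

# Hypothetical sample of cleaned tabular data.
TABLE = """country,year,value
BE,2013,0.78
NL,2013,0.92
"""

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/scoreboard/> .\n"
)


def rows_to_turtle(text):
    """Map each CSV row to one qb:Observation, following a fixed skeleton."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"ex:obs-{row['country']}-{row['year']}"
        blocks.append(
            f"{subject} a qb:Observation ;\n"
            f"    ex:country ex:{row['country']} ;\n"
            f"    ex:year \"{row['year']}\"^^xsd:gYear ;\n"
            f"    ex:value \"{row['value']}\"^^xsd:decimal .\n"
        )
    return "\n".join(blocks)


print(rows_to_turtle(TABLE))
```

The fixed template inside the loop plays the role of the RDF skeleton: every row is pushed through the same shape, with only the cell values changing.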
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes “flexibility” and “volume”]
Thank you for your attention
... and now YOUR questions?
Slide 97
Presentation metadata
Slide 103
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
What is a triple?
Slide 45
Every piece of information expressed in RDF is represented as a triple:
• Subject – a resource, which is identified with a URI.
• Predicate – a URI-identified, reused specification of the relationship.
• Object – a resource or literal to which the subject is related.
Example: name of a dataset
<http://publications.europa.eu/resource/authority/file-type> (subject) has a title (predicate) “File types Name Authority List” (object)
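To make the structure concrete, a triple can be held in code as a small record and written out in the line-based N-Triples syntax. This is only a sketch: the predicate is assumed to be the Dublin Core `title` term, and the names `Triple` and `to_ntriples` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the relationship
    obj: str        # URI or literal value
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


title = Triple(
    "http://publications.europa.eu/resource/authority/file-type",
    "http://purl.org/dc/terms/title",
    "File types Name Authority List",
    obj_is_literal=True,
)
print(to_ntriples(title))
```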
RDF syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Name Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
The xmlns attributes define the prefixes. The nested elements describe the data as triples, which together form the graph: rdf:about gives the subject, each child element is a predicate, and its content or rdf:resource value is the object.
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph with subject, predicate and object labelled]
RDF syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .

The @prefix lines define the prefixes. The remaining statements describe the data as triples (subject, predicate, object), which together form the graph.
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
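Prefixes are only shorthand: a prefixed name such as `dct:title` stands for a full URI formed by joining the namespace and the local name. The sketch below expands prefixed names using a prefix table. The function name `expand` is hypothetical, and the DCAT namespace shown is the one published by W3C.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name like 'dct:title' to a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("dct:title"))
print(expand("dcat:Dataset"))
```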
RDF syntax: RDFa
Slide 49
RDFa embeds RDF data in HTML:
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
The resource attribute gives the subject, each property attribute gives a predicate, and the element content is the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
Slide 51
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[UML class diagram “Healthcare Domain”; the classes and their properties are listed below]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Introduction to SPARQL
The RDF query language
Slide 53
About SPARQL
Slide 54
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Types of SPARQL queries
Slide 55
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description).
• INSERT: Add triples to the RDF graph (part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...?
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
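Structure of a SPARQL query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}

A query consists of the definition of prefixes (PREFIX), the type of query (SELECT), the variables, i.e. what to search for (?title), and the RDF triple patterns, i.e. the conditions that have to be met (the WHERE block).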
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
?dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Name Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
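The way a SELECT query matches triple patterns against data can be sketched with a tiny matcher. This is not how a SPARQL engine is implemented; it only illustrates that variables (here, strings starting with `?`) must be bound consistently across all patterns. The shortened `ex:` identifiers stand in for the full URIs, and the function names `match` and `select` are hypothetical.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Try to extend a binding so that the pattern equals the triple."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable keeps its earlier value if it already has one.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, data=DATA):
    """Return one row per solution, with the requested variables only."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
)
print(rows)
```

The third pattern reuses `?publisherURI`, so only the title of the resource bound by the first pattern is returned. That shared variable is what joins the dataset to its publisher.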
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
Slide 62
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
Slide 64
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Learning objectives
Slide 65
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust domain model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Start with a robust Domain Model
Slide 68 (step 1)
[Domain model diagram; classes, attributes and relationships as far as they can be recovered]
Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Reuse existing terms and vocabularies
Slide 69 (step 2)
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Slide 70 (step 2: Reuse existing terms and vocabularies)
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse
Slide 71 (step 2: Reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data.
  Take dcterms:created as an example: its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
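The "further processing" that a non-standard date format forces on data consumers can be as simple as the following sketch, which converts a free-text date such as "21 February 2013" into the lexical form expected for `xsd:date`. The function name `to_xsd_date` and the assumption that dates always follow the "day, English month name, year" pattern are illustrative only.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}


def to_xsd_date(text):
    """Convert 'DD Month YYYY' (English month names) to 'YYYY-MM-DD'."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name.lower()], int(day)).isoformat()


print(f'"{to_xsd_date("21 February 2013")}"^^xsd:date')
```

Every consumer of the non-standard data would need a converter like this, and each new date style would need another one, which is the cost that reusing dcterms:created with typed dates avoids.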
You can find reusable RDF vocabularies on
Slide 72 (step 2: Reuse existing terms and vocabularies)
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Creation of sub-classes and sub-properties
Slide 73 (step 3)
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Creation of sub-classes and sub-properties
Slide 74 (step 3)
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
[Diagram: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Creation of sub-classes and sub-properties
Slide 75 (step 3)
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
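The benefit of sub-properties can be sketched as a simple inference step: for every statement that uses a specific property, a statement using its super-property also holds, so software that only knows Dublin Core can still read the data. In the sketch below the `bud:` prefixed names and the sample resource are placeholders. Only the two sub-property links themselves come from the slides.

```python
# Sub-property links as described for the EU Budget vocabulary
# (bud: names are placeholders for the actual property URIs).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add the triples implied by rdfs:subPropertyOf, following chains."""
    result = set(triples)
    for subject, predicate, obj in triples:
        seen = set()
        while predicate in sub_property_of and predicate not in seen:
            seen.add(predicate)
            predicate = sub_property_of[predicate]
            result.add((subject, predicate, obj))
    return result


data = [("bud:nomenclature1", "bud:introduction", "Introductory text")]
for triple in sorted(with_super_properties(data)):
    print(triple)
```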
Where new terms are required, create them following commonly agreed best practices
Slide 76 (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
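The letter-case conventions are easy to check mechanically. The sketch below tests only those rules (upper camel case for classes, lower camel case for properties); whether a name is a verb, a noun or singular still needs human review. The function names are hypothetical.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    """Classes start with a capital letter and contain no separators."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term):
    """Properties start with a lower-case letter and contain no separators."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf"]:
    print(term, is_valid_class_name(term), is_valid_property_name(term))
```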
Where new terms are required, create them following commonly agreed best practices
Slide 77 (step 4)
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: class Amount with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
RDF Syntax: RDF/XML
Slide 46
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://publications.europa.eu/resource/authority/file-type">
    <dct:title>File types Named Authority List</dct:title>
    <dct:publisher rdf:resource="http://open-data.europa.eu/en/data/publisher/publ"/>
  </dcat:Dataset>
  <dct:Agent rdf:about="http://open-data.europa.eu/en/data/publisher/publ">
    <dct:title>Publications Office</dct:title>
  </dct:Agent>
</rdf:RDF>
Annotations on the slide:
• Definition of prefixes: the xmlns attributes
• Description of data (triples): the element content. The resource named in rdf:about is the subject, each property element (e.g. dct:title) is a predicate, and its value is the object. Together the triples form a graph.
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDF/XML syntax example
Slide 47
[Figure: RDF graph whose nodes and arcs are labelled Subject, Predicate and Object]
DATASUPPORTOPEN
RDF Syntax: Turtle
Slide 48
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://publications.europa.eu/resource/authority/file-type>
    a dcat:Dataset ;
    dct:title "File types Named Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .

<http://open-data.europa.eu/en/data/publisher/publ>
    a dct:Agent ;
    dct:title "Publications Office" .
Annotations on the slide:
• Definition of prefixes: the @prefix lines
• Description of data (triples): each statement lists a subject, a predicate and an object; together the triples form a graph
See also: http://www.w3.org/2009/12/rdf-ws/papers/ws11
DATASUPPORTOPEN
RDF Syntax: RDFa
Slide 49
RDFa: embedding RDF data in HTML
<html>
  <head></head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type"
         typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Named Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span>
      </p>
    </div>
  </body>
</html>
Annotations on the slide: the resource attribute gives the subject, each property attribute gives a predicate, and the element content gives the object.
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
DATASUPPORTOPEN
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
“A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata.”
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as “health” or “freedom”.
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
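The distinction between properties and relationships can be made concrete with a small sketch. The Python below is only an illustration, not part of the original training material: the `Literal` and `Resource` types, the `ex:` property names and the example.org URIs are all hypothetical. It classifies a statement by looking at whether its value is a plain data value or another resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain data value, e.g. a name or a date."""
    value: str


@dataclass(frozen=True)
class Resource:
    """A thing identified by a URI."""
    uri: str


# Hypothetical example resources (not taken from a real dataset).
DOC = Resource("http://example.org/doc/1")
ORG = Resource("http://example.org/org/po")

triples = [
    (DOC, "rdf:type", Resource("http://example.org/vocab#Document")),
    (ORG, "rdf:type", Resource("http://example.org/vocab#Organisation")),
    (ORG, "ex:legalName", Literal("Publications Office")),  # value is a literal
    (ORG, "ex:publishes", DOC),                             # value is a resource
]


def kind(triple):
    """Classify a triple using the three notions from the slide."""
    _subject, predicate, obj = triple
    if predicate == "rdf:type":
        return "class membership"
    return "property" if isinstance(obj, Literal) else "relationship"


kinds = [kind(t) for t in triples]
```

Here `ex:legalName` behaves like a data type property (its value is a literal), while `ex:publishes` behaves like an object type property (its value is another resource).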
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
[Figure: UML class diagram “Healthcare Domain”, with callouts marking the classes, their properties and the relationships between them]
Classes and their properties:
• Core Vocabularies::Identifier: dateOfIssue: dateTime [0..1]; identifier: string [1..1]; identifierType: string [0..1]; issuingAuthority: string [0..1]; issuingAuthorityUri: URI [0..1]
• Core Vocabularies::Person: alternativeName: string; birthName: string; dateOfBirth: dateTime; dateOfDeath: dateTime; familyName: string; fullName: string; gender: code; givenName: string; patronymicName: string
• Core Vocabularies::Location: geographicIdentifier: URI; geographicName: string
• Core Vocabularies::Address: addressArea: string; addressID: string; adminUnitL1: string; adminUnitL2: string; fullAddress: string; locatorDesignator: string; locatorName: string; poBox: string; postCode: string; postName: string; thoroughfare: string
• Core Vocabularies::Geometry: lat: string; long: string; wkt: string; xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
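To make the difference between SELECT, ASK and CONSTRUCT tangible, the sketch below mimics them over a small in-memory list of triples. It is a toy matcher written for this illustration, not a SPARQL engine: the functions `match`, `solve`, `select`, `ask` and `construct` are hypothetical helpers, terms are plain strings, the `ex:` prefix stands in for the dataset and publisher URIs, and `ex:publishes` is an invented property.

```python
# Toy in-memory graph; "ex:" abbreviates the dataset and publisher URIs (assumption).
GRAPH = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Named Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # a variable such as ?title
            if new.setdefault(term, value) != value:
                return None                      # already bound to another value
        elif term != value:
            return None                          # constant does not match
    return new


def solve(patterns, graph):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def select(variables, patterns, graph):
    """SELECT: a table with one row per solution."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, graph)]


def ask(patterns, graph):
    """ASK: is there at least one solution?"""
    return bool(solve(patterns, graph))


def construct(template, patterns, graph):
    """CONSTRUCT: fill the template once per solution, giving a new graph."""
    return {tuple(b.get(t, t) for t in triple)
            for b in solve(patterns, graph) for triple in template}


titles = select(["?title"],
                [("?d", "rdf:type", "dcat:Dataset"), ("?d", "dct:title", "?title")],
                GRAPH)
has_agent = ask([("?x", "rdf:type", "dct:Agent")], GRAPH)
published = construct([("?p", "ex:publishes", "?d")],
                      [("?d", "dct:publisher", "?p")],
                      GRAPH)
```

The three calls share the same pattern-matching step and differ only in what they do with the solutions: tabulate them, test for existence, or generate new triples.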
DATASUPPORTOPEN
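Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Annotations on the slide:
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE block: RDF triple patterns, i.e. the conditions that have to be met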
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset
WHERE {
  <http://…/authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Named Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://…/authority/file-type> rdf:type dcat:Dataset .
<http://…/authority/file-type> dct:title "File types Named Authority List" .
<http://…/authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://…/publisher/publ> rdf:type dct:Agent .
<http://…/publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?publisher
WHERE {
  <http://…/authority/file-type> dct:publisher ?publisherURI .
  <http://…/authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Named Authority List" | "Publications Office"
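The query on this slide works because the variable ?publisherURI appears in two triple patterns, which joins the dataset to its publisher. A minimal sketch of that join, written as plain dictionary lookups in Python, is given below. The `ex:` identifiers are placeholders for the abbreviated URIs on the slide, and the index structure is an assumption made for the illustration; a real triple store evaluates the query differently.

```python
# Sample data from the slide; "ex:" stands in for the abbreviated URIs (assumption).
triples = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Named Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

# Index the graph by (subject, predicate) so each triple pattern becomes a lookup.
index = {}
for s, p, o in triples:
    index.setdefault((s, p), []).append(o)

rows = []
# Pattern 1 binds ?publisherURI.
for publisher_uri in index.get(("ex:file-type", "dct:publisher"), []):
    # Pattern 2 binds ?dataset.
    for dataset in index.get(("ex:file-type", "dct:title"), []):
        # Pattern 3 reuses ?publisherURI as its subject: this is the join.
        for publisher in index.get((publisher_uri, "dct:title"), []):
            rows.append((dataset, publisher))
```

Each row in `rows` corresponds to one line of the result table, with the columns dataset and publisher.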
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Slide 67
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub class and sub properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
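Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model diagram]
Classes and attributes shown in the diagram:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type
• Political category: Code, Description
• Corporate body: Code, Type, Location
Further attributes shown in the diagram: Acronym, Legal base period, Legal base type, Legal base status
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme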
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2): well-known vocabularies
Slide 70
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2): advantages of reuse
• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a data typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
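The interoperability point about date formats can be illustrated with a short Python sketch. It is only an example of the extra processing involved, under the assumption that consumers use a standard library: the lexical form of an xsd:date parses directly as an ISO 8601 date, whereas the free-text form needs a format string that the consumer must already know (and which depends on English month names).

```python
from datetime import date, datetime

# Lexical form of "2013-02-21"^^xsd:date: ISO 8601, parsed without extra knowledge.
typed_value = "2013-02-21"
parsed_typed = date.fromisoformat(typed_value)

# Free-text form: the consumer has to know the layout and the language
# of the month name ("%B" assumes English month names in the default C locale).
free_text_value = "21 February 2013"
parsed_free = datetime.strptime(free_text_value, "%d %B %Y").date()

same_day = parsed_typed == parsed_free
```

Both values end up as the same calendar date, but only the typed one can be processed without a publisher-specific parsing rule.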
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 72
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 73
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
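How a system can interpret data through a super property is sketched below in Python. The sub-property links follow the EU Budget vocabulary examples given in this module (introduction under dct:description, has nomenclature under dct:subject), but the `bud:` prefix, the exact local names and the sample triple are assumptions made for the illustration, and `with_super_properties` is a hypothetical helper rather than part of any RDF library.

```python
# Sub-property links; the "bud:" names are assumed spellings of the EU Budget terms.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Add, for every triple, the same statement expressed with each super property."""
    result = set(triples)
    for s, p, o in triples:
        seen = {p}  # guard against accidental cycles in the hierarchy
        while p in sub_property_of and sub_property_of[p] not in seen:
            p = sub_property_of[p]
            seen.add(p)
            result.add((s, p, o))
    return result


# Hypothetical data using the specific term only.
data = [("bud:nomenclature-1", "bud:introduction", "Introductory text")]
expanded = with_super_properties(data, SUB_PROPERTY_OF)

# A consumer that only knows dct:description still finds the text.
descriptions = [o for s, p, o in expanded if p == "dct:description"]
```

This mirrors the rdfs:subPropertyOf entailment in spirit: the specific statement stays in place and the generic one is derived from it.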
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) and the Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by the has Nomenclature relationship]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 76
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
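Some of these naming conventions can be checked mechanically. The Python sketch below covers only the capitalisation and camel-case rules; whether a term is singular, a verb or a noun needs human judgement. The function names and the regular expressions are assumptions made for this illustration.

```python
import re

# Upper camel case for classes, lower camel case for properties (letters and digits only).
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def looks_like_class(term):
    return bool(CLASS_NAME.match(local_name(term)))


def looks_like_property(term):
    return bool(PROPERTY_NAME.match(local_name(term)))


checks = {
    "skos:Concept": looks_like_class("skos:Concept"),
    "rdfs:label": looks_like_property("rdfs:label"),
    "foaf:isPrimaryTopicOf": looks_like_property("foaf:isPrimaryTopicOf"),
    "ex:has_site": looks_like_property("ex:has_site"),  # underscore breaks camel case
}
```

A check like this could run before publication to flag terms such as the hypothetical `ex:has_site`, which uses an underscore instead of camel case.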
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 77
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 78
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
Slide 79
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Figure: the hasCeiling property linking the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
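What domain and range declarations let a consumer conclude can be sketched as follows. In RDFS, domain and range act as inference rules rather than validation constraints: from a statement that uses the property, the types of its subject and object are inferred. The direction chosen here (Political category as the domain of hasCeiling and Amount as its range) is an assumption, since the diagram residue does not show it, and the `ex:` names are hypothetical.

```python
# Assumed declarations: hasCeiling goes from a Political category to an Amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from the domain and range of each property used."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))  # subject is in the domain class
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))   # object is in the range class
    return inferred


# Hypothetical data: a category with a ceiling amount, neither of them typed explicitly.
data = [("ex:category-1", "ex:hasCeiling", "ex:amount-1")]
types = infer_types(data)
```

Because the declarations drive inference, an over-broad or wrong domain silently assigns wrong types to data, which is one reason to define domain and range with care.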
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
Slide 80
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Slide 81
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
DATASUPPORTOPEN
Conclusions
Slide 82
The six steps, grouped on the slide into the phases Analyse, Model and Publish:
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub class and sub properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another …” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
Slide 85
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
Slide 86
Installation:
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset used: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet (step 1)
Slide 88
Download the tabular data.
DATASUPPORTOPEN
Create a project and upload it in Open Refine (step 2)
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation (step 3)
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF (step 3)
Slide 91
• Create a URI representation for the involved object values
  o via formula
  o via reconciliation
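Creating a URI representation "via formula" amounts to deriving a stable identifier from a cell value. The Python sketch below shows one possible formula; it is not the expression language used inside Open Refine, and the base URI, the `mint_uri` helper and the slug rules are assumptions made for the illustration.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI for the dataset


def mint_uri(kind, label):
    """Build a URI from a resource kind and a cell value.

    Lower-cases the label and collapses every run of characters other than
    a-z and 0-9 into a single hyphen, so accented letters are dropped too.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return f"{BASE}{quote(kind)}/{quote(slug)}"


country_uri = mint_uri("country", "United Kingdom")
indicator_uri = mint_uri("indicator", " Broadband  coverage (%) ")
```

A formula like this yields the same URI for the same cell value every time, but it cannot tell that two different spellings name the same thing; that is the case reconciliation against an existing reference list is meant to handle.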
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF.
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 94
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Figure: graphical interface to copy/paste an existing RDF skeleton]
[Figure: graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle]
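What the mapping and export steps produce can be approximated outside Open Refine with a few lines of Python. The sketch below turns rows of a small table into N-Triples statements, each row becoming one observation typed as qb:Observation from the RDF Data Cube Vocabulary. The table contents, the example.org URIs and the three `def/` properties are invented for the illustration; only the rdf:type, qb:Observation and XML Schema datatype URIs are real.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical base URI

# Invented sample table standing in for the cleaned spreadsheet.
TABLE = "country,year,value\nBE,2013,0.82\nNL,2013,0.93\n"


def row_to_ntriples(row):
    """Serialise one table row as N-Triples lines describing one observation."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{EX}obs/{country}-{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB_OBSERVATION}> .",
        f"{obs} <{EX}def/country> <{EX}country/{country}> .",
        f'{obs} <{EX}def/year> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{EX}def/value> "{value}"^^<{XSD}decimal> .',
    ]


lines = [line
         for row in csv.DictReader(io.StringIO(TABLE))
         for line in row_to_ntriples(row)]
```

N-Triples is used here because it is the simplest RDF syntax to write by hand, and any N-Triples document can also be read as Turtle. The sketch does not escape special characters in values, which a real exporter has to do.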
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: the tools OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query
Slide 98
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the working ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on, join us on, follow us, contact us:
• Open Data Support on SlideShare: http://www.slideshare.net/OpenDataSupport
• Website: http://www.opendatasupport.eu
• Open Data Support group: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
DATASUPPORTOPEN
Visual representation (RDF graph) of the triples from the RDFXML syntax example
Slide 47
Subject
Predicate
Object
DATASUPPORTOPEN
RDF SyntaxTurtle
Subject
Predicate
Object
Slide 48
prefix dcat lthttpwwww3orgTRvocab-dcatgt prefix dct lthttppurlorgdcterms
lt httppublicationseuropaeuresourceauthorityfile-typegt a ltdcatDatasetgt dcttitle ldquoFile types Name Authority Listldquodctpublisher lthttpopen-dataeuropaeuendatapublisherpublgt
lthttpopen-dataeuropaeuendatapublisherpublgta ltdctAgentgt dcttitle ldquoPublications Officerdquo
Gra
ph
See alsohttpwwww3org200912rdf-wspapersws11
Definition of prefixes
Description of data ndash triples
DATASUPPORTOPEN
RDF SyntaxRDFa
Subject
Predicate
Object
Slide 49
lthtmlgt ltheadgt ltheadgt ltbodygt ltdiv resource=ldquohttppublicationseuropaeuresourceauthorityfile-typerdquotypeof= ldquohttpwwww3orgnsdcatDatasetrdquogtltpgt ltspan property= httppurlorgdctermstitle gtFile types Name Authority Listltspangt Publisher ltspan property=httppurlorgdctermsAgentgt Publications Officeltspangtltpgtltdivgtltbodygt
See alsohttpwwww3orgTR2012NOTE-rdfa-primer-20120607
embedding RDF data in HTML
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
1. Start with a robust Domain Model
[Diagram: domain model with the following classes, attributes and relationships]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
• Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Slide 68
2. Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
2. Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
2. Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
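The "further processing" mentioned on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the free-text dates follow the "21 February 2013" pattern with English month names; the function name is made up for this example.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

def to_xsd_date_literal(text: str) -> str:
    """Turn a date written like '21 February 2013' into a typed RDF literal."""
    # %B matches the full English month name under Python's default locale.
    parsed = datetime.strptime(text, "%d %B %Y").date()
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'

print(to_xsd_date_literal("21 February 2013"))
# prints "2013-02-21"^^<http://www.w3.org/2001/XMLSchema#date>
```

Every consumer of the non-standard format needs a step like this, which is the cost that reusing dcterms:created with xsd:date avoids.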
2. Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
3. Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
3. Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject
[Diagram: Amount class (Currency, Figure, Type, Year) and Nomenclature class (Type, Heading, Introduction, Remark, Conditions), connected by "has Nomenclature"]
Slide 75
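How a system that only knows the super-property can still interpret the data may be sketched as follows. This is a toy illustration of the RDFS sub-property rule, not a full reasoner; the namespace with a trailing "#", the local names and the sample triple are assumptions made for this example.

```python
# Hypothetical namespace and term names, used only for illustration.
BUD = "http://data.europa.eu/bud#"
DCT = "http://purl.org/dc/terms/"
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

vocabulary = {
    (BUD + "introduction", SUBPROPERTY_OF, DCT + "description"),
    (BUD + "hasNomenclature", SUBPROPERTY_OF, DCT + "subject"),
}
data = {
    (BUD + "nomenclature/1", BUD + "introduction", "Introductory text"),
}

def infer_super_properties(triples, schema):
    """Apply the rule: if p subPropertyOf q and (s p o), then (s q o)."""
    parents = {}
    for sub, pred, sup in schema:
        if pred == SUBPROPERTY_OF:
            parents.setdefault(sub, set()).add(sup)
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        for q in parents.get(p, ()):
            if (s, q, o) not in inferred:
                inferred.add((s, q, o))
                frontier.append((s, q, o))
    return inferred

closure = infer_super_properties(data, vocabulary)
print((BUD + "nomenclature/1", DCT + "description", "Introductory text") in closure)
# prints True
```

A client that understands dct:description but has never seen the more specific term can therefore still read the introduction as a description.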
4. Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
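The capitalisation and camel-case conventions can be applied mechanically, as in the sketch below. The helper names are invented for this example, and the checks cover spelling only: whether a class name is singular, or a property is a verb or a noun, still needs human review.

```python
import re

def to_class_name(words: str) -> str:
    """'political category' -> 'PoliticalCategory' (upper camel case)."""
    parts = [w for w in re.split(r"[\s_\-]+", words.strip()) if w]
    return "".join(w[:1].upper() + w[1:] for w in parts)

def to_property_name(words: str) -> str:
    """'has ceiling' -> 'hasCeiling' (lower camel case)."""
    name = to_class_name(words)
    return name[:1].lower() + name[1:]

def looks_like_class(local_name: str) -> bool:
    # Starts with a capital letter and contains no separators.
    return re.fullmatch(r"[A-Z][A-Za-z0-9]*", local_name) is not None

def looks_like_property(local_name: str) -> bool:
    # Starts with a lower case letter and contains no separators.
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", local_name) is not None

print(to_class_name("political category"), to_property_name("has ceiling"))
# prints PoliticalCategory hasCeiling
```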
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
4. Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
4. Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
[Diagram: hasCeiling relationship between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
See also http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
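An RDFS definition of a class and of a property with a domain and a range could look roughly like the Turtle generated below. This is a sketch under stated assumptions and not the published EU Budget vocabulary: the "bud" namespace with a trailing "#", the local names, and the direction of hasCeiling (from Political category to Amount) are all guesses made for illustration.

```python
# Hypothetical namespace and term names; not the published EU Budget vocabulary.
PREFIXES = {
    "bud": "http://data.europa.eu/bud#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def define_class(name: str, label: str) -> str:
    """Declare an RDFS class with an English label."""
    return f'bud:{name} a rdfs:Class ;\n    rdfs:label "{label}"@en .'

def define_property(name: str, label: str, domain: str, range_: str) -> str:
    """Declare a property together with its domain and range."""
    return (
        f"bud:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )

def vocabulary_document() -> str:
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    body = "\n\n".join([
        define_class("Amount", "Amount"),
        define_class("PoliticalCategory", "Political category"),
        # Assumed direction: a political category has a ceiling amount.
        define_property("hasCeiling", "has ceiling",
                        "bud:PoliticalCategory", "bud:Amount"),
    ])
    return header + "\n\n" + body + "\n"

print(vocabulary_document())
```

The domain says where hasCeiling may be used (on political categories), and the range says what its values are (amounts).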
5. Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
See also https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas and http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
6. Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
• Start with a robust Domain Model developed following a structured process and methodology
• Research existing terms and their usage and maximise reuse of those terms
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent
• Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org
See also Open Refine website http://openrefine.org
Slide 84
What is Open Refine RDF extension?
Open Refine RDF extension allows you to easily import data in different formats such as
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML and
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph
See also LOD 2 Webinar - Open Refine http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation
Publish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
1. Describe your data in a spreadsheet
Download the tabular data
Slide 88
2. Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
3. Clean up the data - table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
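Outside the Open Refine interface, the same three clean-up operations can be approximated in a short script. The sketch below is not how Open Refine works internally; the column names and values are invented and do not come from the Digital Agenda Scoreboard.

```python
import csv
import io

# Hypothetical extract; column names and values are invented for illustration.
RAW = """Indicator code,Country,Yr,Value
bb_penet,BE,2012,33.1
,,,
bb_penet,BG,2012,17.6
Source: example footer row,,,
"""

RENAMES = {"Indicator code": "indicator", "Country": "country",
           "Yr": "year", "Value": "value"}

def clean(raw_csv: str, keep_countries: set) -> list:
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Rename columns to short, consistent names.
        row = {RENAMES[k]: (v or "").strip() for k, v in row.items()}
        # Remove unnecessary rows: empty lines and footers carry no value.
        if not row["value"]:
            continue
        # Rough equivalent of a facet: keep only the selected countries.
        if row["country"] in keep_countries:
            rows.append(row)
    return rows

for row in clean(RAW, {"BE", "BG"}):
    print(row)
```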
3. Clean up the data - prepare RDF
• Create URI representation for the involved object values
  - via formula
  - via reconciliation
Slide 91
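Creating URIs "via formula" means deriving them from cell values with a fixed rule. A minimal sketch of such a rule is shown below, assuming a made-up base URI under example.org and a simple slug scheme; reconciliation against an existing reference dataset is a different technique and is not covered here.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical base URI; replace it with a namespace you control.
BASE = "http://example.org/id/"

def mint_uri(kind: str, value: str, base: str = BASE) -> str:
    """Build a URI such as <base>country/united-kingdom from a cell value."""
    # Strip accents, lower-case, and collapse everything else to hyphens.
    ascii_value = (unicodedata.normalize("NFKD", value)
                   .encode("ascii", "ignore").decode("ascii"))
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot build a URI from {value!r}")
    return f"{base}{quote(kind)}/{slug}"

print(mint_uri("country", " United Kingdom "))
# prints http://example.org/id/country/united-kingdom
```

Because the rule is deterministic, the same cell value always yields the same URI, which keeps identifiers stable across repeated exports.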
4. Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
Slide 92
4. Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
4. Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
You can set the base URI for the data
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
5. Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
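The idea of a skeleton, one fixed triple template applied to every row, can be sketched without the tool. In the example below only qb:Observation and qb:dataSet are taken from the W3C RDF Data Cube Vocabulary; the base URI, the ex: properties and the row fields are assumptions for illustration, and the values are assumed to contain no quotes or spaces.

```python
# Hypothetical base URI and ex: properties; only the qb: terms come from the
# W3C RDF Data Cube Vocabulary.
BASE = "http://example.org/scoreboard/"

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/def#> .
"""

def observation_to_turtle(row: dict) -> str:
    """Apply one fixed 'skeleton' to a cleaned spreadsheet row."""
    subject = f"<{BASE}obs/{row['indicator']}/{row['country']}/{row['year']}>"
    return (
        f"{subject} a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}dataset> ;\n"
        f"    ex:indicator <{BASE}indicator/{row['indicator']}> ;\n"
        f"    ex:country <{BASE}country/{row['country']}> ;\n"
        f'    ex:year "{row["year"]}"^^xsd:gYear ;\n'
        f'    ex:value "{row["value"]}"^^xsd:decimal .\n'
    )

def export_turtle(rows: list) -> str:
    """Serialise all rows as one Turtle document."""
    return PREFIXES + "\n" + "\n".join(observation_to_turtle(r) for r in rows)

print(export_turtle([
    {"indicator": "bb_penet", "country": "BE", "year": "2012", "value": "33.1"},
]))
```

A real Data Cube dataset would also declare its dimensions and measures in a data structure definition, which this sketch leaves out.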
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework, W3C. http://www.w3.org/RDF/
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA, Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
RDF Syntax: Turtle
Definition of prefixes
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
Description of data - triples
<http://publications.europa.eu/resource/authority/file-type> a dcat:Dataset ;
    dct:title "File types Name Authority List" ;
    dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://open-data.europa.eu/en/data/publisher/publ> a dct:Agent ;
    dct:title "Publications Office" .
[Diagram labels: Subject, Predicate, Object, Graph]
See also http://www.w3.org/2009/12/rdf-ws/papers/ws11
Slide 48
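What the prefix definitions buy you can be shown with a small sketch: a prefixed name is just shorthand for a full IRI. The function below is a simplified illustration with an invented name, not a Turtle parser.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def expand(term: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'dct:title' to its full IRI; 'a' is Turtle shorthand for rdf:type."""
    if term == "a":
        return prefixes["rdf"] + "type"
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]  # already a full IRI
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

print(expand("dct:title"))
# prints http://purl.org/dc/terms/title
```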
RDF Syntax: RDFa
Embedding RDF data in HTML
<html>
<head></head>
<body>
<div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
<p><span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
Publisher: <span property="http://purl.org/dc/terms/publisher">Publications Office</span></p>
</div>
</body>
</html>
[Diagram labels: Subject, Predicate, Object]
See also http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/
Slide 49
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as data type properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object type properties.
Slide 51
Examples of classes, relationships and properties: The Core Person Vocabulary in UML
[UML class diagram "Healthcare Domain", Core Vocabularies; callouts mark the classes, their properties and the relationships]
Class Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Location
  geographicIdentifier: URI
  geographicName: string
Class Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions?
See also http://www.euclid-project.eu/modules/chapter2 and https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
Structure of a SPARQL Query
Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
Type of query and variables, i.e. what to search for
SELECT ?title
RDF triple patterns, i.e. the conditions that have to be met
WHERE {
    ?dataset rdf:type dcat:Dataset .
    ?dataset dct:title ?title .
}
Slide 56
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
SELECT - return the name and publisher of a dataset
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
    <http://.../authority/file-type> dct:publisher ?publisherURI .
    <http://.../authority/file-type> dct:title ?dataset .
    ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
Slide 58
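How the triple patterns of this query are matched against the sample data, including the join on the publisher URI, can be sketched with a toy pattern matcher. This is not a SPARQL engine; the full URIs are assumptions standing in for the abbreviated ones on the slide, and rdf:type values are kept as short strings for brevity.

```python
# Toy basic-graph-pattern matcher; the full URIs are assumed for illustration.
DCT = "http://purl.org/dc/terms/"
DATASET = "http://publications.europa.eu/resource/authority/file-type"
PUBLISHER = "http://open-data.europa.eu/en/data/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, DCT + "title", "Publications Office"),
]

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def select(variables, patterns, triples=TRIPLES):
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in triples
                     if (b2 := match(pattern, t, b)) is not None]
    return [tuple(s[v] for v in variables) for s in solutions]

rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
)
print(rows)
# prints [('File types Name Authority List', 'Publications Office')]
```

The shared variable ?publisherURI is what links the dataset to the title of its publisher: the third pattern only matches triples whose subject is the URI bound by the first pattern.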
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data.
Slide 88
Step 2: Create a project and upload it in OpenRefine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data - table harmonisation
• Star and remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
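The three harmonisation actions can also be scripted, which helps when the same spreadsheet has to be cleaned repeatedly. The following is a minimal sketch in plain Python, not OpenRefine itself: the column names, the rename mapping and every data row are invented for illustration and do not come from the Digital Agenda Scoreboard.

```python
import csv
import io

# Invented sample table: one footnote row and two indicators (illustrative values only).
RAW = """Indicator name,Country,Year,Value
indicator_x,BE,2013,10.0
Source: illustrative footnote row,,,
indicator_x,FR,2013,20.0
indicator_y,BE,2013,30.0
"""

# Hypothetical mapping from spreadsheet headers to harmonised column names.
RENAME = {"Indicator name": "indicator", "Country": "country",
          "Year": "year", "Value": "value"}


def clean(raw_csv, rename, keep_indicator):
    """Rename columns, drop rows without a value, keep one indicator (facet-like filter)."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        row = {rename.get(key, key): value for key, value in row.items()}
        if not row["value"]:                    # unnecessary row (footnote, blank line)
            continue
        if row["indicator"] != keep_indicator:  # select the data to be published
            continue
        rows.append(row)
    return rows


cleaned = clean(RAW, RENAME, "indicator_x")
```

In OpenRefine the same result is reached interactively; a script like this is only one possible way to make the steps repeatable.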
Step 3: Clean up the data - prepare RDF
• Create URI representations for the involved object values:
  o via formula
  o via reconciliation
Slide 91
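Both routes to a URI can be sketched in a few lines of Python. This is an illustration of the idea rather than OpenRefine's own expression language: the base URI `http://example.org/id/` and the helper names are hypothetical, and the lookup table stands in for a reconciliation service (its single entry follows the pattern of the Publications Office country authority table and is included as an assumption).

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # hypothetical base URI for minted identifiers

# Stand-in for a reconciliation service: known labels mapped to existing URIs (assumed entry).
RECONCILED = {
    "Belgium": "http://publications.europa.eu/resource/authority/country/BEL",
}


def slug(text):
    """Lower-case the value, replace whitespace with hyphens, percent-encode the rest."""
    text = re.sub(r"\s+", "-", text.strip().lower())
    return quote(text, safe="-")


def mint_uri(kind, value, base=BASE):
    """'Via formula': derive a URI from the cell value."""
    return f"{base}{kind}/{slug(value)}"


def resolve(value, kind):
    """'Via reconciliation' first; fall back to a minted URI when no match is known."""
    return RECONCILED.get(value) or mint_uri(kind, value)
```

Reusing an existing URI where one is known links the data to other datasets; minting is the fallback when nothing suitable exists.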
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary.
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF.
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
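To give an idea of what a Turtle export looks like, the sketch below turns one cleaned table row into a `qb:Observation`. It is a simplified illustration, not the output of the RDF extension: the `ex:` properties and the `http://example.org/` URIs are hypothetical, and a complete Data Cube would also declare a dataset and a data structure definition, which are omitted here.

```python
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"  # hypothetical vocabulary namespace
)


def observation_to_turtle(row, base="http://example.org/obs/"):
    """Serialise one table row as a Turtle description of a qb:Observation."""
    country, year, value = row["country"], row["year"], row["value"]
    return (
        f"<{base}{country}-{year}> a qb:Observation ;\n"
        f'    ex:country "{country}" ;\n'
        f'    ex:year "{year}"^^xsd:gYear ;\n'
        f'    ex:value "{value}"^^xsd:decimal .\n'
    )


# Invented row, shaped like the output of the clean-up step.
document = PREFIXES + "\n" + observation_to_turtle(
    {"country": "BE", "year": "2013", "value": "10.0"}
)
```

String formatting is enough for a sketch; for real data a proper RDF library is the safer choice because it takes care of escaping and validation.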
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
Further reading
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
RDF Syntax: RDFa
Embedding RDF data in HTML
<html>
  <head> </head>
  <body>
    <div resource="http://publications.europa.eu/resource/authority/file-type" typeof="http://www.w3.org/ns/dcat#Dataset">
      <p>
        <span property="http://purl.org/dc/terms/title">File types Name Authority List</span>
        Publisher: <span property="http://purl.org/dc/terms/Agent">Publications Office</span>
      </p>
    </div>
  </body>
</html>
[Callouts on the slide mark the subject, the predicate and the object]
See also: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607
Slide 49
How to represent data in RDF
Classes, properties and vocabularies
Slide 50
RDF Vocabulary
"A vocabulary is a data model comprising classes, properties and relationships which can be used for describing your data and metadata."
• Class: A construct that represents things in the real and/or information world, e.g. a person, an organisation, or a concept such as "health" or "freedom".
• Property: A characteristic of a class in a particular dimension, such as the legal name of an organisation or the date and time that an observation was made. In RDF, properties are encoded as datatype properties.
• Relationship: A link between two classes, for example the link between a document and the organisation that published it (i.e. organisation publishes document), or the link between a map and the geographic region it depicts (i.e. map depicts geographic region). In RDF, relationships are encoded as object properties.
Slide 51
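The difference between a property and a relationship becomes visible once data is written down as triples: a property points to a literal value, a relationship points to another resource. The sketch below illustrates this in plain Python. The `http://example.org/` resources and the organisation name are made up; FOAF and Dublin Core Terms are real vocabularies, and the link is expressed from the document to its publisher with `dct:publisher`, which is the inverse direction of "organisation publishes document".

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def uri(value):
    """Mark an object as a resource identified by a URI."""
    return ("uri", value)


def literal(value):
    """Mark an object as a plain literal value."""
    return ("literal", value)


triples = [
    (EX + "org/1", RDF_TYPE, uri(FOAF + "Organization")),                # class membership
    (EX + "org/1", FOAF + "name", literal("Example Publishing Office")),  # property
    (EX + "doc/1", RDF_TYPE, uri(FOAF + "Document")),
    (EX + "doc/1", DCT + "publisher", uri(EX + "org/1")),                 # relationship
]


def property_kind(graph, predicate):
    """Classify a predicate by how it is used in the data (a heuristic, not a schema lookup)."""
    kinds = {obj[0] for _, p, obj in graph if p == predicate}
    if kinds == {"literal"}:
        return "datatype property"
    if kinds == {"uri"}:
        return "object property"
    return "unknown or mixed"
```

Here `property_kind(triples, FOAF + "name")` reports a datatype property and `property_kind(triples, DCT + "publisher")` an object property; in a published vocabulary this distinction is declared explicitly rather than guessed from usage.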
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
[UML class diagram "Healthcare Domain"; callouts on the slide mark the classes, the properties and the relationships]
Class Core Vocabularies::Identifier
  dateOfIssue: dateTime [0..1]
  identifier: string [1..1]
  identifierType: string [0..1]
  issuingAuthority: string [0..1]
  issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
  alternativeName: string
  birthName: string
  dateOfBirth: dateTime
  dateOfDeath: dateTime
  familyName: string
  fullName: string
  gender: code
  givenName: string
  patronymicName: string
Class Core Vocabularies::Location
  geographicIdentifier: URI
  geographicName: string
Class Core Vocabularies::Address
  addressArea: string
  addressID: string
  adminUnitL1: string
  adminUnitL2: string
  fullAddress: string
  locatorDesignator: string
  locatorName: string
  poBox: string
  postCode: string
  postName: string
  thoroughfare: string
Class Core Vocabularies::Geometry
  lat: string
  long: string
  wkt: string
  xmlGeometry: XML
Relationships between the classes: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
Slide 52
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL stands for SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update)
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update)
• ASK: Are there any X, Y, etc. satisfying the following conditions ...
Slide 55
See also: http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
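What each form returns can be mimicked on a tiny in-memory graph. The sketch below is plain Python over a set of tuples, not a SPARQL engine; the identifiers `ex:file-type` and `ex:publ` are placeholder names, and the `describe` function reflects only one possible reading, since the SPARQL specification leaves the exact DESCRIBE result to the implementation.

```python
# A graph as a set of (subject, predicate, object) tuples; prefixed names are plain strings.
graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
}


def select_titles(g):
    """SELECT: a table (list of rows) of resources and their titles."""
    return sorted((s, o) for s, p, o in g if p == "dct:title")


def ask_any_dataset(g):
    """ASK: a yes/no answer."""
    return any(p == "rdf:type" and o == "dcat:Dataset" for _, p, o in g)


def construct_labels(g):
    """CONSTRUCT: a new graph built from a template (here: titles re-expressed as labels)."""
    return {(s, "rdfs:label", o) for s, p, o in g if p == "dct:title"}


def describe(g, resource):
    """DESCRIBE: statements about a resource (here simply: triples with it as subject)."""
    return {t for t in g if t[0] == resource}


def insert_data(g, triple):
    """INSERT: the graph with one more triple."""
    return g | {triple}


def delete_data(g, triple):
    """DELETE: the graph without the given triple."""
    return g - {triple}
```

The first four functions leave the graph unchanged, while the last two produce a modified graph, which mirrors the split between the query forms and the update operations.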
Structure of a SPARQL Query
# Definition of prefixes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
# Type of query, followed by the variables, i.e. what to search for
SELECT ?title
# RDF triple patterns, i.e. the conditions that have to be met
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Slide 56
SELECT - return the name of a dataset with a particular URI
Sample data:
  <http://.../authority/file-type> rdf:type dcat:Dataset .
  <http://.../authority/file-type> dct:title "File types Name Authority List" .
  <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
  <http://.../publisher/publ> rdf:type dct:Agent .
  <http://.../publisher/publ> dct:title "Publications Office" .
Query:
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?dataset
  WHERE {
    <http://.../authority/file-type> dct:title ?dataset .
  }
Result:
  dataset
  "File types Name Authority List"
Slide 57
SELECT - return the name and publisher of a dataset
Sample data:
  <http://.../authority/file-type> rdf:type dcat:Dataset .
  <http://.../authority/file-type> dct:title "File types Name Authority List" .
  <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
  <http://.../publisher/publ> rdf:type dct:Agent .
  <http://.../publisher/publ> dct:title "Publications Office" .
Query:
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?dataset ?publisher
  WHERE {
    <http://.../authority/file-type> dct:publisher ?publisherURI .
    <http://.../authority/file-type> dct:title ?dataset .
    ?publisherURI dct:title ?publisher .
  }
Result:
  dataset                          | publisher
  "File types Name Authority List" | "Publications Office"
Slide 58
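The way the three triple patterns combine can be made concrete with a small pattern matcher. This is a teaching sketch in plain Python, not how a SPARQL engine is implemented: terms are plain strings, anything starting with `?` is treated as a variable, and `ex:file-type` and `ex:publ` are placeholder identifiers standing in for the shortened URIs on the slide.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` fits `triple`; return None if they conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable: bind it or check the earlier binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant: must match exactly
            return None
    return result


def select(variables, patterns, graph):
    """Evaluate the patterns one after another, keeping only bindings that satisfy all of them."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


graph = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]

rows = select(
    ["?dataset", "?publisher"],
    [
        ("ex:file-type", "dct:publisher", "?publisherURI"),
        ("ex:file-type", "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    graph,
)
```

The shared variable `?publisherURI` is what joins the dataset to its publisher: the third pattern only matches triples whose subject equals the value bound by the first pattern.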
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data:
  o How to reuse existing vocabularies to model your data
  o How to create new classes and properties in RDF
  o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module, you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Domain model diagram]
Classes and their attributes:
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalCategory, hasCorporateBody, has Nomenclature, has EU Programme
Slide 68
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
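The "further processing" mentioned for non-standard dates is exactly the kind of step a consumer would otherwise have to write. As a small illustration, the Python sketch below converts the free-text form from the example into the typed literal expected for dcterms:created. It assumes English month names and the "day month year" order shown on the slide; the function name is hypothetical.

```python
from datetime import date

# English month names, mapped to their number (assumption: input is written in English).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def to_xsd_date(text):
    """Convert a date such as '21 February 2013' into an xsd:date typed literal."""
    day, month_name, year = text.split()
    parsed = date(int(year), MONTHS[month_name], int(day))  # also validates the date
    return f'"{parsed.isoformat()}"^^xsd:date'
```

Publishing the typed form in the first place spares every consumer from maintaining a conversion like this.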
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
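How a system that only knows the generic term can still use the data is shown in the sketch below, which applies the sub-property rule (if P is a sub-property of Q, every statement made with P also holds with Q). It is a plain-Python illustration: the prefix `bud:`, the local names `introduction` and `hasNomenclature`, and the `ex:` resources are assumptions made for the example, not the published identifiers of the EU Budget vocabulary.

```python
# Sub-property declarations, as described for the EU Budget vocabulary (names are assumed).
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_super_properties(triples, sub_property_of):
    """Add, for every triple, the same statement expressed with each super-property."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}                       # guards against cyclic declarations
        while predicate in sub_property_of and sub_property_of[predicate] not in seen:
            predicate = sub_property_of[predicate]
            seen.add(predicate)
            inferred.add((subject, predicate, obj))
    return inferred


data = [
    ("ex:nomenclature/1", "bud:introduction", "Introductory text"),
    ("ex:amount/1", "bud:hasNomenclature", "ex:nomenclature/1"),
]
expanded = with_super_properties(data, SUB_PROPERTY_OF)
```

After expansion, a generic Dublin Core consumer finds a dct:description and a dct:subject statement without knowing anything about the budget-specific terms.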
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Datatype properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
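The mechanical part of these conventions (initial letter and camel case) is easy to check automatically before a vocabulary is published. The sketch below is one possible check in Python with hypothetical helper names; it deliberately does not try to decide whether a term is singular, a verb or a noun, which still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties; no underscores or hyphens.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def follows_convention(term, kind):
    """Check the initial letter and camel case of a term; `kind` is 'class' or 'property'."""
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return bool(pattern.match(local_name(term)))
```

For example, `follows_convention("skos:Concept", "class")` and `follows_convention("foaf:isPrimaryTopicOf", "property")` both hold, while a made-up term such as `ex:political_category` fails the class check.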
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
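In RDFS, domain and range are used for inference rather than validation: from a statement that uses the property, a reasoner concludes the classes of its subject and its object. The sketch below shows that rule in plain Python. It assumes, purely for illustration, that hasCeiling links a political category to an amount; the direction is not stated in the transcript, and the `bud:` and `ex:` names are hypothetical.

```python
# Assumed declarations: hasCeiling is used on political categories and points to amounts.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain, value_range):
    """Derive rdf:type statements for subjects (from domain) and objects (from range)."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate in domain:
            types.add((subject, "rdf:type", domain[predicate]))
        if predicate in value_range:
            types.add((obj, "rdf:type", value_range[predicate]))
    return types


inferred = infer_types(
    [("ex:category/1", "bud:hasCeiling", "ex:amount/1")], DOMAIN, RANGE
)
```

Because of this behaviour, an overly narrow domain or range can lead to unintended conclusions about resources, which is a reason to choose them with care.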
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
DATASUPPORTOPEN
How to represent data in RDF
Classes properties and vocabularies
Slide 50
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
Introduction to SPARQL
The RDF Query Language
Slide 53
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT (SPARQL 1.1 Update): Add triples to the RDF graph.
• DELETE (SPARQL 1.1 Update): Delete triples from the RDF graph.
• ASK: Are there any X, Y, etc. satisfying the following conditions?
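Of these forms, CONSTRUCT is probably the least intuitive, so a minimal sketch may help. It assumes a store that describes datasets with DCAT and Dublin Core terms; the choice of rdfs:label as the generated property is an arbitrary illustration.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Template: one new rdfs:label triple is generated per solution of the WHERE block.
CONSTRUCT {
  ?dataset rdfs:label ?title .
}
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
```

The result is a new RDF graph rather than a table. Replacing the CONSTRUCT template with the keyword ASK would instead return true or false, depending on whether any dataset with a title exists.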
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
Definition of prefixes: the PREFIX lines
Type of query: SELECT
Variables, i.e. what to search for: ?title
RDF triple patterns, i.e. the conditions that have to be met: the block after WHERE
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
SELECT – return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
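The last point, updating data, can be sketched with a SPARQL 1.1 Update request. The data: namespace and the resource data:dataset1 are hypothetical, and the endpoint is assumed to accept updates.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX data: <http://example.org/data/>

# Replace whatever title the (hypothetical) dataset currently has.
DELETE { data:dataset1 dct:title ?oldTitle }
INSERT { data:dataset1 dct:title "File types Name Authority List" }
WHERE  { data:dataset1 dct:title ?oldTitle }
```

Because the WHERE block must match for the templates to be applied, nothing is inserted when the dataset has no title yet. An INSERT DATA request would be the simpler choice for adding a first title.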
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Step 1: Start with a robust Domain Model
Slide 68
[Domain model diagram]
Classes and attributes:
- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Step 2: Reuse existing terms and vocabularies
Slide 69
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies (Step 2: Reuse existing terms and vocabularies)
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Advantages of reuse (Step 2: Reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data.
  A dcterms:created value, for example, should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
  Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
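The date example from the first advantage can be written out as two triples. This is a sketch only: ex:date stands for the home-grown term mentioned on the slide, and the ex: and data: namespaces are hypothetical.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:      <http://example.org/vocab#>
PREFIX data:    <http://example.org/data/>

INSERT DATA {
  # Reused term with a typed literal: software knows both the meaning and the format.
  data:dataset1 dcterms:created "2013-02-21"^^xsd:date .

  # Home-grown term with a free-text value: consumers must first learn the term
  # and parse the string before they can treat it as a date.
  data:dataset2 ex:date "21 February 2013" .
}
```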
You can find reusable RDF vocabularies on (Step 2: Reuse existing terms and vocabularies):
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
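The second point can be illustrated with a query that only knows the generic property. It is a sketch under two assumptions: the store holds the sub-property declaration (for instance a hypothetical ex:introduction declared as rdfs:subPropertyOf dct:description), and the endpoint supports SPARQL 1.1 property paths.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>

# Return values of dct:description and of any property declared,
# directly or indirectly, as one of its sub-properties.
SELECT ?resource ?property ?text
WHERE {
  ?property rdfs:subPropertyOf* dct:description .
  ?resource ?property ?text .
}
```

The path operator * matches zero or more rdfs:subPropertyOf steps, so dct:description itself is included. A store with RDFS inference enabled would not need the path at all and could be queried with dct:description directly.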
Step 3: Creation of sub-classes and sub-properties
Slide 74
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Step 3: Creation of sub-classes and sub-properties
Slide 75
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Step 4: Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept.
Properties begin with a lower case letter, e.g. rdfs:label.
Object properties should be verbs, e.g. org:hasSite.
Datatype properties should be nouns, e.g. dcterms:description.
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
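The slide's own example did not survive in this transcript, so the following is a sketch of what an RDFS definition of an Amount class could look like, not the definition used in the EU Budget vocabulary. The ex: namespace and the label and comment texts are assumptions made for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/budget#>

INSERT DATA {
  # Class names start with a capital letter and are singular.
  ex:Amount a rdfs:Class ;
            rdfs:label       "Amount"@en ;
            rdfs:comment     "An amount of money in a budget (illustrative description)."@en ;
            rdfs:isDefinedBy <http://example.org/budget#> .
}
```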
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
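As with the class, the original property example is not preserved here. A minimal sketch of a property definition, assuming a hypothetical ex: namespace and the camel-case name ex:amountType, might read:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/budget#>

INSERT DATA {
  # Property names start with a lower case letter and use camel case.
  ex:amountType a rdf:Property ;
                rdfs:label       "amount type"@en ;
                rdfs:isDefinedBy <http://example.org/budget#> .
}
```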
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
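Domain and range can be sketched for the hasCeiling relationship from the diagram. The direction chosen here (a political category has an amount as its ceiling) is an assumption, since the transcript does not preserve the arrow, and the ex: namespace and class names are hypothetical.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/budget#>

INSERT DATA {
  ex:hasCeiling a rdf:Property ;
                # Domain: the class of resources the property is used on (assumed).
                rdfs:domain ex:PoliticalCategory ;
                # Range: the class that the values of the property belong to (assumed).
                rdfs:range  ex:Amount .
}
```

In RDFS these statements are used for inference rather than validation: a reasoner that sees ex:hasCeiling used on a resource will conclude that the resource is an ex:PoliticalCategory.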
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another …” - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (xls and xlsx)
- JSON
- XML
- RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
Upload the spreadsheet
Select relevant tabs
Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data – prepare RDF
• Create URI representation for the involved object values
  - via formula
  - via reconciliation
Slide 91
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
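Once the exported file has been loaded into a triple store, a query along the following lines could serve as a quick check of the result. It is a sketch only: it assumes the RDF skeleton typed each row as a qb:Observation and used the SDMX dimension and measure properties shown, which the slides do not confirm.

```sparql
PREFIX qb:             <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure:   <http://purl.org/linked-data/sdmx/2009/measure#>

# List a sample of observations with their area, period and value.
SELECT ?observation ?area ?period ?value
WHERE {
  ?observation a qb:Observation ;
               sdmx-dimension:refArea   ?area ;
               sdmx-dimension:refPeriod ?period ;
               sdmx-measure:obsValue    ?value .
}
ORDER BY ?area ?period
LIMIT 20
```

An empty result would suggest that the skeleton used different properties or that the export did not produce observations, both of which are worth checking before publishing.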
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes flexibility and volume]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
EC ISA. Process and methodology for developing semantic agreements.
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA. Cookbook for translating Data Models to RDF Schemas.
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL. Bob DuCharme.
http://www.learningsparql.com
Linked Data Cookbook. W3C Government Linked Data Working Group.
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer.
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck.
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data. Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the working ontologist. Dean Allemang, Jim Hendler.
http://workingontologist.org
Slide 101
Be part of our team
Find us on, join us on, follow us:
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
@OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
RDF Vocabulary
ldquoA vocabulary is a data model comprising classes properties and relationships which can be used for describing your data and metadatardquo
bull Class A construct that represents things in the real andor information world eg a person an organisation a concept such as ldquohealthrdquo or ldquofreedomrdquo
bull Property A characteristic of a class in a particular dimension such as the legal name of an organisation or the date and time that an observation was made In RDF properties are encoded as data type properties
bull Relationship A link between two classes for example the link between a document and the organisation that published it (ie organisation publishes document) or the link between a map and the geographic region it depicts (ie map depicts geographic region) In RDF relationships are encoded as object type properties
Slide 51
DATASUPPORTOPEN
Examples of classes relationships and properties The Core Person Vocabulary in UML
52
class Healthcare Domain
Core VocabulariesIdentifier
dateOfIssue dateTime [01]
identifier string [11]
identifierType string [01]
issuingAuthority string [01]
issuingAuthorityUri URI [01]
Core VocabulariesPerson
alternativeName string
birthName string
dateOfBirth dateTime
dateOfDeath dateTime
familyName string
fullName string
gender code
givenName string
patronymicName string
Core VocabulariesLocation
geographicIdentifier URI
geographicName string
Core VocabulariesAddress
addressArea string
addressID string
adminUnitL1 string
adminUnitL2 string
fullAddress string
locatorDesignator string
locatorName string
poBox string
postCode string
postName string
thoroughfare string
Core VocabulariesGeometry
lat string
long string
wkt string
xmlGeometry XML
address
identifies
geometry
placeOfDeath
countryOfDeath
placeOfBirth
countryOfBirth
identifier
UML The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies thus facilitating the understanding of the meaning of the data model
Relationships ClassProperties
Class
Class
Class
Class
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type and Year]
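The slide images with the actual definitions did not survive in this transcript, so the following is only a sketch of what an RDFS definition of the "Amount" class and the "amount type" property could look like, generated as Turtle from Python. The namespace ending in "#", the prefix bud: and the local names Amount and amountType are assumptions made for the example.

```python
# Hypothetical namespace and term names, used for illustration only.
BASE = "http://data.europa.eu/bud#"

PREFIXES = "\n".join([
    f"@prefix bud: <{BASE}> .",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
])


def define_class(name, label):
    """Return a Turtle snippet declaring an RDFS class with an English label."""
    return f'bud:{name} a rdfs:Class ;\n    rdfs:label "{label}"@en .'


def define_property(name, label):
    """Return a Turtle snippet declaring an RDF property with an English label."""
    return f'bud:{name} a rdf:Property ;\n    rdfs:label "{label}"@en .'


vocabulary = "\n\n".join([
    PREFIXES,
    define_class("Amount", "Amount"),
    define_property("amountType", "amount type"),
])
print(vocabulary)
```

The class name is capitalised and singular, and the property name is lower camel case, in line with the naming conventions given earlier.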
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states that the resources on which a property is used are instances of one or more classes.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling property connecting the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
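In RDFS, domain and range are used for inference rather than validation: from one statement that uses a property, a system can conclude the types of its subject and object. The sketch below illustrates this. The direction of hasCeiling is an assumption made for the example (a political category has a ceiling amount), as are all prefixed names; the transcript does not show which way the arrow points.

```python
# Assumed direction, for illustration only: a political category has a ceiling amount.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples, domain, range_):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))
    return inferred


data = [("ex:heading1", "bud:hasCeiling", "ex:amount2014")]
types = infer_types(data, DOMAIN, RANGE)
for triple in sorted(types):
    print(triple)
```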
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology.
Research existing terms and their usage, and maximise reuse of those terms.
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent.
Publicise the RDF vocabulary by registering it with relevant services.
[Figure: the six steps grouped into three phases: Analyse, Model, Publish]
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ..." - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
CSV
Excel (.xls and .xlsx)
JSON
XML
RDF/XML
You then determine the intended structure of the RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset used: Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
• Create URI representations for the object values involved
• via formula
• via reconciliation
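Creating URIs "via formula" means deriving an identifier from a cell value with a deterministic rule. Inside Open Refine this is done with its own expression language; the Python sketch below only illustrates the idea. The base URI http://example.org/id/ and the function name mint_uri are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI; in practice use the stable namespace chosen for your data.
BASE_URI = "http://example.org/id/"


def mint_uri(kind, value):
    """Build a URI from a cell value: trim, lower-case, hyphenate, percent-encode."""
    slug = "-".join(value.strip().lower().split())
    return f"{BASE_URI}{kind}/{quote(slug)}"


print(mint_uri("country", " United Kingdom "))
```

The same input always yields the same URI, which is what allows rows from different spreadsheets to refer to the same resource. Reconciliation, by contrast, looks the value up in an existing dataset and reuses the URI found there.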
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
Slide 94
[Figure: graphical interface to copy/paste an existing RDF skeleton]
[Figure: graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
[Figure: export of the data in Turtle]
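What the mapping and export steps produce can be approximated outside Open Refine as well. The sketch below turns a tiny table into Turtle, with one qb:Observation per row. The rows are invented sample values, not real Scoreboard figures; the base URI, the ex: properties and the dataset URI are assumptions. Only qb:Observation and qb:dataSet are terms taken from the RDF Data Cube Vocabulary.

```python
import csv
import io

BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Invented sample rows, for illustration only.
CSV_TEXT = """country,year,indicator,value
BE,2013,broadband_coverage,98.5
NL,2013,broadband_coverage,99.1
"""

PREFIXES = "\n".join([
    "@prefix qb: <http://purl.org/linked-data/cube#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    f"@prefix ex: <{BASE}def/> .",
])


def row_to_turtle(row):
    """Render one spreadsheet row as a qb:Observation in Turtle."""
    country, year = row["country"], row["year"]
    indicator, value = row["indicator"], row["value"]
    return "\n".join([
        f"<{BASE}obs/{country}-{year}-{indicator}> a qb:Observation ;",
        f"    qb:dataSet <{BASE}dataset/scoreboard> ;",
        f'    ex:refArea "{country}" ;',
        f'    ex:refPeriod "{year}"^^xsd:gYear ;',
        f'    ex:indicator "{indicator}" ;',
        f'    ex:value "{value}"^^xsd:decimal .',
    ])


def export(csv_text):
    """Convert CSV text into a Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join([PREFIXES] + [row_to_turtle(row) for row in rows])


turtle = export(CSV_TEXT)
print(turtle)
```

In a real Data Cube the dimensions and measures would be declared in a data structure definition and would normally point to URIs rather than plain strings; the literals here keep the example short.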
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: the tools OpenRefine, UnifiedViews and Cellar plotted against two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data: http://5stardata.info
• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C: http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata
• Open Refine: https://github.com/OpenRefine
• RDF Extension: http://refine.deri.ie
• Resource Description Framework, W3C: http://www.w3.org/RDF
• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query
Slide 98
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us, join us, follow us and contact us on:
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
@OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Examples of classes, relationships and properties: the Core Person Vocabulary in UML
Slide 52
UML: The Unified Modelling Language class diagrams provide the means for expressing the conceptual data model of vocabularies such as the ISA Core Vocabularies, thus facilitating the understanding of the meaning of the data model.
[Figure: UML class diagram "Healthcare Domain", annotated to point out classes, class properties and relationships. Its content is listed below.]
Class Core Vocabularies::Identifier
- dateOfIssue: dateTime [0..1]
- identifier: string [1..1]
- identifierType: string [0..1]
- issuingAuthority: string [0..1]
- issuingAuthorityUri: URI [0..1]
Class Core Vocabularies::Person
- alternativeName: string
- birthName: string
- dateOfBirth: dateTime
- dateOfDeath: dateTime
- familyName: string
- fullName: string
- gender: code
- givenName: string
- patronymicName: string
Class Core Vocabularies::Location
- geographicIdentifier: URI
- geographicName: string
Class Core Vocabularies::Address
- addressArea: string
- addressID: string
- adminUnitL1: string
- adminUnitL2: string
- fullAddress: string
- locatorDesignator: string
- locatorName: string
- poBox: string
- postCode: string
- postName: string
- thoroughfare: string
Class Core Vocabularies::Geometry
- lat: string
- long: string
- wkt: string
- xmlGeometry: XML
Relationships: address, identifies, geometry, placeOfDeath, countryOfDeath, placeOfBirth, countryOfBirth, identifier
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples.
• SPARQL Protocol and RDF Query Language
• One of the three core standards of the Semantic Web, along with RDF and OWL
• Became a W3C standard in January 2008
• SPARQL 1.1 has been a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions ...
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)
• INSERT: Add triples to the RDF graph (an update operation, part of SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (an update operation, part of SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions ...?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
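Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
The query has four parts:
- Definition of prefixes (the PREFIX lines)
- Type of query (SELECT)
- Variables, i.e. what to search for (?title)
- RDF triple patterns, i.e. the conditions that have to be met (the WHERE clause)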
DATASUPPORTOPEN
SELECT – return the name of a dataset with a particular URI
Slide 57
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result:
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT – return the name and publisher of a dataset
Slide 58
Sample data:
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result:
dataset | publisher
"File types Name Authority List" | "Publications Office"
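How a SELECT query arrives at its result can be illustrated with a toy pattern matcher in plain Python. This is a sketch of the matching idea only, not a SPARQL engine. The two full URIs are hypothetical stand-ins, since the slide abbreviates them, and the same publisher URI is used throughout so that the join on ?publisherURI succeeds.

```python
# Hypothetical full URIs; the slide abbreviates the real ones.
DATASET = "http://example.org/authority/file-type"
PUBLISHER = "http://example.org/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the earlier binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant: must be identical
            return None
    return new


def select(variables, patterns, triples):
    """Return one tuple per solution, with the values of the requested variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, "dct:publisher", "?publisherURI"),
        (DATASET, "dct:title", "?dataset"),
        ("?publisherURI", "dct:title", "?publisher"),
    ],
    TRIPLES,
)
print(rows)
```

Each triple pattern narrows the set of variable bindings. The third pattern reuses ?publisherURI, which is what joins the dataset to the title of its publisher.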
DATASUPPORTOPEN
SPARQL Example – EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
DATASUPPORTOPEN
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
[Figure: domain model of the EU budget. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data.
Use of dcterms:created, for example, the value for which should be a data-typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
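The "further processing" mentioned for the date example can be made concrete. The sketch below converts a free-text date such as "21 February 2013" into the typed literal form that consumers of dcterms:created expect. It assumes English month names and a day-month-year order; the function name to_xsd_date is hypothetical.

```python
from datetime import datetime


def to_xsd_date(text):
    """Convert e.g. '21 February 2013' into an xsd:date typed literal (Turtle syntax).

    Assumes English month names (the default "C" locale) and day-month-year order.
    """
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of a non-standard date format has to write and maintain a conversion like this one, which is the cost that reusing dcterms:created with xsd:date avoids.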
DATASUPPORTOPEN
Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
DATASUPPORTOPEN
Introduction to SPARQL
The RDF Query Language
Slide 53
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
• INSERT: Add triples to the RDF graph
• DELETE: Delete triples from the RDF graph
• ASK: Are there any X, Y, etc. satisfying the following conditions?
Slide 55
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
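To make the query forms listed above more concrete, here is a minimal sketch in plain Python that imitates each of them over a tiny in-memory set of triples. It is not a SPARQL engine: the `match` helper, the function names and the `ex:` identifiers are all illustrative assumptions, and a real DESCRIBE may return more than the statements with the resource as subject.

```python
# Toy in-memory RDF graph: each triple is a (subject, predicate, object) tuple.
# The "ex:" identifiers are illustrative placeholders, not real IRIs.
graph = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
}


def match(triples, s=None, p=None, o=None):
    """Return, in sorted order, the triples fitting one pattern (None = variable)."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def select_titles(triples):
    """SELECT-like: a table of (resource, title) rows."""
    return [(s, o) for s, _, o in match(triples, p="dct:title")]


def ask_is_dataset(triples, resource):
    """ASK-like: True if at least one triple matches."""
    return bool(match(triples, s=resource, p="rdf:type", o="dcat:Dataset"))


def construct_labels(triples):
    """CONSTRUCT-like: substitute matches into a template to build a new graph."""
    return {(s, "rdfs:label", o) for s, _, o in match(triples, p="dct:title")}


def describe(triples, resource):
    """DESCRIBE-like (simplified): every statement with the resource as subject."""
    return match(triples, s=resource)


titles = select_titles(graph)
is_dataset = ask_is_dataset(graph, "ex:file-type")
labels = construct_labels(graph)

# INSERT-like and DELETE-like updates change the graph itself (here, a copy).
updated = set(graph)
updated.add(("ex:file-type", "dct:language", "en"))
updated.discard(("ex:publ", "rdf:type", "dct:Agent"))
```

The point of the sketch is the difference in result shape: SELECT yields rows, ASK yields a boolean, CONSTRUCT and DESCRIBE yield triples, and INSERT and DELETE modify the stored graph.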
DATASUPPORTOPEN
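Structure of a SPARQL Query
Slide 56
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?title
WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dct:title ?title .
}
• Definition of prefixes: the PREFIX lines
• Type of query: SELECT
• Variables, i.e. what to search for: ?title
• RDF triple patterns, i.e. the conditions that have to be met: the statements inside WHERE { }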
DATASUPPORTOPEN
SELECT - return the name of a dataset with a particular URI
Slide 57
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE {
  <http://.../authority/file-type> dct:title ?dataset .
}
Result
dataset
"File types Name Authority List"
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
dataset | publisher
"File types Name Authority List" | "Publications Office"
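The query on this slide joins three triple patterns through the shared variable ?publisherURI. The following is a minimal sketch, in plain Python, of how such a join can be evaluated: each pattern is matched against every triple and the variable bindings are carried from one pattern to the next. The `solve` and `match_pattern` helpers are illustrative assumptions, and the two example.org IRIs are placeholders standing in for the abbreviated IRIs of the sample data (the sketch assumes the dataset's publisher and the described agent are the same resource).

```python
DATASET = "http://example.org/authority/file-type"   # placeholder IRI
PUBLISHER = "http://example.org/publisher/publ"      # placeholder IRI

triples = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new_binding = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_var(pattern_term):
            # Bind the variable, or check it against its earlier value.
            if new_binding.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return new_binding


def solve(patterns, data):
    """Evaluate the patterns one after another, keeping consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


query = [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

# Project the two selected variables, as SELECT ?dataset ?publisher would.
rows = [(b["?dataset"], b["?publisher"]) for b in solve(query, triples)]
```

Because ?publisherURI is already bound when the third pattern is evaluated, only the title of the publisher is returned, not the title of the dataset a second time.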
DATASUPPORTOPEN
SPARQL Example - EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example - EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model (step 1)
Slide 68
[Figure: domain model of the EU budget. Classes with their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies (step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: Ontology for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Advantages of reuse
Slide 71
• Reuse greatly aids interoperability of your data
  A value of dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Step 2: Reuse existing terms and vocabularies
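The interoperability argument about dates can be illustrated with a short sketch. It contrasts parsing the lexical form of an xsd:date, which follows ISO 8601 and needs no extra knowledge, with parsing a free-text date, where the consumer has to know the field order and the month names in advance. The function names and the English-only month table are illustrative assumptions.

```python
from datetime import date

# A consumer of free-text dates must ship its own knowledge of the format,
# here an English month table; other languages or orderings would break it.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_xsd_date(lexical):
    """Parse an xsd:date lexical form without timezone, e.g. '2013-02-21'."""
    return date.fromisoformat(lexical)


def parse_free_text_date(text):
    """Parse 'DD Month YYYY'; works only for this one convention."""
    day, month_name, year = text.split()
    return date(int(year), MONTHS[month_name], int(day))


typed = parse_xsd_date("2013-02-21")
free_text = parse_free_text_date("21 February 2013")
```

Both calls yield the same date, but only the first works without an agreement between publisher and consumer beyond the shared datatype.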
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Step 2: Reuse existing terms and vocabularies
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 74
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Figure: the Nomenclature class with its attributes Type, Heading, Introduction, Remark and Conditions.]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties (step 3)
Slide 75
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Figure: the Amount class (Currency, Figure, Type, Year) linked by the has Nomenclature relationship to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions).]
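The benefit of sub-properties described on these slides can be sketched as follows: a consumer that only knows dct:description or dct:subject can still read values published with the more specific EU Budget properties, provided it follows the sub-property links. The prefixed names `bud:introduction` and `bud:hasNomenclature`, the `ex:` resources and the placeholder text are assumptions made for illustration; the slides name the properties only in words.

```python
# rdfs:subPropertyOf statements, as a child -> parent mapping.
# The "bud:" local names are assumed for illustration.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def is_same_or_sub_property(prop, target):
    """Follow sub-property links upwards from `prop` until `target` is found."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)  # guards against accidental cycles
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def values_for(triples, subject, target_property):
    """Values of `target_property` for `subject`, including sub-property values."""
    return [
        o for s, p, o in triples
        if s == subject and is_same_or_sub_property(p, target_property)
    ]


data = [
    ("ex:nomenclature-1", "bud:introduction", "Placeholder introduction text"),
    ("ex:amount-1", "bud:hasNomenclature", "ex:nomenclature-1"),
]

descriptions = values_for(data, "ex:nomenclature-1", "dct:description")
subjects = values_for(data, "ex:amount-1", "dct:subject")
```

A generic tool asking for dct:description therefore finds the introduction text even though it has never seen the more specific property.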
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
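The purely lexical parts of these naming conventions (initial capital for classes, initial lower-case letter for properties, camel case without separators) can be checked mechanically. The sketch below is one hypothetical way to do so with regular expressions; whether a name is singular, a verb or a noun cannot be decided this way and still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties;
# underscores, hyphens and spaces are rejected.
CLASS_LOCAL_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
PROPERTY_LOCAL_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def local_name(prefixed_name):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return prefixed_name.split(":", 1)[-1]


def looks_like_class_name(prefixed_name):
    """True if the local name starts with a capital and uses camel case."""
    return CLASS_LOCAL_NAME.fullmatch(local_name(prefixed_name)) is not None


def looks_like_property_name(prefixed_name):
    """True if the local name starts in lower case and uses camel case."""
    return PROPERTY_LOCAL_NAME.fullmatch(local_name(prefixed_name)) is not None


checks = {
    "skos:Concept": looks_like_class_name("skos:Concept"),
    "rdfs:label": looks_like_property_name("rdfs:label"),
    "foaf:isPrimaryTopicOf": looks_like_property_name("foaf:isPrimaryTopicOf"),
    "ex:political_category": looks_like_class_name("ex:political_category"),
}
```

The first three names come from the slide and pass; the last one is a made-up counter-example that fails both because it starts in lower case and because it uses an underscore.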
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with its attributes Currency, Figure, Type and Year.]
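As a rough illustration of what defining the “Amount” class and the “amount type” property with RDFS could involve, the sketch below generates the corresponding statements as N-Triples lines. The RDF and RDFS namespace IRIs are the standard ones; the `#`-terminated form of the vocabulary namespace, the local names `Amount` and `amountType`, and the labels are assumptions, not the definitions actually used by the EU Budget vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BUD = "http://data.europa.eu/bud#"  # assumed form of the vocabulary namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text, lang="en"):
    """Language-tagged literal with minimal escaping of backslash and quote."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def define_class(local, label):
    """Declare an rdfs:Class with a human-readable label."""
    subject = iri(BUD + local)
    return [
        triple(subject, iri(RDF + "type"), iri(RDFS + "Class")),
        triple(subject, iri(RDFS + "label"), literal(label)),
    ]


def define_property(local, label, domain_local):
    """Declare an rdf:Property with a label and an rdfs:domain."""
    subject = iri(BUD + local)
    return [
        triple(subject, iri(RDF + "type"), iri(RDF + "Property")),
        triple(subject, iri(RDFS + "label"), literal(label)),
        triple(subject, iri(RDFS + "domain"), iri(BUD + domain_local)),
    ]


lines = define_class("Amount", "Amount") + define_property(
    "amountType", "amount type", "Amount"
)
```

Printing `"\n".join(lines)` gives a small schema file; an OWL-based definition would follow the same pattern with owl:Class and owl:DatatypeProperty or owl:ObjectProperty as types.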
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices (step 4)
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling property linking the Political category class (Code, Description) and the Amount class (Currency, Figure, Type, Year).]
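In RDFS, domain and range statements are used to infer the types of resources rather than to reject data. The sketch below shows that inference for the hasCeiling property. The direction assumed here (a political category has a ceiling, which is an amount) and the prefixed names `bud:hasCeiling`, `bud:PoliticalCategory` and `bud:Amount` are assumptions made for illustration.

```python
# rdfs:domain and rdfs:range statements for the properties we know about.
# Names and direction are assumed for illustration.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}


def infer_types(triples):
    """Return the rdf:type statements entailed by domain and range."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            # The subject of the property is an instance of its domain.
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            # The value of the property is an instance of its range.
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred


data = [("ex:category-1", "bud:hasCeiling", "ex:amount-1")]
types = infer_types(data)
```

One consequence of this design is that using a property on the "wrong" kind of resource does not raise an error; it silently makes that resource an instance of the domain class, which is a good reason to choose domains and ranges carefully.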
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent (step 5)
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services (step 6)
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases shown on the slide: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet (step 1)
Download the tabular data
Slide 88
DATASUPPORTOPEN
Create a project and upload it in Open Refine (step 2)
Slide 89
• Upload the spreadsheet
• Select relevant tabs
• Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation (step 3)
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data - prepare RDF (step 3)
Slide 91
• Create URI representations for the involved object values
  - via formula
  - via reconciliation
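Creating a URI "via formula" means deriving it from the cell value with a fixed rule. Open Refine has its own expression language for this; the sketch below shows the same idea in Python rather than in that language. The base URI, the `kind` path segment and the slug rule are all hypothetical choices, and characters outside a-z and 0-9 (including accented letters) are simply collapsed into hyphens.

```python
import re

BASE_URI = "http://example.org/id/"  # hypothetical base URI for the dataset


def mint_uri(kind, cell_value):
    """Derive a URI from a cell value: trim, lower-case, hyphenate."""
    slug = re.sub(r"[^a-z0-9]+", "-", cell_value.strip().lower()).strip("-")
    if not slug:
        raise ValueError("cell value has no usable characters")
    return f"{BASE_URI}{kind}/{slug}"


country_uri = mint_uri("country", "  Czech Republic ")
indicator_uri = mint_uri("indicator", "Households with broadband (%)")
```

A formula like this is deterministic, so the same cell value always yields the same URI; reconciliation, the other option on the slide, instead looks the value up in an existing dataset and reuses the URI found there.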
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 92
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
Slide 93
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data) (step 4)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
[Figures: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton.]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle (step 5)
Slide 95
[Figure: export of the data in Turtle.]
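What the skeleton and export steps produce can be approximated outside Open Refine with a few lines of code: each spreadsheet row becomes one observation described with the RDF Data Cube Vocabulary. The sketch below emits N-Triples, which is a subset of Turtle. The qb: terms, sdmx-measure:obsValue and the XSD datatypes are real; the base URI, the dimension properties, the column names and the CSV rows are invented placeholders, not Digital Agenda Scoreboard data.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Placeholder rows with made-up codes and values, for illustration only.
CSV_TEXT = "country,year,value\nAA,2000,1.0\nBB,2000,2.0\n"


def row_to_ntriples(row):
    """Turn one spreadsheet row into the triples of one qb:Observation."""
    obs = f"<{BASE}observation/{row['country']}-{row['year']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{BASE}dataset> .",
        f"{obs} <{BASE}dimension/country> <{BASE}country/{row['country']}> .",
        f'{obs} <{BASE}dimension/year> "{row["year"]}"^^<{XSD}gYear> .',
        f'{obs} <{OBS_VALUE}> "{row["value"]}"^^<{XSD}decimal> .',
    ]


ntriples = [
    line
    for row in csv.DictReader(io.StringIO(CSV_TEXT))
    for line in row_to_ntriples(row)
]
```

The fixed list of statements inside `row_to_ntriples` plays the role of the RDF skeleton: it is defined once and applied to every row of the table.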
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume.]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Slide 98
DATASUPPORTOPEN
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist, from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
About SPARQL
SPARQL is the standard language to query graph data represented as RDF triples
bull SPARQL Protocol and RDF Query Language
bull One of the three core standards of the Semantic Web along with RDF and OWL
bull Became a W3C standard January 2008
bull SPARQL 11 is a W3C Recommendation since March 2013
Slide 54
DATASUPPORTOPEN
Types of SPARQL queries
bull SELECT Return a table of all X Y etc satisfying the following conditions
bull CONSTRUCT Find all X Y etc satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements creating a new graph
bull DESCRIBE Find all statements in the dataset that provide information about the following resource(s) (identified by name or description)
bull INSERT Add triples to the RDF graph
bull DELETE Delete triples from the RDF graph
bull ASK Are there any X Y etc satisfying the following conditions
Slide 55
See alsohttpwwweuclid-projecteumoduleschapter2
httpsjoinupeceuropaeucommunityodsdocumenttm13-introduction-rdf-sparql-en
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
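The three clean-up actions can also be sketched outside the tool. The following is a minimal Python illustration, not Open Refine's own implementation; the sample rows, the column names and the `harmonise` helper are hypothetical.

```python
import csv
import io

# Hypothetical raw export: a title row, abbreviated headers, one empty row.
RAW = """Digital Agenda Scoreboard extract,,,
ind,ctry,yr,val
bb_lines,BE,2012,33.1
,,,
bb_lines,FR,2012,36.5
bb_lines,BE,2011,32.0
"""

# Hypothetical mapping from source headers to readable column names.
RENAME = {"ind": "indicator", "ctry": "country", "yr": "year", "val": "value"}


def harmonise(raw_text, year):
    """Drop unnecessary rows, rename columns, keep only rows for one year."""
    lines = raw_text.splitlines()[1:]          # remove the title row
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    cleaned = []
    for row in reader:
        if not any(row.values()):              # remove empty rows
            continue
        renamed = {RENAME[key]: value for key, value in row.items()}
        if renamed["year"] == year:            # facet-like selection
            cleaned.append(renamed)
    return cleaned


rows = harmonise(RAW, "2012")
for row in rows:
    print(row)
```

In Open Refine the same result is reached interactively; a script like this is only useful once the clean-up has to be repeated.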
Step 3: Clean up the data – prepare RDF
• Create URI representations for the involved object values
o via formula
o via reconciliation
Slide 91
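Creating a URI "via formula" means deriving it from the cell value by a fixed rule. A minimal Python sketch of such a rule follows; the base URI and the `mint_uri` helper are hypothetical, and the expression language used inside Open Refine itself is not shown here. Reconciliation, which looks values up against an external service, is not covered.

```python
import re
from urllib.parse import quote

# Hypothetical base URI; a real project would use its own stable namespace.
BASE = "http://example.org/id/"


def mint_uri(kind, label):
    """Build a URI from a resource type and a cell value."""
    # Lower-case the label and collapse every run of other characters to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")
    return BASE + quote(kind) + "/" + quote(slug)


print(mint_uri("country", "Czech Republic"))
print(mint_uri("indicator", " Broadband lines (fixed) "))
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what makes the values linkable across rows.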
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
• You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
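What a skeleton does on export can be sketched in a few lines of Python: one fixed template is applied to every row. This is an illustration only, assuming hypothetical rows and a hypothetical base URI; the Data Cube and SDMX namespaces are the published ones. The output is N-Triples, which is also valid Turtle. A complete Data Cube dataset would additionally need a dataset and a data structure definition, which are left out here.

```python
# Namespaces of the W3C RDF Data Cube Vocabulary and the SDMX terms it uses.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical base URI and hypothetical cleaned rows.
BASE = "http://example.org/scoreboard/"
ROWS = [
    {"country": "BE", "year": "2012", "value": "33.1"},
    {"country": "FR", "year": "2012", "value": "36.5"},
]


def row_to_ntriples(row):
    """Apply one fixed skeleton to a row and return N-Triples lines."""
    obs = "<%sobs/%s-%s>" % (BASE, row["country"], row["year"])
    return [
        "%s <%s> <%sObservation> ." % (obs, RDF_TYPE, QB),
        "%s <%srefArea> <%scountry/%s> ." % (obs, SDMX_DIM, BASE, row["country"]),
        '%s <%srefPeriod> "%s" .' % (obs, SDMX_DIM, row["year"]),
        '%s <%sobsValue> "%s"^^<%s> .' % (obs, SDMX_MEASURE, row["value"], XSD_DECIMAL),
    ]


lines = [line for row in ROWS for line in row_to_ntriples(row)]
print("\n".join(lines))
```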
Production pipelines
From desk to automated pipeline
[Figure: tools positioned along the axes flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
...and now YOUR questions?
Slide 97
Further reading
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Types of SPARQL queries
• SELECT: Return a table of all X, Y, etc. satisfying the following conditions.
• CONSTRUCT: Find all X, Y, etc. satisfying the following conditions and substitute them into the following template in order to generate (possibly new) RDF statements, creating a new graph.
• DESCRIBE: Find all statements in the dataset that provide information about the following resource(s) (identified by name or description).
• INSERT: Add triples to the RDF graph (SPARQL 1.1 Update).
• DELETE: Delete triples from the RDF graph (SPARQL 1.1 Update).
• ASK: Are there any X, Y, etc. satisfying the following conditions?
See also:
http://www.euclid-project.eu/modules/chapter2
https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en
Slide 55
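The difference between these forms lies in what comes back: a table, a yes/no answer, a new graph, or a changed graph. The Python sketch below imitates each form over a tiny in-memory set of triples. It is an analogy, not a SPARQL engine; the function names and the short `ex:` names are hypothetical.

```python
# A tiny in-memory "graph": a set of (subject, predicate, object) tuples.
# The short names are hypothetical stand-ins for full URIs.
GRAPH = {
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:publ", "rdf:type", "dct:Agent"),
}


def select_subjects(graph, predicate, obj):
    """SELECT-like: return a table (sorted list) of matching subjects."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == obj)


def ask(graph, predicate, obj):
    """ASK-like: is there at least one match?"""
    return any(p == predicate and o == obj for (_, p, o) in graph)


def construct(graph, predicate, obj, new_predicate, new_obj):
    """CONSTRUCT-like: fill a template once per match, giving a new graph."""
    return {(s, new_predicate, new_obj)
            for s in select_subjects(graph, predicate, obj)}


def describe(graph, resource):
    """DESCRIBE-like: return the statements about one resource."""
    return sorted(t for t in graph if t[0] == resource)


def insert(graph, triple):
    """INSERT-like: return a graph with one more triple."""
    return graph | {triple}


def delete(graph, triple):
    """DELETE-like: return a graph without the given triple."""
    return graph - {triple}


print(select_subjects(GRAPH, "rdf:type", "dcat:Dataset"))
print(ask(GRAPH, "rdf:type", "foaf:Person"))
print(describe(GRAPH, "ex:file-type"))
```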
Structure of a SPARQL Query
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    SELECT ?title
    WHERE {
      ?dataset rdf:type dcat:Dataset .
      ?dataset dct:title ?title .
    }
• PREFIX lines: definition of prefixes
• SELECT: type of query
• ?title: variables, i.e. what to search for
• WHERE { ... }: RDF triple patterns, i.e. the conditions that have to be met
Slide 56
SELECT – return the name of a dataset with a particular URI
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset
    WHERE {
      <http://.../authority/file-type> dct:title ?dataset .
    }
Result:
    dataset
    "File types Name Authority List"
Slide 57
SELECT – return the name and publisher of a dataset
Sample data:
    <http://.../authority/file-type> rdf:type dcat:Dataset .
    <http://.../authority/file-type> dct:title "File types Name Authority List" .
    <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
    <http://.../publisher/publ> rdf:type dct:Agent .
    <http://.../publisher/publ> dct:title "Publications Office" .
Query:
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset ?publisher
    WHERE {
      <http://.../authority/file-type> dct:publisher ?publisherURI .
      <http://.../authority/file-type> dct:title ?dataset .
      ?publisherURI dct:title ?publisher .
    }
Result:
    dataset | publisher
    "File types Name Authority List" | "Publications Office"
Slide 58
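The query above works because the variable publisherURI appears in two triple patterns and must take the same value in both. A minimal Python sketch of that matching follows. It is a toy pattern matcher, not how a SPARQL engine is implemented; the `ex:` names are hypothetical stand-ins for the URIs in the sample data, and it assumes the elided publisher URI and the full one denote the same resource.

```python
DATA = [
    ("ex:file-type", "rdf:type", "dcat:Dataset"),
    ("ex:file-type", "dct:title", "File types Name Authority List"),
    ("ex:file-type", "dct:publisher", "ex:publ"),
    ("ex:publ", "rdf:type", "dct:Agent"),
    ("ex:publ", "dct:title", "Publications Office"),
]


def match(pattern, triple, binding):
    """Try to extend a binding so that the pattern equals the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable keeps its earlier value; a clash means no match.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, data):
    """Return every binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            for extended in [match(pattern, triple, binding)]
            if extended is not None
        ]
    return bindings


QUERY = [
    ("ex:file-type", "dct:publisher", "?publisherURI"),
    ("ex:file-type", "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

solutions = solve(QUERY, DATA)
for solution in solutions:
    print(solution["?dataset"], "|", solution["?publisher"])
```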
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model, developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties.
4. Where new terms are required, create them following commonly agreed best practice.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Figure: example domain model. Classes and their attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Slide 68
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Well-known vocabularies
(Step 2: Reuse existing terms and vocabularies)
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Advantages of reuse
(Step 2: Reuse existing terms and vocabularies)
• Reuse greatly aids interoperability of your data.
Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
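The "further processing" mentioned for the free-text date can be made concrete. The Python sketch below converts the non-standard form into the xsd:date form; the helper names are hypothetical, and the month name is parsed with the default (English) locale.

```python
from datetime import datetime


def to_xsd_date(text):
    """Convert a date like '21 February 2013' into ISO form (YYYY-MM-DD)."""
    return datetime.strptime(text, "%d %B %Y").date().isoformat()


def as_typed_literal(text):
    """Render the date as a typed literal in Turtle notation."""
    return '"%s"^^xsd:date' % to_xsd_date(text)


print(as_typed_literal("21 February 2013"))
```

Every consumer of the non-standard form has to write and maintain a step like this, which is the cost that reuse of dcterms:created with xsd:date avoids.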
You can find reusable RDF vocabularies on
(Step 2: Reuse existing terms and vocabularies)
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “introduction” property as a sub-property of dct:description.
[Figure: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject.
[Figure: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
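Why a sub-property helps a generic consumer can be shown with a small Python sketch. It applies the RDFS sub-property rule (if p is a sub-property of q, then every "s p o" also implies "s q o") to a few triples. The `bud:` prefix, the property spellings and the sample values are hypothetical; only the two sub-property relationships come from the slides.

```python
# Sub-property statements from the two slides; prefix and spelling are
# hypothetical.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

# Hypothetical instance data using only the specific terms.
DATA = [
    ("ex:heading1", "bud:introduction", "Hypothetical introduction text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:heading1"),
]


def with_inferred(data, sub_property_of):
    """Add the triples implied by each sub-property statement."""
    inferred = list(data)
    for s, p, o in data:
        parent = sub_property_of.get(p)
        while parent is not None:           # follow chains of sub-properties
            inferred.append((s, parent, o))
            parent = sub_property_of.get(parent)
    return inferred


def values(data, subject, predicate):
    """Return the objects stated for a subject and predicate."""
    return [o for s, p, o in data if s == subject and p == predicate]


# A consumer that only knows Dublin Core finds nothing in the raw data...
print(values(DATA, "ex:heading1", "dct:description"))
# ...but does once the sub-property rule has been applied.
print(values(with_inferred(DATA, SUB_PROPERTY_OF), "ex:heading1", "dct:description"))
```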
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept.
• Properties begin with a lower-case letter, e.g. rdfs:label.
• Object properties should be verbs, e.g. org:hasSite.
• Data type properties should be nouns, e.g. dcterms:description.
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf.
Slide 76
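The letter-case conventions above are mechanical enough to check automatically. The Python sketch below is one possible check, with hypothetical helper names; it covers only capitalisation and camel case, and cannot tell whether a class name is singular or whether a property is a verb or a noun.

```python
import re

# Upper camel case for classes, lower camel case for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term):
    """True if the local name follows the convention for classes."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_property_name(term):
    """True if the local name follows the convention for properties."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "foaf:isPrimaryTopicOf", "ex:has_site"]:
    print(term, is_class_name(term), is_property_name(term))
```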
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
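The definitions shown on these two slides did not survive as text. As a stand-in, the Python sketch below prints what RDFS definitions of an "Amount" class and an "amount type" property could look like in Turtle. The `ex:` namespace, the local names, the labels and the comment are all hypothetical; only the use of rdfs:Class, rdf:Property, rdfs:label, rdfs:comment and rdfs:domain reflects standard RDFS.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/budget#> .\n"   # hypothetical namespace
)


def define_class(name, label, comment):
    """Return a Turtle definition of an RDFS class."""
    return (
        "ex:%s a rdfs:Class ;\n"
        '    rdfs:label "%s"@en ;\n'
        '    rdfs:comment "%s"@en .\n'
    ) % (name, label, comment)


def define_property(name, label, domain):
    """Return a Turtle definition of a property with a domain."""
    return (
        "ex:%s a rdf:Property ;\n"
        '    rdfs:label "%s"@en ;\n'
        "    rdfs:domain ex:%s .\n"
    ) % (name, label, domain)


VOCABULARY = (
    PREFIXES
    + "\n"
    + define_class("Amount", "Amount", "An amount of money in the budget")
    + "\n"
    + define_property("amountType", "amount type", "Amount")
)
print(VOCABULARY)
```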
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Figure: the hasCeiling property linking Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
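In RDFS, domain and range are not validation constraints: they let a reasoner infer the class of a subject or object from the use of a property. The Python sketch below applies those two rules. The `ex:` names are hypothetical, and the direction assumed for hasCeiling (from Political category to Amount) is a guess based on the figure.

```python
# Hypothetical declarations; the direction of hasCeiling is an assumption.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}

# Hypothetical instance data with no explicit rdf:type statements.
DATA = [("ex:category1", "ex:hasCeiling", "ex:amount1")]


def infer_types(data, domain, range_):
    """Apply the RDFS domain and range rules to every triple."""
    types = set()
    for s, p, o in data:
        if p in domain:                      # subject is in the domain class
            types.add((s, "rdf:type", domain[p]))
        if p in range_:                      # object is in the range class
            types.add((o, "rdf:type", range_[p]))
    return types


for triple in sorted(infer_types(DATA, DOMAIN, RANGE)):
    print(triple)
```

A consequence worth keeping in mind when choosing a domain or range: using the property on an unexpected resource does not raise an error, it silently classifies that resource.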
DATASUPPORTOPEN
PREFIX dct lthttppurlorgdctermsgtPREFIX dcat lthttpwwww3orgTRvocab-dcatgt
SELECT titleWHERE
dataset rdftype dcatDataset dataset rdftitle title
Structure of a SPARQL Query
Slide 56
Type of
query Variables ie what to search for
RDF triple patterns ie
the conditions that
have to be met
Definition of
prefixes
DATASUPPORTOPEN
SELECT ndash return the name of a dataset with particular URI
Slide 57
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset
WHERE
lthttpauthorityfile-typegt dcttitle dataset
dataset
ldquoFile types Name Authority Listrdquo
Sample data
Query
Result
DATASUPPORTOPEN
SELECT - return the name and publisher of a dataset
Slide 58
PREFIX dcat lthttpwwww3orgTRvocab-dcatgtPREFIX dct lthttppurlorgdctermsgt
SELECT dataset publisher
WHEREhttpauthorityfile-type dctpublisher publisherURI
httpauthorityfile-type dcttitle datasetpublisherURI dcttitle publisher
dataset publisher
ldquoFile types Name Authority Listrdquo ldquoPublications Officerdquo
lthttpauthorityfile-typegt rdftype dcatDatasetlthttpauthorityfile-typegt dcttitle ldquoFile types Name Authority Listldquo lthttpauthorityfile-typegt dctpublisher lt httpopen-dataeuropaeuendatapublisherpublgt
lt httppublisherpublgt rdftype dctAgent lt httppublisherpublgt dcttitle ldquoPublications Officerdquo
Sample data
Query
Result
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
DCAT-AP: vocabulary for describing datasets in Europe
Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: vocabulary for describing projects
ADMS: vocabulary for describing interoperability assets
Dublin Core: defines general metadata attributes
Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids the interoperability of your data
Take dcterms:created as an example. Its value should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will need further processing to align it with everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
Slide 71
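The "further processing" mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming the free-text dates follow the English "day month year" pattern used in the slide's example; the function name to_xsd_date is hypothetical and not part of any vocabulary or tool.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date(text: str, fmt: str = "%d %B %Y") -> str:
    """Parse a free-text date and return it as a typed literal (N-Triples syntax).

    Assumption: month names are in English and the default format matches
    strings such as "21 February 2013".
    """
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'


# The non-standard value from the slide, converted to the form expected
# for dcterms:created.
print(to_xsd_date("21 February 2013"))
```

Every consumer of data published with the non-standard format would need a conversion step of this kind, which is the cost that reuse of dcterms:created avoids.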
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu/
http://lov.okfn.org/
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• When you create sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown to them.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: Amount (Currency, Figure, Type, Year) linked by "has Nomenclature" to Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
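The benefit of sub-properties described on the last three slides can be sketched in a few lines. The namespace http://example.org/bud# and the local names introduction and hasNomenclature are assumptions made for illustration; the slides do not give the actual identifiers used by the EU Budget vocabulary. The function applies the RDFS sub-property rule: if p is a sub-property of q, every statement using p also holds for q.

```python
DCT = "http://purl.org/dc/terms/"
BUD = "http://example.org/bud#"  # hypothetical namespace, for illustration only
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Schema statements mirroring the two examples on the slides.
schema = {
    (BUD + "introduction", SUB_PROPERTY_OF, DCT + "description"),
    (BUD + "hasNomenclature", SUB_PROPERTY_OF, DCT + "subject"),
}

# One made-up data statement that uses the specific property.
data = {
    (BUD + "nomenclature/1", BUD + "introduction", "Text introducing the heading"),
}


def infer_super_properties(data, schema):
    """Return the data plus every statement implied by rdfs:subPropertyOf."""
    parents = {}
    for sub, predicate, sup in schema:
        if predicate == SUB_PROPERTY_OF:
            parents.setdefault(sub, set()).add(sup)

    inferred = set(data)
    queue = list(data)
    while queue:  # follows chains of sub-properties as well
        s, p, o = queue.pop()
        for parent in parents.get(p, ()):
            triple = (s, parent, o)
            if triple not in inferred:
                inferred.add(triple)
                queue.append(triple)
    return inferred


for triple in sorted(infer_super_properties(data, schema)):
    print(triple)
```

A system that knows only dct:description can read the inferred statement without knowing anything about the more specific property.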
Step 4: Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
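The capitalisation and camel-case conventions above lend themselves to a mechanical check. The sketch below is illustrative only: the helper names check_term and to_camel_case are hypothetical, and the check covers letter case alone. Whether a class name is singular, or a property name is a verb or a noun, still needs human review.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # classes, e.g. Concept
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # properties, e.g. hasSite


def check_term(local_name: str, kind: str) -> bool:
    """Check the letter-case convention for a class or property local name."""
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local_name))


def to_camel_case(label: str, kind: str) -> str:
    """Turn a human-readable label into a camel-case local name."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable words")
    name = "".join(word[0].upper() + word[1:] for word in words)
    if kind == "property":
        name = name[0].lower() + name[1:]
    return name


print(to_camel_case("Political category", "class"))
print(to_camel_case("is primary topic of", "property"))
print(check_term("has_site", "property"))
```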
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions to describe your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions to describe your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
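The definitions shown on these two slides did not survive in the transcript. As a stand-in, here is a minimal sketch of what RDFS definitions of a class and a property can look like, written out as N-Triples (which is also valid Turtle). Everything specific in it is an assumption: the namespace http://example.org/budget#, the local name amountType, and the label and comment texts are made up for illustration and are not taken from the EU Budget vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/budget#"  # hypothetical vocabulary namespace


def literal(text: str, lang: str = "en") -> str:
    """Return a language-tagged literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def define_term(local_name: str, rdf_class: str, label: str, comment: str) -> list:
    """Describe one vocabulary term with a type, label, comment and source."""
    term = f"<{EX}{local_name}>"
    return [
        f"{term} <{RDF}type> <{rdf_class}> .",
        f"{term} <{RDFS}label> {literal(label)} .",
        f"{term} <{RDFS}comment> {literal(comment)} .",
        f"{term} <{RDFS}isDefinedBy> <{EX}> .",
    ]


statements = (
    define_term("Amount", RDFS + "Class", "Amount", "A monetary amount.")
    + define_term("amountType", RDF + "Property", "amount type", "The type of an amount.")
)
print("\n".join(statements))
```

Giving every term a label, a comment and an rdfs:isDefinedBy link is a common convention rather than a requirement; OWL could be used in the same way by typing the terms as owl:Class or owl:DatatypeProperty.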
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Figure: the hasCeiling property between Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
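In RDFS, domain and range statements are used to draw conclusions rather than to reject data: from one statement that uses a property, a reasoner can infer the classes of its subject and object. The sketch below illustrates this under stated assumptions. The namespace and resource URIs are hypothetical, and the direction of hasCeiling (from a political category to an amount) is read off the diagram residue and may not match the actual vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/budget#"  # hypothetical namespace

# Assumed schema: hasCeiling goes from a political category to an amount.
schema = {
    (EX + "hasCeiling", RDFS + "domain", EX + "PoliticalCategory"),
    (EX + "hasCeiling", RDFS + "range", EX + "Amount"),
}

# One made-up data statement; both subject and object are resources.
data = {
    (EX + "category/1a", EX + "hasCeiling", EX + "amount/2014-1a"),
}


def infer_types(data, schema):
    """Infer rdf:type statements from rdfs:domain and rdfs:range."""
    domains, ranges = {}, {}
    for prop, predicate, cls in schema:
        if predicate == RDFS + "domain":
            domains.setdefault(prop, set()).add(cls)
        elif predicate == RDFS + "range":
            ranges.setdefault(prop, set()).add(cls)

    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))  # the subject is in the domain
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))  # the object is in the range
    return inferred


for triple in sorted(infer_types(data, schema)):
    print(triple)
```

Because of this inference behaviour, an overly narrow domain or range can lead to unintended type statements, which is one reason to choose them with care.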
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Follow good practices for publishing persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms#
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
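Design rules for persistent URIs can be checked partly by machine. The sketch below tests two commonly recommended rules: avoid query strings and avoid file extensions. The choice of these two rules is an assumption made for illustration and is not a quotation from the slides; the function name persistence_warnings is hypothetical.

```python
import re
from typing import List
from urllib.parse import urlsplit

# A dot followed by a letter and up to four more characters, at the end of
# the last path segment (e.g. ".php", ".html"). Version-like endings such as
# "1.1" are deliberately not matched.
FILE_EXTENSION = re.compile(r"\.[A-Za-z][A-Za-z0-9]{1,4}$")


def persistence_warnings(uri: str) -> List[str]:
    """Return warnings for URI traits that tend to break over time."""
    parts = urlsplit(uri)
    warnings = []
    if parts.query:
        warnings.append("contains a query string")
    last_segment = parts.path.rsplit("/", 1)[-1]
    if FILE_EXTENSION.search(last_segment):
        warnings.append("ends with a file extension")
    return warnings


for candidate in (
    "http://www.w3.org/ns/adms#",
    "http://purl.org/dc/elements/1.1/",
    "http://example.org/vocab.php?id=12",  # made-up counter-example
):
    print(candidate, persistence_warnings(candidate))
```

Both example namespaces from the slide pass these checks, while the made-up third URI ties the identifier to a particular technology and request format.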
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
Start with a robust domain model, developed following a structured process and methodology.
Research existing terms and their usage, and maximise reuse of those terms.
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions, etc.
Publish within a highly stable environment designed to be persistent.
Publicise the RDF vocabulary by registering it with relevant services.
[Figure labels: Analyse, Model, Publish]
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another …” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as CSV, Excel (.xls and .xlsx), JSON, XML and RDF/XML, and then to determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie/
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
Upload the spreadsheet
Select relevant tabs
Create the project
Slide 89
Step 3: Clean up the data - table harmonisation
• Star and remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data - prepare RDF
• Create a URI representation for the object values involved
  - via formula
  - via reconciliation
Slide 91
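Creating URIs "via formula" means deriving each URI from the cell value by a fixed rule. The slide does not show the formula used in the workshop, so the following is only a sketch of the idea in Python: the base URI http://example.org/id/country/ and the function name mint_uri are hypothetical, and inside Open Refine the same rule would be written in the tool's own expression language.

```python
import re
import unicodedata

BASE = "http://example.org/id/country/"  # hypothetical base URI


def mint_uri(cell_value: str, base: str = BASE) -> str:
    """Derive a URI from a spreadsheet cell value by a fixed rule."""
    # Strip accents so that the resulting slug uses plain ASCII letters.
    normalized = unicodedata.normalize("NFKD", cell_value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Lower-case, and collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a URI from {cell_value!r}")
    return base + slug


print(mint_uri("Czech Republic"))
print(mint_uri(" España "))
```

A formula gives every distinct spelling its own URI, so "Czech Republic" and "Czechia" would end up as two resources. Reconciliation, the second option on the slide, instead matches cell values against an existing list of identifiers.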
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either by creating a new RDF skeleton or by editing an existing one.
You can set the base URI for the data.
[Figures: graphical interface to copy and paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
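What the RDF skeleton and the export step do together can be sketched outside Open Refine as well: each spreadsheet row becomes one observation described with the RDF Data Cube Vocabulary. The sketch below is an illustration under stated assumptions. The column names, the base URI http://example.org/scoreboard/ and the sample figures are made up; the qb: and SDMX terms are existing vocabulary terms, but the mapping chosen in the workshop is not shown in the transcript. The output is N-Triples, which is also valid Turtle.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI

# Made-up sample rows standing in for the cleaned spreadsheet.
SAMPLE_CSV = """country,year,value
BE,2013,0.80
NL,2013,0.93
"""


def row_to_ntriples(row, base=BASE):
    """Apply a fixed skeleton to one row, giving one qb:Observation."""
    country, year, value = row["country"], row["year"], row["value"]
    obs = f"<{base}observation/{country}-{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{base}dataset> .",
        f"{obs} <{SDMX_DIM}refArea> <{base}country/{country}> .",
        f'{obs} <{SDMX_DIM}refPeriod> "{year}"^^<{XSD}gYear> .',
        f'{obs} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD}decimal> .',
    ]


def convert(csv_text):
    """Convert CSV text with country, year and value columns to N-Triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_ntriples(row))
    return "\n".join(lines)


print(convert(SAMPLE_CSV))
```

A complete Data Cube publication would also describe the dataset and its data structure definition; the sketch covers the observations only.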
Production pipelines
From desk to automated pipeline
[Figure: tools positioned by flexibility and volume: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
...and now YOUR questions?
Slide 97
References
• 5 ★ Open Data: http://5stardata.info/
• ADMS Brochure, ISA Programme: https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C: http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID: http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation: http://okfn.org/opendata/
• Open Refine: https://github.com/OpenRefine
• RDF Extension: http://refine.deri.ie/
• Resource Description Framework, W3C: http://www.w3.org/RDF/
• Semantic Web Stack, W3C: http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C: http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm; Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
SELECT - return the name of a dataset with a particular URI
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset
WHERE { <http://.../authority/file-type> dct:title ?dataset }
Result
?dataset
"File types Name Authority List"
Slide 57
SELECT - return the name and publisher of a dataset
Sample data
<http://.../authority/file-type> rdf:type dcat:Dataset .
<http://.../authority/file-type> dct:title "File types Name Authority List" .
<http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
<http://.../publisher/publ> rdf:type dct:Agent .
<http://.../publisher/publ> dct:title "Publications Office" .
Query
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?publisher
WHERE {
  <http://.../authority/file-type> dct:publisher ?publisherURI .
  <http://.../authority/file-type> dct:title ?dataset .
  ?publisherURI dct:title ?publisher .
}
Result
?dataset | ?publisher
"File types Name Authority List" | "Publications Office"
Slide 58
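To make clear how a SPARQL engine arrives at this result, the sketch below evaluates the same three triple patterns over the sample data in plain Python. It is a teaching aid under stated assumptions, not how a real SPARQL endpoint is implemented: the example.org URIs are placeholders for the abbreviated URIs on the slide, the publisher is assumed to be one and the same resource in all statements, and the function names match and select are hypothetical.

```python
DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/authority/file-type"  # placeholder URI
PUBLISHER = "http://example.org/publisher/publ"     # placeholder URI

triples = {
    (DATASET, DCT + "title", "File types Name Authority List"),
    (DATASET, DCT + "publisher", PUBLISHER),
    (PUBLISHER, DCT + "title", "Publications Office"),
}


def match(pattern, triple, binding):
    """Try to extend a variable binding so that the pattern equals the triple."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):  # a variable such as ?dataset
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None  # the variable is already bound to another value
        elif pattern_term != triple_term:
            return None  # a constant in the pattern does not match
    return extended


def select(variables, patterns, triples):
    """Evaluate the patterns one after another, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(binding[v] for v in variables) for binding in bindings]


rows = select(
    ["?dataset", "?publisher"],
    [
        (DATASET, DCT + "publisher", "?publisherURI"),
        (DATASET, DCT + "title", "?dataset"),
        ("?publisherURI", DCT + "title", "?publisher"),
    ],
    triples,
)
print(rows)
```

The variable ?publisherURI appears in two patterns, and that shared variable is what joins the dataset to the name of its publisher.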
SPARQL Example - EU ODP (1)
Slide 59
SPARQL Example - EU ODP (2)
Slide 60
SPARQL Example - EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
SELECT - return the name and publisher of a dataset

Sample data
  <http://.../authority/file-type> rdf:type dcat:Dataset .
  <http://.../authority/file-type> dct:title "File types Name Authority List" .
  <http://.../authority/file-type> dct:publisher <http://open-data.europa.eu/en/data/publisher/publ> .
  <http://.../publisher/publ> rdf:type dct:Agent .
  <http://.../publisher/publ> dct:title "Publications Office" .

Query
  PREFIX dcat: <http://www.w3.org/ns/dcat#>
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?dataset ?publisher
  WHERE {
    <http://.../authority/file-type> dct:publisher ?publisherURI .
    <http://.../authority/file-type> dct:title ?dataset .
    ?publisherURI dct:title ?publisher .
  }

Result
  ?dataset                            ?publisher
  "File types Name Authority List"    "Publications Office"

Slide 58
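To make the join in the query above more tangible, here is a minimal sketch of how the three triple patterns are matched against the sample data. It is a toy pattern matcher written for illustration, not a SPARQL engine, and the example.org URIs are assumed stand-ins for the shortened URIs on the slide.

```python
# Illustrative sketch only: a tiny triple-pattern matcher, not a SPARQL engine.
# The example.org URIs are hypothetical stand-ins for the shortened slide URIs.
DATASET = "http://example.org/authority/file-type"
PUBLISHER = "http://example.org/publisher/publ"

TRIPLES = [
    (DATASET, "rdf:type", "dcat:Dataset"),
    (DATASET, "dct:title", "File types Name Authority List"),
    (DATASET, "dct:publisher", PUBLISHER),
    (PUBLISHER, "rdf:type", "dct:Agent"),
    (PUBLISHER, "dct:title", "Publications Office"),
]


def match(pattern, triple, bindings):
    """Try to unify one pattern with one triple; return new bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it or check the earlier binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return new


def select(patterns, triples):
    """Join all patterns, returning every consistent set of variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            new
            for bindings in solutions
            for triple in triples
            if (new := match(pattern, triple, bindings)) is not None
        ]
    return solutions


QUERY = [
    (DATASET, "dct:publisher", "?publisherURI"),
    (DATASET, "dct:title", "?dataset"),
    ("?publisherURI", "dct:title", "?publisher"),
]

rows = [(s["?dataset"], s["?publisher"]) for s in select(QUERY, TRIPLES)]
print(rows)
```

The shared variable ?publisherURI is what links the dataset to the title of its publisher. A real SPARQL endpoint performs the same kind of join over a far larger graph.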
SPARQL Example – EU ODP (1)
Slide 59
SPARQL Example – EU ODP (2)
Slide 60
SPARQL Example – EU ODP (2)
Slide 61
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed as triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language for querying graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements, developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust domain model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Figure: domain model diagram]
Classes and attributes:
- Amount: Currency, Figure, Type, Year
- Nomenclature: Type, Heading, Introduction, Remark, Conditions
- EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
- Political category: Code, Description
- Corporate body: Code, Type, Location
Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme
Slide 68
Step 2: Reuse existing terms and vocabularies
• General-purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, developed by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
  A dcterms:created value, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will need further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
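The interoperability point about dates can be shown in a few lines. The sketch below is illustrative only: it contrasts a lexical form that follows xsd:date with a free-text date, and the format string used for the free-text value is an assumption that every consumer of such data would have to guess or be told.

```python
from datetime import date, datetime

# A typed value such as "2013-02-21"^^xsd:date uses the ISO 8601 calendar form,
# which the standard library parses directly.
iso_value = date.fromisoformat("2013-02-21")

# A free-text value needs a format that each consumer has to discover.
# The "%d %B %Y" pattern is an assumption, and %B depends on the active locale.
free_text_value = datetime.strptime("21 February 2013", "%d %B %Y").date()

print(iso_value == free_text_value)
```

Both values end up as the same date, but only the first can be read without prior agreement on language and layout.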
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• When you declare sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown to them.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Figure: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Figure: Amount (Currency, Figure, Type, Year) linked to Nomenclature (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship]
Slide 75
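The benefit of sub-properties can be sketched as a small inference step: a consumer that only knows dct:description or dct:subject can still use data expressed with the more specific terms. The "bud:" prefix, the local names and the sample resource below are hypothetical, chosen only to mirror the two EU Budget examples.

```python
# Illustrative sketch: RDFS sub-property entailment over plain tuples.
# The "bud:" prefix and the example resource are hypothetical names.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

data = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text of the budget line"),
]


def with_super_properties(triples, sub_property_of):
    """Add a triple for every super-property reachable via rdfs:subPropertyOf."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        seen = {predicate}
        while predicate in sub_property_of:
            predicate = sub_property_of[predicate]
            if predicate in seen:  # guard against cyclic declarations
                break
            seen.add(predicate)
            inferred.add((subject, predicate, obj))
    return inferred


expanded = with_super_properties(data, SUB_PROPERTY_OF)
descriptions = [o for s, p, o in expanded if p == "dct:description"]
print(descriptions)
```

A generic tool that looks only for dct:description now finds the introduction text, without knowing anything about the more specific vocabulary.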
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower-case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
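The capitalisation and camel-case conventions above are mechanical enough to automate. The helper names below are hypothetical, and the sketch covers only the casing rules. Whether a term is singular, a verb or a noun still needs human judgement.

```python
import re


def _words(label):
    """Split a human-readable label into alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", label)


def class_name(label):
    """UpperCamelCase local name for a class."""
    return "".join(w[0].upper() + w[1:] for w in _words(label))


def property_name(label):
    """lowerCamelCase local name for a property."""
    name = class_name(label)
    return name[0].lower() + name[1:] if name else name


print(class_name("political category"))
print(property_name("has ceiling"))
```

Applied to the labels of a domain model, this turns "political category" into a class name and "has ceiling" into a property name that follow the conventions.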
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Figure: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Figure: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
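The example images for these two slides did not survive in the transcript. As a rough idea of what RDFS definitions of a class and a property can look like, the sketch below generates Turtle text for an Amount class and an amount type property. The namespace, local names, labels and comment are all assumptions made for illustration and do not reproduce the published EU Budget vocabulary.

```python
# Illustrative sketch: the namespace and labels below are hypothetical,
# not the published EU Budget vocabulary.
PREFIXES = """\
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bud:  <http://example.org/bud#> .
"""


def define_class(local_name, label, comment):
    """Return a Turtle snippet declaring an rdfs:Class."""
    return (
        f"bud:{local_name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


def define_property(local_name, label, domain):
    """Return a Turtle snippet declaring an rdf:Property with a domain."""
    return (
        f"bud:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain bud:{domain} .\n"
    )


turtle = (
    PREFIXES
    + "\n"
    + define_class("Amount", "Amount", "A monetary amount in the budget.")
    + "\n"
    + define_property("amountType", "amount type", "Amount")
)
print(turtle)
```

In practice a vocabulary file is usually written by hand or with an ontology editor. Generating it from a table of terms, as sketched here, is one option when the domain model is maintained elsewhere.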
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes.
• A domain states that the resources a property is used on are instances of one or more classes.
[Figure: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
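In RDFS, domain and range statements let a consumer infer the type of the resources on either side of a property. The sketch below shows that inference for hasCeiling. Which class is the domain and which is the range is an assumption here, as are the "bud:" and "ex:" names.

```python
# Illustrative sketch of RDFS domain/range entailment; "bud:" and "ex:" names are hypothetical.
# Which class is the domain and which the range of hasCeiling is assumed here.
DOMAIN = {"bud:hasCeiling": "bud:PoliticalCategory"}
RANGE = {"bud:hasCeiling": "bud:Amount"}

data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]


def infer_types(triples, domain, range_):
    """Derive rdf:type statements implied by rdfs:domain and rdfs:range."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate in domain:
            types.add((subject, "rdf:type", domain[predicate]))
        if predicate in range_:
            types.add((obj, "rdf:type", range_[predicate]))
    return types


inferred = infer_types(data, DOMAIN, RANGE)
print(sorted(inferred))
```

Because these statements drive inference and are not validation rules, a domain or range that is too narrow leads to wrong conclusions about the data instead of an error, which is one reason to define them with care.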
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format (design rules) and management
  Examples:
  o http://www.w3.org/ns/adms#
  o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
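A stable namespace means that every term URI is the namespace plus a local name, and nothing in it should reveal the technology behind the server. The sketch below builds term URIs this way and applies two common rules of thumb for persistence (no query string, no technology-specific file extension). The namespace, the function names and the list of suffixes are assumptions made for illustration, not a complete rule set.

```python
from urllib.parse import urlsplit

# Hypothetical namespace used for illustration only.
NAMESPACE = "http://example.org/def/budget#"


def term_uri(local_name, namespace=NAMESPACE):
    """Build a term URI by appending a local name to a fixed namespace."""
    return namespace + local_name


def looks_persistent(uri):
    """Heuristic check: no query string and no technology-specific file extension."""
    parts = urlsplit(uri)
    volatile_suffixes = (".php", ".aspx", ".jsp", ".html")
    return not parts.query and not parts.path.lower().endswith(volatile_suffixes)


print(term_uri("Amount"))
print(looks_persistent("http://www.w3.org/ns/adms#"))
print(looks_persistent("http://example.org/vocab.php?id=42"))
```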
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust domain model, developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Slide 82
Example
Using OpenRefine for RDF to publish tabular data as Linked Data
Slide 83
What is OpenRefine?
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another ...” - openrefine.org
See also: OpenRefine website, http://openrefine.org
Slide 84
What is the OpenRefine RDF extension?
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
Example situation
Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in OpenRefine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
Step 3: Clean up the data – prepare RDF
• Create a URI representation for the involved object values
  - via formula
  - via reconciliation
Slide 91
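Creating a URI "via formula" means deriving it deterministically from the cell value. In OpenRefine this would be written as an expression on the column. The sketch below shows the same idea in plain Python, not in OpenRefine's own expression language. The base URI and function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI used for illustration only.
BASE = "http://example.org/id/country/"


def slug(value):
    """Lower-case, strip accents and replace runs of other characters with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(value, base=BASE):
    """Build a URI for a cell value by appending its slug to the base."""
    return base + slug(value)


print(mint_uri("Czech Republic"))
print(mint_uri(" Österreich "))
```

Reconciliation takes the other route: instead of minting a new URI, it looks the value up in an existing dataset and reuses the URI found there, which links your data to data published by others.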
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface to create an RDF skeleton or edit an existing one
• You can set the base URI for the data
[Figure: graphical interface to copy/paste an existing RDF skeleton]
[Figure: graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Figure: export of the data in Turtle]
Slide 95
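What the skeleton and export steps produce can be approximated without OpenRefine. The sketch below turns each row of a small table into one qb:Observation in Turtle. Only the qb: and xsd: namespaces are real. The ex: URIs, the column names and the figures are made up for illustration and are not Digital Agenda Scoreboard data.

```python
import csv
import io

# All ex: URIs, column names and figures below are made up for illustration.
SAMPLE_CSV = """country,year,value
BE,2013,0.81
NL,2013,0.93
"""

PREFIXES = """\
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/scoreboard/> .
"""


def rows_to_turtle(csv_text):
    """Turn each CSV row into one qb:Observation, following a fixed skeleton."""
    blocks = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        blocks.append(
            f"ex:obs-{row['country']}-{row['year']} a qb:Observation ;\n"
            f"    qb:dataSet ex:dataset ;\n"
            f"    ex:country ex:country-{row['country']} ;\n"
            f'    ex:year "{row["year"]}"^^xsd:gYear ;\n'
            f'    ex:value "{row["value"]}"^^xsd:decimal .\n'
        )
    return "\n".join(blocks)


output = rows_to_turtle(SAMPLE_CSV)
print(output)
```

A complete Data Cube publication would also describe the dataset and its structure (dimensions and measures). The sketch covers only the observations, which is the part the spreadsheet rows map to.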
Production pipelines
From desk to automated pipeline
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
Thank you for your attention
...and now YOUR questions?
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• OpenRefine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA. Process and methodology for developing semantic agreements. https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA. Cookbook for translating Data Models to RDF Schemas. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data. http://www.euclid-project.eu/modules/course2
• Learning SPARQL. Bob DuCharme. http://www.learningsparql.com
• Linked Data Cookbook. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist. Dean Allemang, Jim Hendler. http://workingontologist.org
Slide 101
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (1)
Slide 59
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com/
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org/
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on / Contact us / Join us on / Follow us:
Open Data Support, http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support, http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 60
DATASUPPORTOPEN
SPARQL Example – EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
• RDF is a general way to express data intended for publishing on the Web
• RDF data is expressed in triples: subject, predicate, object
• Different syntaxes exist for expressing data in RDF
• SPARQL is a standardised language to query graph data expressed as RDF
• SPARQL can be used to query and update RDF data
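The summary points can be illustrated with a small Python sketch: triples are held as (subject, predicate, object) tuples, and a single triple pattern is matched against them, which is roughly the idea behind a basic SPARQL graph pattern. The example URIs and the match helper are made up for illustration; a real application would more likely rely on an RDF library and a SPARQL endpoint.

```python
# Illustrative only: the dataset URIs below are hypothetical example data.
EX = "http://example.org/"
DCT = "http://purl.org/dc/terms/"

triples = [
    (EX + "dataset/1", DCT + "title", "Budget 2014"),
    (EX + "dataset/1", DCT + "publisher", EX + "org/ec"),
    (EX + "dataset/2", DCT + "title", "Budget 2015"),
]


def match(pattern, data):
    """Return every triple that fits the pattern; None acts as a variable."""
    return [
        triple
        for triple in data
        if all(p is None or p == t for p, t in zip(pattern, triple))
    ]


# Roughly: SELECT ?s ?title WHERE { ?s dct:title ?title }
titles = [(s, o) for s, _, o in match((None, DCT + "title", None), triples)]
print(titles)
```

Each None in the pattern plays the role of a SPARQL variable, so the same helper can also list everything known about one subject.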
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
[Figure: domain model diagram. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Advantages of reuse
• Reuse greatly aids interoperability of your data
  Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
Reuse existing terms and vocabularies
2
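The "further processing" mentioned for non-standard date values can be sketched in a few lines of Python. This is only an illustration: the to_xsd_date helper is hypothetical, and it assumes English month names in the input, which Python parses under its default locale.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Convert a free-text date into a Turtle literal typed as xsd:date."""
    # %B matches full month names in the current locale (English by default).
    parsed = datetime.strptime(text, fmt).date()
    return '"{}"^^xsd:date'.format(parsed.isoformat())


# "21 February 2013" is the free-text form used in the slide's counter-example.
literal = to_xsd_date("21 February 2013")
print("dcterms:created " + literal)
```

Publishing the typed value with dcterms:created in the first place spares every data consumer from writing a conversion of this kind.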
DATASUPPORTOPEN
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu/
http://lov.okfn.org/
Slide 72
Reuse existing terms and vocabularies
2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
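The second point can be made concrete with a minimal Python sketch. It assumes a hypothetical bud:introduction property declared as a sub-property of dct:description, and shows how a consumer that only knows dct:description can still pick up the value. The namespace, the data and the infer_super_properties helper are illustrative, and only one level of sub-property is followed.

```python
# Hypothetical namespace and data, for illustration only.
BUD = "http://example.org/bud#"
DCT = "http://purl.org/dc/terms/"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

schema = [(BUD + "introduction", RDFS_SUBPROP, DCT + "description")]
data = [(BUD + "nomenclature/1", BUD + "introduction", "General remarks")]


def infer_super_properties(data, schema):
    """Add a triple with the super-property for each sub-property usage."""
    parents = {}
    for s, p, o in schema:
        if p == RDFS_SUBPROP:
            parents.setdefault(s, []).append(o)
    inferred = list(data)
    for s, p, o in data:
        for parent in parents.get(p, []):
            inferred.append((s, parent, o))
    return inferred


expanded = infer_super_properties(data, schema)
# A consumer that only understands dct:description still finds the text.
descriptions = [o for s, p, o in expanded if p == DCT + "description"]
print(descriptions)
```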
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Figure: the Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Figure: the Amount class (Currency, Figure, Type, Year) linked to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the "has Nomenclature" relationship]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
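Part of these conventions can be checked mechanically. The Python sketch below, with hypothetical helper names, tests only the capitalisation and camel-case rules; whether a name is singular, a verb or a noun still needs human review.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_class_name(term):
    """Classes: start with a capital letter, no spaces or underscores."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_property_name(term):
    """Properties: start with a lower case letter, camel case after that."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "rdfs:label", "org:hasSite", "foaf:isPrimaryTopicOf"]:
    kind = "class" if is_class_name(term) else "property"
    print(term, "->", kind)
```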
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
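The code shown on the original slide is not preserved in this transcript. As a stand-in, the Python sketch below prints one possible RDFS declaration of an Amount class in Turtle. The bud: namespace URI, the label and the comment text are assumptions made for illustration, not the actual EU Budget vocabulary definition.

```python
# Assumption: namespace URI, label and comment are illustrative placeholders.
PREFIXES = """@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bud: <http://example.org/bud#> .
"""


def define_class(name, label, comment):
    """Return a Turtle snippet declaring an RDFS class (no quote escaping)."""
    return (
        "bud:{0} a rdfs:Class ;\n"
        '    rdfs:label "{1}"@en ;\n'
        '    rdfs:comment "{2}"@en .\n'
    ).format(name, label, comment)


turtle = PREFIXES + "\n" + define_class(
    "Amount", "Amount", "A monetary amount in the budget."
)
print(turtle)
```

The class name follows the naming convention for classes: a capitalised, singular noun.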
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the Amount class with attributes Currency, Figure, Type, Year]
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
Slide 79
4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: the hasCeiling property linking the classes Political category (Code, Description) and Amount (Currency, Figure, Type, Year)]
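A short Python sketch can show what domain and range declarations imply in practice. In RDFS they work as inference rules rather than validation constraints: using a property lets a consumer conclude the types of its subject and object. The prefixed names, the direction of hasCeiling (from Political category to Amount) and the implied_types helper are all assumptions made for this example.

```python
# Hypothetical terms: the direction of bud:hasCeiling is assumed.
RDF_TYPE = "rdf:type"

schema = {
    "bud:hasCeiling": {
        "domain": "bud:PoliticalCategory",
        "range": "bud:Amount",
    }
}

data = [
    ("ex:category1", RDF_TYPE, "bud:PoliticalCategory"),
    ("ex:amount1", RDF_TYPE, "bud:Amount"),
    ("ex:category1", "bud:hasCeiling", "ex:amount1"),
]


def implied_types(data, schema):
    """Return the rdf:type triples that domain and range declarations imply."""
    implied = set()
    for s, p, o in data:
        rule = schema.get(p)
        if rule:
            implied.add((s, RDF_TYPE, rule["domain"]))
            implied.add((o, RDF_TYPE, rule["range"]))
    return implied


# Implied types not stated explicitly in the data (empty for this sample).
missing = implied_types(data, schema) - set(data)
print(missing)
```

Because of this behaviour, an overly narrow domain or range can lead consumers to infer types the publisher never intended, which is one reason to choose them with care.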
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1/
Slide 80
5
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
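One way to apply consistent design rules is to mint every URI through a single function. The Python sketch below assumes the pattern http://{domain}/{type}/{concept}/{reference}, which is one commonly suggested layout for persistent URIs; the domain, the values and the mint_uri helper are hypothetical.

```python
from urllib.parse import quote


def mint_uri(domain, kind, concept, reference):
    """Build a URI following http://{domain}/{type}/{concept}/{reference}."""
    parts = [
        quote(str(part).strip().lower().replace(" ", "-"), safe="-")
        for part in (kind, concept, reference)
    ]
    return "http://{}/{}".format(domain, "/".join(parts))


# Hypothetical domain and identifiers, for illustration only.
uri = mint_uri("data.example.org", "id", "Political Category", "1A")
print(uri)
```

Keeping implementation details such as file extensions or technology names out of the path makes it easier to keep the URIs stable when the underlying systems change.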
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
• Start with a robust Domain Model developed following a structured process and methodology
• Research existing terms and their usage and maximise reuse of those terms
• Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
• Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
• Publish within a highly stable environment designed to be persistent
• Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another …” - openrefine.org
See also: Open Refine website
http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: Getting started
Install Open Refine from https://github.com/OpenRefine
Install the RDF extension from http://refine.deri.ie/
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
3
• Create URI representation for the involved object values
  - via formula
  - via reconciliation
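The two options can be sketched outside Open Refine in plain Python. Here a lookup table stands in for reconciliation against an existing identifier set, and a slug function plays the role of the formula. The base URI, the lookup entries and both helper names are hypothetical, and this is not the expression language Open Refine itself uses.

```python
import re

BASE = "http://example.org/id/country/"  # hypothetical base URI

# Stand-in for reconciliation: known labels mapped to existing identifiers.
KNOWN = {"Belgium": "http://example.org/id/country/BE"}


def uri_via_formula(value):
    """Derive a URI from the cell text itself."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    return BASE + slug


def uri_for(value):
    """Prefer a reconciled URI; fall back to the formula."""
    return KNOWN.get(value.strip(), uri_via_formula(value))


column = ["Belgium", "Czech Republic "]
uris = [uri_for(v) for v in column]
print(uris)
```

Reconciled URIs link the data to identifiers that others already use, while formula-based URIs only guarantee that equal cell values map to the same URI.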
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one.
You can set the base URI for the data.
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
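The idea of a skeleton, a fixed template filled in once per spreadsheet row, can be sketched in Python without Open Refine. The sample rows, the base URI and the choice of dimensions are assumptions for illustration; the qb: and sdmx- terms come from the RDF Data Cube Vocabulary and its companion SDMX vocabularies.

```python
import csv
import io

# Hypothetical sample data and base URI, for illustration only.
CSV_TEXT = """country,year,value
BE,2013,0.78
FR,2013,0.75
"""
BASE = "http://example.org/scoreboard/"

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def row_to_observation(row):
    """Fill a fixed template (the 'skeleton') with the values of one row."""
    subject = "<{}obs/{}-{}>".format(BASE, row["country"], row["year"])
    return (
        "{s} a qb:Observation ;\n"
        "    qb:dataSet <{b}dataset> ;\n"
        "    sdmx-dimension:refArea <{b}country/{c}> ;\n"
        '    sdmx-dimension:refPeriod "{y}"^^xsd:gYear ;\n'
        '    sdmx-measure:obsValue "{v}"^^xsd:decimal .\n'
    ).format(s=subject, b=BASE, c=row["country"], y=row["year"], v=row["value"])


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
turtle = PREFIXES + "\n" + "\n".join(row_to_observation(r) for r in rows)
print(turtle)
```

A complete Data Cube publication would also describe the dataset and its structure definition; the sketch covers only the per-row observations.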
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
SPARQL Example ndash EU ODP (2)
Slide 61
DATASUPPORTOPEN
Summary
bull RDF is a general way to express data intended for publishing on the Web
bull RDF data is expressed in triples subject predicate object
bull Different syntaxes exist for expressing data in RDF
bull SPARQL is a standardised language to query graph data expressed as RDF
bull SPARQL can be used to query and update RDF data
Slide 62
DATASUPPORTOPEN Slide 63
Learning Module 3
Workshop for Publishing Open
Linked EU Data
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about
bull Creating an RDF vocabulary for modelling your data
How to reuse existing vocabularies to model your data
How to create new classes and properties in RDF
How and where to publish your RDF vocabulary so that it can be reused by others
bull An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
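To give an idea of what such an export can look like, the sketch below renders spreadsheet rows as RDF Data Cube observations in Turtle, using plain Python. Only qb:Observation and qb:dataSet are taken from the RDF Data Cube Vocabulary. The base URI, the dataset name and the ex:country, ex:year and ex:value properties are hypothetical, and the rows are made-up sample data.

```python
# Hypothetical base URI, dataset and dimension/measure property names.
BASE = "http://example.org/scoreboard/"

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    f"@prefix ex: <{BASE}> .\n\n"
)

def observation(row):
    """Render one spreadsheet row as a qb:Observation in Turtle."""
    obs_id = f"obs-{row['country']}-{row['year']}".lower()
    return (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset ;\n"
        f"    ex:country ex:country-{row['country'].lower()} ;\n"
        f"    ex:year \"{row['year']}\"^^xsd:gYear ;\n"
        f"    ex:value \"{row['value']}\"^^xsd:decimal .\n"
    )

# Made-up rows standing in for the cleaned spreadsheet.
rows = [
    {"country": "BE", "year": "2013", "value": "98.5"},
    {"country": "FR", "year": "2013", "value": "99.0"},
]
turtle = PREFIXES + "\n".join(observation(r) for r in rows)
print(turtle)
```

The RDF skeleton in Open Refine plays the role of the observation function here: it is the template applied to every row.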
Production pipelines
From desk to automated pipeline
[Figure: axes labelled "flexibility" and "volume"; tools shown: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
and now YOUR questions
Slide 97
Further reading
• EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
• EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
Slide 101
Be part of our team
• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Summary
• RDF is a general way to express data intended for publishing on the Web.
• RDF data is expressed in triples: subject, predicate, object.
• Different syntaxes exist for expressing data in RDF.
• SPARQL is a standardised language to query graph data expressed as RDF.
• SPARQL can be used to query and update RDF data.
Slide 62
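To make the triple model in this summary concrete, here is a minimal sketch in plain Python (no RDF library). Each statement is a (subject, predicate, object) tuple, and the match function mimics how a single SPARQL triple pattern selects triples, with None playing the role of a variable. The ex: resources and values are made up, and this is an illustration of the idea only, not of how a SPARQL engine is implemented.

```python
# Illustrative only: the ex: resources and values below are made up.
triples = [
    ("ex:Amount1", "rdf:type", "ex:Amount"),
    ("ex:Amount1", "ex:currency", "EUR"),
    ("ex:Amount1", "ex:year", "2014"),
    ("ex:Amount2", "rdf:type", "ex:Amount"),
    ("ex:Amount2", "ex:currency", "EUR"),
]

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]

# Roughly analogous to: SELECT ?s WHERE { ?s rdf:type ex:Amount }
amounts = [s for (s, _, _) in match(triples, p="rdf:type", o="ex:Amount")]
print(amounts)
```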
Learning Module 3
Workshop for Publishing Open Linked EU Data
Slide 63
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
  - How to reuse existing vocabularies to model your data
  - How to create new classes and properties in RDF
  - How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using Open Refine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Diagram: domain model with the following classes and attributes]
• Amount: Currency, Figure, Type, Year
• Nomenclature: Type, Heading, Introduction, Remark, Conditions
• EU Programme: Code, Type, Acronym, Legal base period, Legal base type, Legal base status
• Political category: Code, Description
• Corporate body: Code, Type, Location
[Relationships shown: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme]
Slide 68
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
  For example, a dcterms:created value given as a typed date, such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
  It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
  Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71
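The interoperability point about dates can be seen in a few lines of plain Python. The sketch assumes the two example values from the slide. The format string for the free-text date is an assumption about how a hypothetical ex:date value would be written, which is exactly the kind of out-of-band knowledge a consumer would otherwise need.

```python
from datetime import date, datetime

# A value typed as xsd:date uses the ISO 8601 form, which parses directly.
typed_value = "2013-02-21"
created = date.fromisoformat(typed_value)

# A free-text date needs a format agreed out of band. The format string
# below is an assumption, and %B month names depend on the current locale
# (the default C locale, i.e. English names, is assumed here).
free_text_value = "21 February 2013"
parsed = datetime.strptime(free_text_value, "%d %B %Y").date()

print(created == parsed)
```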
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• When you create sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
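The benefit of sub-properties can be sketched in plain Python: a consumer that only knows Dublin Core can still read data published with the more specific terms, provided the sub-property relationships are applied. The bud: prefix, the property local names and the sample data are illustrative assumptions, not the actual identifiers of the EU Budget vocabulary.

```python
# Hypothetical prefix "bud:" stands for the EU Budget vocabulary; the
# property local names and the sample data are illustrative assumptions.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}

data = [
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
]

def with_inferred(triples, sub_property_of):
    """Add one triple per rdfs:subPropertyOf step (s p o => s super(p) o)."""
    result = list(triples)
    for s, p, o in triples:
        seen = set()
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            result.append((s, p, o))
    return result

expanded = with_inferred(data, SUB_PROPERTY_OF)
# A consumer that only knows Dublin Core can still find the description.
descriptions = [o for (s, p, o) in expanded if p == "dct:description"]
print(descriptions)
```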
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
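Part of these naming conventions can be checked mechanically. The sketch below, in plain Python, tests only the capitalisation and camel-case rules. The regular expressions are one possible reading of the conventions, and the ex:legal_base term is a made-up counter-example. Whether a name is singular, a verb or a noun still needs human review.

```python
import re

# Local names only (the part after the prefix). These patterns are one
# possible reading of the conventions: letters and digits, no separators.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")

def local_name(term):
    """Return the part after the prefix, e.g. 'skos:Concept' -> 'Concept'."""
    return term.split(":", 1)[-1]

def is_class_name(term):
    return bool(CLASS_NAME.match(local_name(term)))

def is_property_name(term):
    return bool(PROPERTY_NAME.match(local_name(term)))

for term in ["skos:Concept", "rdfs:label", "org:hasSite", "ex:legal_base"]:
    print(term, is_class_name(term), is_property_name(term))
```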
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
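The code shown on the original slides is not preserved in this transcript. As a stand-in, the sketch below uses plain Python to print what RDFS definitions of a class and a property could look like in Turtle. The example.org namespace, the term names and the label and comment texts are hypothetical, and the helper functions do not escape quotes inside labels.

```python
# Hypothetical namespace and term names, used only for illustration.
NAMESPACE = "http://example.org/budget#"

def class_definition(name, label, comment):
    """Return a Turtle snippet declaring an RDFS class."""
    return (
        f"ex:{name} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )

def property_definition(name, label, domain):
    """Return a Turtle snippet declaring an RDF property with a domain."""
    return (
        f"ex:{name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:domain ex:{domain} .\n"
    )

prefixes = (
    f"@prefix ex: <{NAMESPACE}> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
)

turtle = (
    prefixes
    + class_definition("Amount", "Amount", "An amount of money in the budget")
    + "\n"
    + property_definition("amountType", "amount type", "Amount")
)
print(turtle)
```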
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
• A range states that the values of a property are instances of one or more classes.
• A domain states on which classes a given property can be used.
[Diagram: hasCeiling relationship between Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
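In RDFS, domain and range statements act as inference rules rather than validation constraints: from a single use of a property, a reasoner can conclude the types of the subject and the object. The sketch below shows this in plain Python. The direction of hasCeiling (from a political category to an amount) and all ex: names are assumptions made for the example.

```python
# Assumption for illustration: ex:hasCeiling links a political category to
# an amount. The direction and all ex: names are hypothetical.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}

def infer_types(triples):
    """Derive rdf:type triples from rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred

data = [("ex:category1", "ex:hasCeiling", "ex:amount1")]
for triple in sorted(infer_types(data)):
    print(triple)
```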
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
  Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
  Examples:
  o http://www.w3.org/ns/adms
  o http://purl.org/dc/elements/1.1
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Diagram labels grouping the steps into phases: Analyse, Model, Publish]
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative
2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice
DATASUPPORTOPEN
Workshop for publishing open linked EU data
This module is about:
• Creating an RDF vocabulary for modelling your data
o How to reuse existing vocabularies to model your data
o How to create new classes and properties in RDF
o How and where to publish your RDF vocabulary so that it can be reused by others
• An example of how tabular data can be published as Linked Open Data using OpenRefine
Slide 64
Learning objectives
By the end of this training module you should have an understanding of:
• What the best practices are for creating an RDF vocabulary for modelling your data
• Where to find RDF vocabularies for reuse
• How you can create your own RDF vocabulary
• How to publish your RDF vocabulary
• The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
6 steps for creating an RDF vocabulary
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4. Where new terms are required, create them following commonly agreed best practice
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 67
Step 1: Start with a robust Domain Model
[Diagram: domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). The diagram also shows the attributes Acronym, Legal base period, Legal base type and Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Slide 68
Step 2: Reuse existing terms and vocabularies
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: Vocabulary for describing datasets in Europe
• Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: Vocabulary for describing projects
• ADMS: Vocabulary for describing interoperability assets
• Dublin Core: Defines general metadata attributes
• Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
Use of dcterms:created, for example, the value for which should be a typed date such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows it has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies avoids your having to replicate that effort.
Slide 71
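The interoperability point can be illustrated with a minimal Python sketch using only the standard library. It is not part of the original slides: the two literal values are taken from the slide text, everything else is an assumption made for illustration.

```python
from datetime import date, datetime

# A typed literal such as "2013-02-21"^^xsd:date carries an ISO 8601 value,
# which a consumer can parse without knowing anything about the publisher.
iso_value = "2013-02-21"
created = date.fromisoformat(iso_value)

# A free-text value such as "21 February 2013" only parses if the consumer
# already knows the exact layout (and the language of the month name).
free_text_value = "21 February 2013"
created_from_text = datetime.strptime(free_text_value, "%d %B %Y").date()

# Both routes reach the same date, but the second needed publisher-specific
# knowledge: this is the "further processing" the slide refers to.
same_date = created == created_from_text
```

The format string in the second case is exactly the kind of per-publisher knowledge that reusing dcterms:created with xsd:date avoids.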
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• Joinup: http://joinup.ec.europa.eu
• Linked Open Vocabularies: http://lov.okfn.org
Slide 72
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Diagram: Amount class (Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
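A small Python sketch of why sub-properties help consumers that only know the generic term. Triples are modelled as plain tuples rather than with an RDF library. The prefixed names bud:introduction and bud:hasNomenclature, and the ex: resources, are hypothetical spellings chosen for this illustration; the slides only state which super-properties are used.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Schema statements as described on the slides (local names are assumed).
schema = {
    ("bud:introduction", SUB_PROPERTY_OF, "dct:description"),
    ("bud:hasNomenclature", SUB_PROPERTY_OF, "dct:subject"),
}

# Hypothetical instance data using only the specific terms.
data = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}


def infer_super_properties(data, schema):
    """Add one triple per declared super-property (one level only)."""
    parents = {s: o for s, p, o in schema if p == SUB_PROPERTY_OF}
    inferred = set(data)
    for s, p, o in data:
        if p in parents:
            inferred.add((s, parents[p], o))
    return inferred


expanded = infer_super_properties(data, schema)

# A consumer that only understands dct:description still finds the text.
descriptions = [o for s, p, o in expanded if p == "dct:description"]
```

The sketch applies a single level of inference; a real RDFS reasoner would also follow chains of sub-properties.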
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
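The capitalisation and camel-case rules above can be checked mechanically. The following Python sketch is an illustration only: the function names are invented here, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re

# Upper camel case for classes, lower camel case for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def local_name(term):
    """Return the part after the prefix, e.g. 'Concept' for 'skos:Concept'."""
    return term.split(":", 1)[1]


def follows_convention(term, kind):
    """Check a prefixed term; kind is 'class' or 'property'."""
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local_name(term)))


checks = {
    "skos:Concept": follows_convention("skos:Concept", "class"),
    "foaf:isPrimaryTopicOf": follows_convention("foaf:isPrimaryTopicOf", "property"),
    "ex:political_category": follows_convention("ex:political_category", "class"),
}
```

The first two terms come from the slide; ex:political_category is a made-up counter-example that breaks both the capital-letter and the camel-case rule.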
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
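The definitions shown on the two example slides did not survive in this transcript. As a stand-in, here is a hedged Python sketch that writes RDFS definitions for a class and a property as Turtle text. The prefix bud:, the namespace form http://data.europa.eu/bud#, the local names and the comment texts are all assumptions for illustration and are not taken from the actual EU Budget vocabulary.

```python
# Assumed prefixes; the bud: namespace form is a guess for illustration.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix bud: <http://data.europa.eu/bud#> .\n"
)


def define_term(term, rdf_type, label, comment):
    """Return a Turtle block declaring one vocabulary term."""
    return (
        f"{term} a {rdf_type} ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


vocabulary = "\n".join([
    PREFIXES,
    # Class: capitalised, singular (hypothetical definition text).
    define_term("bud:Amount", "rdfs:Class", "Amount",
                "A monetary figure in the budget."),
    # Property: lower camel case (hypothetical definition text).
    define_term("bud:amountType", "rdf:Property", "amount type",
                "The type of an amount."),
])

print(vocabulary)
```

Every term gets a human-readable label and comment, which is what lets other publishers judge whether the term is suitable for reuse.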
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range:
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
[Diagram: the hasCeiling property linking the Amount class (Currency, Figure, Type, Year) and the Political category class (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
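In RDFS, domain and range declarations let a consumer infer the classes of the resources at either end of a property. The Python sketch below illustrates this with plain tuples. The direction of hasCeiling (from a political category to an amount), the bud: names and the ex: resources are assumptions, since the transcript does not preserve the arrow direction of the diagram.

```python
RDF_TYPE = "rdf:type"

# Assumed declaration: a political category has a ceiling that is an amount.
SCHEMA = {
    "bud:hasCeiling": {"domain": "bud:PoliticalCategory", "range": "bud:Amount"},
}


def infer_types(triples, schema):
    """Derive rdf:type triples from domain and range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        declaration = schema.get(predicate)
        if declaration is None:
            continue  # no domain/range known for this property
        inferred.add((subject, RDF_TYPE, declaration["domain"]))
        inferred.add((obj, RDF_TYPE, declaration["range"]))
    return inferred


# Hypothetical instance data with no explicit types.
data = [("ex:category1", "bud:hasCeiling", "ex:amount1")]
types = infer_types(data, SCHEMA)
```

Because these declarations produce inferences rather than validation errors, a domain or range that is too narrow can silently assign wrong classes, which is a reason to define them with care.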
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Diagram: the steps are grouped into three phases: Analyse, Model, Publish]
Slide 82
Example
Using OpenRefine for RDF to publish tabular data as Linked Data
Slide 83
What is OpenRefine?
“OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another” - openrefine.org
See also: OpenRefine website, http://openrefine.org
Slide 84
What is the OpenRefine RDF extension?
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – OpenRefine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using OpenRefine to model and publish open data: getting started
1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Dataset: Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in OpenRefine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
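The three clean-up actions are performed interactively in OpenRefine. For readers who want to see the logic spelled out, here is a rough Python equivalent using only the standard library. The column names, the indicator codes and all values in RAW are invented sample data, not figures from the Digital Agenda Scoreboard.

```python
import csv
import io

# Invented sample export: short headers, a blank row and a footnote row.
RAW = """Country,Ind,Yr,Val
Belgium,bb_lines,2012,0.33
,,,
Source: example rows,,,
France,bb_lines,2012,0.36
France,mobile_subs,2012,1.05
"""

# "Rename columns": map the raw headers to clearer names.
RENAME = {"Country": "country", "Ind": "indicator", "Yr": "year", "Val": "value"}


def clean(raw_text, indicator):
    """Rename columns, drop unnecessary rows, keep one indicator."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        row = {RENAME[key]: (value or "").strip() for key, value in row.items()}
        if not row["value"]:
            continue  # "remove unnecessary rows": blanks and footnotes
        if row["indicator"] != indicator:
            continue  # "facet": keep only the selected indicator
        cleaned.append(row)
    return cleaned


rows = clean(RAW, "bb_lines")
```

Selecting on one column value is the scripted counterpart of clicking a value in an OpenRefine text facet.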
Step 3: Clean up the data – prepare RDF
• Create URI representation for the involved object values
o via formula
o via reconciliation
Slide 91
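Creating a URI "via formula" means deriving it deterministically from a cell value. In OpenRefine this is written as an expression on the column; the Python sketch below shows the same idea. The base URI http://example.org/id and the function names are placeholders, not the URIs used in the original example.

```python
import re
import unicodedata


def slugify(value):
    """Turn a cell value into a lower-case, hyphen-separated ASCII token."""
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def mint_uri(base, kind, value):
    """Build a URI of the form <base>/<kind>/<slug of value>."""
    return f"{base.rstrip('/')}/{kind}/{slugify(value)}"


country_uri = mint_uri("http://example.org/id", "country", "Czech Republic")
```

A formula always yields the same URI for the same cell value, which keeps repeated exports stable. Reconciliation instead looks the value up in an existing dataset and reuses the URI found there.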
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
• You can set the base URI for the data
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
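To give an idea of what the mapping and export steps produce, here is a hedged Python sketch that turns cleaned rows into Turtle using terms from the RDF Data Cube Vocabulary (qb:Observation, qb:dataSet) and the SDMX dimension and measure vocabularies. The base URI, the URI layout and the sample rows are assumptions; the real export is generated by the RDF extension from the skeleton defined in the graphical interface.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

# Placeholder base URI (the equivalent of the skeleton's base URI setting).
BASE = "http://example.org/scoreboard"


def observation(row):
    """Render one spreadsheet row as a qb:Observation in Turtle."""
    country, indicator = row["country"], row["indicator"]
    year, value = row["year"], row["value"]
    subject = f"<{BASE}/observation/{indicator}/{country}/{year}>"
    return (
        f"{subject} a qb:Observation ;\n"
        f"    qb:dataSet <{BASE}/dataset/{indicator}> ;\n"
        f"    sdmx-dimension:refArea <{BASE}/country/{country}> ;\n"
        f'    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;\n'
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .\n'
    )


def to_turtle(rows):
    """Prefix declarations followed by one block per observation."""
    return PREFIXES + "\n" + "\n".join(observation(row) for row in rows)


# Invented sample rows with URI-safe country tokens.
sample_rows = [
    {"country": "belgium", "indicator": "bb_lines", "year": "2012", "value": "0.33"},
    {"country": "france", "indicator": "bb_lines", "year": "2012", "value": "0.36"},
]
turtle = to_turtle(sample_rows)
print(turtle)
```

Each row becomes one observation whose dimensions (area, period) and measure (value) are explicit properties. A complete Data Cube publication would also describe the dataset and its structure definition, which this sketch leaves out.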
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• OpenRefine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
Be part of our team
• Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport
• Join us on: Open Data Support, http://goo.gl/y9ZZI
• Follow us: @OpenDataSupport
• Contact us: http://www.opendatasupport.eu, contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
DATASUPPORTOPEN
Learning objectives
By the end of this training module you should have an understanding of
bull What the best practices are for creating an RDF vocabulary for modelling your data
bull Where to find RDF vocabularies for reuse
bull How you can create your own RDF vocabulary
bull How to publish your RDF vocabulary
bull The process and methodology for developing semantic agreements developed by the ISA Programme of the European Commission
Slide 65
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies define your own terms publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage, and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another ...” - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML
You can then define the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data: getting started
1 Install Open Refine from https://github.com/OpenRefine
2 Install the RDF extension from http://refine.deri.ie
And then:
1 Describe your data in a spreadsheet
2 Create a project and upload it in Open Refine
3 Clean up the data
4 Map your data to appropriate RDF classes & properties
5 Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88 (step 1)
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89 (step 2)
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data - table harmonisation
Slide 90 (step 3)
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
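Outside Open Refine, the same three harmonisation actions (dropping unnecessary rows, renaming columns, selecting a subset as a facet would) can be sketched with Python's standard csv module. The column names and figures below are invented sample data, not values from the Digital Agenda Scoreboard.

```python
import csv
import io

# Invented sample data standing in for the downloaded spreadsheet.
RAW = """Country,Ind.,Yr,Val
BE,bb_penet,2012,33.2
,,,
FR,bb_penet,2012,36.1
BE,e_gov,2012,50.0
"""

# Rename cryptic column headers to descriptive ones.
RENAME = {"Country": "country", "Ind.": "indicator", "Yr": "year", "Val": "value"}


def harmonise(text, indicator):
    """Drop empty rows, rename columns and keep one indicator only."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not any(value.strip() for value in row.values()):
            continue  # unnecessary (empty) row
        row = {RENAME[key]: value.strip() for key, value in row.items()}
        if row["indicator"] == indicator:  # plays the role of a facet selection
            rows.append(row)
    return rows


rows = harmonise(RAW, "bb_penet")
for row in rows:
    print(row)
```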
DATASUPPORTOPEN
Clean up the data - prepare RDF
Slide 91 (step 3)
• Create URI representations for the involved object values
• via formula
• via reconciliation
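The two routes to a URI can be illustrated in plain Python: a formula derives the URI from the cell value itself, while reconciliation looks the value up in an existing reference list. In this sketch the base URI and the reference table are hypothetical, and an exact dictionary lookup stands in for a real reconciliation service.

```python
from urllib.parse import quote

# Hypothetical base URI and reference list, for illustration only.
BASE = "http://example.org/id/country/"
REFERENCE = {
    "belgium": "http://example.org/authority/country/BEL",
    "france": "http://example.org/authority/country/FRA",
}


def uri_via_formula(value):
    """Derive a URI from the cell value itself."""
    slug = "-".join(value.strip().lower().split())
    return BASE + quote(slug, safe="-")


def uri_via_reconciliation(value):
    """Look the value up in a reference list; return None when there is no match."""
    return REFERENCE.get(value.strip().lower())


def to_uri(value):
    """Prefer a reconciled URI and fall back to the formula."""
    return uri_via_reconciliation(value) or uri_via_formula(value)


for cell in ["Belgium", " United Kingdom "]:
    print(cell.strip(), "->", to_uri(cell))
```

Reconciled URIs are usually preferable because they link your data to identifiers that others already use.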
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92 (step 4)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93 (step 4)
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
You can set the base URI for the data
Slide 94 (step 4)
[Screenshot: graphical interface to copy/paste an existing RDF skeleton]
[Screenshot: graphical interface to edit an RDF skeleton]
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95 (step 5)
[Screenshot: export of the data in Turtle]
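What a skeleton does can be approximated in a few lines of Python: each spreadsheet row is poured into the same template, producing one qb:Observation per row. This is a sketch under assumptions: the `ex:` namespace, the dataset name and the sample rows are invented, and the choice of the SDMX dimension and measure properties is one plausible mapping rather than the one used in the training.

```python
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/scoreboard/> .
"""

# Invented sample rows standing in for the cleaned spreadsheet.
ROWS = [
    {"country": "BE", "year": "2012", "value": "33.2"},
    {"country": "FR", "year": "2012", "value": "36.1"},
]


def observation(index, row):
    """Fill the skeleton for one row: one qb:Observation per spreadsheet row."""
    return "\n".join([
        f"ex:obs{index} a qb:Observation ;",
        "    qb:dataSet ex:dataset ;",
        f"    sdmx-dimension:refArea ex:{row['country']} ;",
        f"    sdmx-dimension:refPeriod \"{row['year']}\"^^xsd:gYear ;",
        f"    sdmx-measure:obsValue \"{row['value']}\"^^xsd:decimal .",
    ])


def to_turtle(rows):
    """Serialise all rows as Turtle, prefixed by the namespace declarations."""
    body = "\n\n".join(observation(i, r) for i, r in enumerate(rows, start=1))
    return PREFIXES + "\n" + body + "\n"


turtle = to_turtle(ROWS)
print(turtle)
```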
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: tools plotted against flexibility and volume: OpenRefine, UnifiedViews, Cellar]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements: https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
Slide 100
EUCLID, Course 1: Introduction and Application Scenarios: http://www.euclid-project.eu/modules/course1
EUCLID, Course 2: Querying Linked Data: http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme: http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
DATASUPPORTOPEN
Further reading
Slide 101
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer: http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck: http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler: http://workingontologist.org
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on, join us on, follow us:
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
OpenDataSupport
Contact us: contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Creating an RDF vocabulary
How to reuse other vocabularies, define your own terms, and publish and promote your vocabulary
Slide 66
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
1 Start with a robust Domain Model developed following a structured process and methodology
2 Research existing terms and their usage, and maximise reuse of those terms
3 Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties
4 Where new terms are required, create them following commonly agreed best practice
5 Publish within a highly stable environment designed to be persistent
6 Publicise the RDF vocabulary by registering it with relevant services
Slide 67
See also: https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68 (step 1)
[Diagram: domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type, Acronym, Legal base period, Legal base type, Legal base status); Political category (Code, Description); Corporate body (Code, Type, Location). Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
DATASUPPORTOPEN
Reuse existing terms and vocabularies
General purpose vocabularies: DCMI, RDFS
To name things: rdfs:label, foaf:name, skos:prefLabel
To describe people: FOAF, vCard, Core Person Vocabulary
To describe projects: DOAP, ADMS.SW
To describe interoperability assets: ADMS
To describe registered organisations: Registered Organisation Vocabulary
To describe addresses: vCard, Core Location Vocabulary
To describe public services: Core Public Service Vocabulary
To describe datasets: DCAT, DCAT Application Profile, VoID
Slide 69 (step 2)
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Well-known vocabularies
Slide 70 (step 2)
DCAT-AP: Vocabulary for describing datasets in Europe
Core Person Vocabulary: Vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
DOAP: Vocabulary for describing projects
ADMS: Vocabulary for describing interoperability assets
Dublin Core: Defines general metadata attributes
Registered Organisation Vocabulary: Vocabulary for describing organizations, typically in a national or regional register
Organization Ontology: for describing the structure of organizations
Core Location Vocabulary: Vocabulary capturing the fundamental characteristics of a location
Core Public Service Vocabulary: Vocabulary capturing the fundamental characteristics of a service offered by public administration
schema.org: Agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
DATASUPPORTOPEN
Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
For example, a dcterms:created value given as a typed date, such as "2013-02-21"^^xsd:date, is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which again promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
Slide 71 (step 2)
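The "further processing" mentioned for free-text dates can be as small as the following Python sketch, which turns "21 February 2013" into a typed xsd:date literal. The resource name `ex:report` is hypothetical, and the month name is parsed as English, which is what Python does under its default locale.

```python
from datetime import datetime


def to_xsd_date(text, fmt="%d %B %Y"):
    """Parse a free-text date and return it as a typed xsd:date literal."""
    parsed = datetime.strptime(text, fmt).date()
    return f'"{parsed.isoformat()}"^^xsd:date'


# ex:report is a hypothetical resource name.
literal = to_xsd_date("21 February 2013")
print(f"ex:report dcterms:created {literal} .")
```

Every consumer of non-standard dates has to write and maintain a conversion like this, which is the cost that reusing dcterms:created with xsd:date avoids.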
DATASUPPORTOPEN
Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
http://joinup.ec.europa.eu
http://lov.okfn.org
Slide 72 (step 2)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73 (step 3)
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74 (step 3)
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75 (step 3)
The EU Budget vocabulary defines the “has nomenclature” property as a sub-property of dct:subject
[Diagram: Amount class (Currency, Figure, Type, Year) connected to the Nomenclature class (Type, Heading, Introduction, Remark, Conditions) by the “has Nomenclature” relationship]
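The benefit of declaring sub-properties can be shown with a small inference sketch: a system that only knows dct:description or dct:subject can still use the data once the super-property statements are derived. The prefixed names `bud:introduction` and `bud:hasNomenclature` are assumed spellings of the EU Budget vocabulary terms, and the sample triple is invented.

```python
# Sub-property declarations described on the slides; the bud: names are
# hypothetical spellings of the EU Budget vocabulary terms.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def with_inferred(triples, sub_property_of):
    """Add (s, super, o) for every (s, p, o) where p is a sub-property of super."""
    result = set(triples)
    queue = list(result)
    while queue:
        s, p, o = queue.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in result:
            result.add((s, parent, o))
            queue.append((s, parent, o))  # follow chains of sub-properties
    return result


data = {("ex:nomenclature1", "bud:introduction", "Introductory text")}
for triple in sorted(with_inferred(data, SUB_PROPERTY_OF)):
    print(triple)
```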
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76 (step 4)
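The capitalisation rules above are easy to check automatically before a vocabulary is published. The following sketch tests only the letter-case and camel-case conventions; whether a class is singular or a property is a verb or a noun still needs human review. The function name and the test terms are illustrative.

```python
import re

CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")     # UpperCamelCase
PROPERTY_NAME = re.compile(r"[a-z][A-Za-z0-9]*")  # lowerCamelCase


def follows_convention(local_name, kind):
    """Check the capitalisation rule for a class or property local name."""
    if kind not in ("class", "property"):
        raise ValueError("kind must be 'class' or 'property'")
    pattern = CLASS_NAME if kind == "class" else PROPERTY_NAME
    return pattern.fullmatch(local_name) is not None


for name, kind in [("Concept", "class"), ("hasSite", "property"),
                   ("has_site", "property"), ("concept", "class")]:
    print(name, kind, follows_convention(name, kind))
```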
DATASUPPORTOPEN
Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
Slide 77 (step 4)
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
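A class definition in RDFS and OWL is only a handful of triples. The sketch below assembles them as Turtle text in Python; the `ex:` namespace and the wording of the comment are assumptions for illustration and do not reproduce the official definition of the Amount class.

```python
PREFIXES = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/vocab#> .
"""


def class_definition(name, label, comment):
    """Return a Turtle snippet declaring a class in both RDFS and OWL terms."""
    return "\n".join([
        f"{name} a rdfs:Class, owl:Class ;",
        f'    rdfs:label "{label}"@en ;',
        f'    rdfs:comment "{comment}"@en .',
    ])


# The comment text is illustrative wording, not the official definition.
turtle = PREFIXES + "\n" + class_definition(
    "ex:Amount", "Amount", "A monetary figure with a currency, type and year."
)
print(turtle)
```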
DATASUPPORTOPEN
6 steps for creating an RDF vocabulary
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties
Where new terms are required create them following commonly agreed best practice
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Slide 67
1
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
2
3
4
5
6
DATASUPPORTOPEN
Start with a robust Domain Model
Slide 68
1
hasCeiling
hasPoliticalcategory
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeading
EU Programme
CodeType
Political category
CodeDescription
Corporate body
CodeTypeLocation
IntroductionRemarkConditions
AcronymLegal base periodLegal base typeLegal base status
hasCorporate body
has
Nomenclature
has
EU Programme
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
“OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 1
Create a project and upload it in Open Refine
Slide 89
Step 2
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Clean up the data - table harmonisation
Slide 90
Step 3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
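The three clean-up actions above are performed interactively in Open Refine. Purely to illustrate the logic, the sketch below applies the same steps to a small CSV extract in Python. The column names and all figures are invented sample data, not values from the Digital Agenda Scoreboard.

```python
import csv
import io

# Hypothetical extract of a downloaded spreadsheet, saved as CSV.
# All column names and values are invented for illustration.
RAW = """Country,Yr,Indicator,Val
Source: example export,,,
BE,2012,Broadband coverage,98.5
FR,2012,Broadband coverage,99.0
BE,2012,Mobile take-up,61.2
"""

# Mapping from original column names to harmonised ones.
RENAME = {"Country": "country", "Yr": "year", "Indicator": "indicator", "Val": "value"}

rows = list(csv.DictReader(io.StringIO(RAW)))

# 1. Remove unnecessary rows (here: rows without a value, such as notes).
rows = [r for r in rows if r["Val"]]

# 2. Rename columns to harmonised names.
rows = [{RENAME[k]: v for k, v in r.items()} for r in rows]

# 3. "Facet" on one column and keep only the selection to be published.
selected = [r for r in rows if r["indicator"] == "Broadband coverage"]

for r in selected:
    print(r)
```

A script like this trades the flexibility of the interactive tool for repeatability, which matters once the same clean-up has to be applied to every new release of the data.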
Clean up the data - prepare RDF
Slide 91
Step 3
• Create URI representations for the involved object values
o via formula
o via reconciliation
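Creating a URI "via formula" means deriving it from the cell value with a fixed rule. In Open Refine this is done with an expression on the column; the Python sketch below only illustrates one possible rule. The base URI http://example.org/id/ and the function name are placeholders chosen for this example.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/id/"  # placeholder base URI (assumption)


def to_uri(kind, label):
    """Build a URI from a cell value: trimmed, lower-case, hyphen-separated, percent-encoded."""
    slug = re.sub(r"\s+", "-", label.strip().lower())
    return f"{BASE}{kind}/{quote(slug)}"


print(to_uri("indicator", "Broadband coverage"))
print(to_uri("country", " BE "))
```

Reconciliation, the second option, instead looks the value up in an existing dataset and reuses the URI found there, so it is not covered by this sketch.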
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
Step 4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
Step 4
Define a skeleton to transform your spreadsheet data to RDF
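A skeleton can be thought of as one template that is applied to every row of the spreadsheet. The sketch below expresses that idea in Python, using terms from the RDF Data Cube Vocabulary (qb:Observation, qb:dataSet) and the SDMX dimension and measure vocabularies commonly used with it. The base URI, the dataset URI, the column names and the choice of xsd:gYear for the period are assumptions made for this example.

```python
# Sketch of an "RDF skeleton": one template applied to every spreadsheet row.
BASE = "http://example.org/"  # placeholder base URI (assumption)

# Each entry: (predicate, function producing the object term from a row).
SKELETON = [
    ("rdf:type", lambda r: "qb:Observation"),
    ("qb:dataSet", lambda r: f"<{BASE}dataset/scoreboard>"),
    ("sdmx-dimension:refArea", lambda r: f"<{BASE}country/{r['country']}>"),
    ("sdmx-dimension:refPeriod", lambda r: f'"{r["year"]}"^^xsd:gYear'),
    ("sdmx-measure:obsValue", lambda r: f'"{r["value"]}"^^xsd:decimal'),
]


def apply_skeleton(rows):
    """Return (subject, predicate, object) triples, one observation per row."""
    triples = []
    for i, row in enumerate(rows, start=1):
        subject = f"<{BASE}obs/{i}>"
        for predicate, make_object in SKELETON:
            triples.append((subject, predicate, make_object(row)))
    return triples


# Invented sample rows, already cleaned and with harmonised column names.
rows = [
    {"country": "BE", "year": "2012", "value": "98.5"},
    {"country": "FR", "year": "2012", "value": "99.0"},
]
triples = apply_skeleton(rows)
for t in triples:
    print(*t)
```

Keeping the template separate from the data is what allows the same skeleton to be reused when a new version of the spreadsheet is published.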
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
Slide 94
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Step 4
Export your data to RDF/XML or Turtle
Slide 95
Step 5
[Screenshot: export of the data in Turtle]
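To give an idea of what a Turtle export contains, the sketch below serialises a couple of triples by hand: prefix declarations first, then one block per subject with its predicate-object pairs separated by semicolons. It is a simplified illustration, not the serialiser used by the RDF extension, and the example.org URIs are placeholders.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qb": "http://purl.org/linked-data/cube#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def to_turtle(triples):
    """Serialise (subject, predicate, object) terms that are already in Turtle syntax."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    # Group predicate-object pairs by subject, keeping insertion order.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{s}\n    {body} .")
    return "\n".join(lines)


# Placeholder triples (assumed URIs) describing one observation.
triples = [
    ("<http://example.org/obs/1>", "rdf:type", "qb:Observation"),
    ("<http://example.org/obs/1>", "qb:dataSet", "<http://example.org/dataset/scoreboard>"),
]
turtle = to_turtle(triples)
print(turtle)
```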
Production pipelines
From desk to automated pipeline
Slide 96
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along the axes "flexibility" and "volume"]
Thank you for your attention
and now YOUR questions
Slide 97
References
Slide 98
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Further reading
Slide 99
• EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
• EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
• EUCLID - Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• EUCLID - Course 2: Querying Linked Data, http://www.euclid-project.eu/modules/course2
• Learning SPARQL, Bob DuCharme, http://www.learningsparql.com
• Linked Data Cookbook, W3C Government Linked Data Working Group, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
• Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer, http://linkeddatabook.com/editions/1.0/
• Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck, http://www.semantic-web.at/LOD-TheEssentials.pdf
• Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
• Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler, http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on, contact us, join us on, follow us:
• Open Data Support, http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support, http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Start with a robust Domain Model
Slide 68
Step 1
[Diagram: domain model. Classes and attributes: Amount (Currency, Figure, Type, Year); Nomenclature (Type, Heading, Introduction, Remark, Conditions); EU Programme (Code, Type); Political category (Code, Description); Corporate body (Code, Type, Location). Further attributes shown: Acronym, Legal base period, Legal base type, Legal base status. Relationships: hasCeiling, hasPoliticalcategory, hasCorporate body, has Nomenclature, has EU Programme.]
Reuse existing terms and vocabularies (Step 2)
Slide 69
• General purpose vocabularies: DCMI, RDFS
• To name things: rdfs:label, foaf:name, skos:prefLabel
• To describe people: FOAF, vCard, Core Person Vocabulary
• To describe projects: DOAP, ADMS.SW
• To describe interoperability assets: ADMS
• To describe registered organisations: Registered Organisation Vocabulary
• To describe addresses: vCard, Core Location Vocabulary
• To describe public services: Core Public Service Vocabulary
• To describe datasets: DCAT, DCAT Application Profile, VoID
Well-known vocabularies
Slide 70
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Reuse existing terms and vocabularies (Step 2)
• Reuse greatly aids interoperability of your data.
For example, a value of dcterms:created given as a typed date such as "2013-02-21"^^xsd:date is immediately processable by many machines. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema.
It shows that the schema has been published with care and professionalism; again, this promotes its reuse.
• Reuse is easier and cheaper.
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from having to replicate that effort.
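The "further processing" mentioned in the first point can be made concrete. The sketch below converts a free-text date such as "21 February 2013" into the typed literal expected for dcterms:created. The function name is chosen for this example, and the parsing assumes English month names (the default "C" locale in Python).

```python
from datetime import datetime


def to_xsd_date(text):
    """Convert a free-text date such as '21 February 2013' to an xsd:date literal."""
    parsed = datetime.strptime(text, "%d %B %Y")
    return f'"{parsed.date().isoformat()}"^^xsd:date'


print(to_xsd_date("21 February 2013"))
```

Every consumer of the non-standard format would need a conversion like this one, which is exactly the cost that reusing dcterms:created with xsd:date avoids.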
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies (Step 2)
You can find reusable RDF vocabularies on:
Slide 72
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Reuse existing terms and vocabularies (Step 2)
Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• When you create sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
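The second point can be illustrated with a small inference sketch. A consumer that only knows dct:description can still find the text if it applies the sub-property declaration. The relationship used here is the one the EU Budget vocabulary declares for its introduction property; the ex: prefix, the resource name and the sample text are assumptions for this example.

```python
# Declared rdfs:subPropertyOf relationships (specific -> generic).
# The ex: prefix is a placeholder for the vocabulary namespace.
SUB_PROPERTY_OF = {"ex:introduction": "dct:description"}


def with_inferred(triples):
    """Add the triples implied by rdfs:subPropertyOf, following chains upwards."""
    result = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        while parent is not None:
            result.add((s, parent, o))
            parent = SUB_PROPERTY_OF.get(parent)
    return result


data = [("ex:heading1", "ex:introduction", "Introductory text")]
inferred = with_inferred(data)

# A generic client that has never heard of ex:introduction:
descriptions = [o for s, p, o in inferred if p == "dct:description"]
print(descriptions)
```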
Slide 73
Step 3
Creation of sub-classes and sub-properties
Slide 74
Step 3
The EU Budget vocabulary defines the introduction property as a sub-property of dct:description.
[Diagram: Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Creation of sub-classes and sub-properties
Slide 75
Step 3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) has Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
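Some of these conventions can be checked mechanically. The sketch below tests the capitalisation and camel-case rules for a prefixed name; whether a class name is singular, or a property a verb or a noun, still needs human review. The function name and the ex:political_category term are made up for this example.

```python
import re


def check_term(curie, kind):
    """Check a prefixed name against the conventions; kind is 'class' or 'property'."""
    local = curie.split(":", 1)[1]
    problems = []
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", local):
        problems.append("use camel case without spaces, hyphens or underscores")
    if kind == "class" and not local[0].isupper():
        problems.append("classes begin with a capital letter")
    if kind == "property" and not local[0].islower():
        problems.append("properties begin with a lower case letter")
    return problems


print(check_term("skos:Concept", "class"))
print(check_term("foaf:isPrimaryTopicOf", "property"))
print(check_term("ex:political_category", "class"))
```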
Slide 76
Step 4
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
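The example shown on the slide is not preserved in this transcript. As a rough idea of what such a definition might look like, the sketch below generates a minimal RDFS class declaration in Turtle. The ex: prefix, the label and the comment text are assumptions and do not reproduce the actual EU Budget vocabulary.

```python
def define_class(curie, label, comment):
    """Return Turtle declaring an RDFS class with an English label and comment."""
    return (
        f"{curie} a rdfs:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .'
    )


# Hypothetical definition; the comment wording is invented for illustration.
turtle = define_class(
    "ex:Amount",
    "Amount",
    "A budget figure in a given currency and year.",
)
print(turtle)
```

Giving every new class at least a label and a comment lets people and tools discover what the term means without consulting separate documentation.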
Slide 77
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount (Currency, Figure, Type, Year)]
Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
Slide 78
Step 4
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Diagram: Amount (Currency, Figure, Type, Year)]
DATASUPPORTOPEN
General purpose vocabularies DCMI RDFS
To name things rdfslabel foafname skosprefLabel
To describe people FOAF vCard Core Person Vocabulary
To describe projects DOAP ADMSSW
To describe interoperability assets ADMS
To describe registered organisations Registered Organisation Vocabulary
To describe addresses vCard Core Location Vocabulary
To describe public services Core Public Service Vocabulary
To describe datasets DCAT DCAT Application Profile VoID
Reuse existing terms and vocabularies
Slide 69
2
DATASUPPORTOPEN
Well-known vocabularies
Slide 70
DCAT-AP Vocabulary for describing datasets in Europe
Core Person VocabularyVocabulary to describe the fundamental characteristics of a person eg the name the gender the date of birth
DOAP Vocabulary for describing projects
ADMS Vocabulary for describing interoperability assets
Dublin Core Defines general metadata attributes
Registered Organisation VocabularyVocabulary for describing organizations typically in a national or regional register
Organization Ontology for describing the structure of organizations
Core Location VocabularyVocabulary capturing the fundamental characteristics of a location
Core Public Service VocabularyVocabulary capturing the fundamental characteristics of a service offered by public administration
schemaorgAgreed vocabularies for publishing structured data on the Web elaborated by Google Yahoo and Microsoft
See alsohttpwwww3orgwikiTaskForcesCommunityProj
ectsLinkingOpenDataCommonVocabulariesReuse existing terms and vocabularies
2
DATASUPPORTOPEN
bull Reuse greatly aids interoperability of your data
Use of dctermscreated for example the value for which should be a data typed date such as 2013-02-21^^xsddate is immediately processable by many machines If your schema encourages data publishers to use a different term and date format such as exdate 21 February 2013 ndash data published using your schema will require further processing to make it the same as everyone elses
bull Reuse adds credibility to your schema
It shows it has been published with care and professionalism again this promotes its reuse
bull Reuse is easier and cheaper
Reusing classes and properties from well defined and properly hosted vocabularies avoids your having to replicate that effort
Slide 71
Advantages of reuse
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
You can find reusable RDF vocabularies on
Slide 72
httpjoinupeceuropaeu httplovokfnorg
Reuse existing terms and vocabularies2
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
bull RDF schemas and vocabularies often include terms that are very generic
bull By creating sub-class and sub-property relationships systems that understand the super property or super class may be able to interpret the data even if the more specific terms are unknown
bull Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
3
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
[Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA: Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA: Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on / Contact us / Join us on / Follow us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
DATASUPPORTOPEN
Presentation metadata
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Slide 103
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts, including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Well-known vocabularies
• DCAT-AP: vocabulary for describing datasets in Europe
• Core Person Vocabulary: vocabulary to describe the fundamental characteristics of a person, e.g. the name, the gender, the date of birth
• DOAP: vocabulary for describing projects
• ADMS: vocabulary for describing interoperability assets
• Dublin Core: defines general metadata attributes
• Registered Organisation Vocabulary: vocabulary for describing organizations, typically in a national or regional register
• Organization Ontology: for describing the structure of organizations
• Core Location Vocabulary: vocabulary capturing the fundamental characteristics of a location
• Core Public Service Vocabulary: vocabulary capturing the fundamental characteristics of a service offered by public administration
• schema.org: agreed vocabularies for publishing structured data on the Web, elaborated by Google, Yahoo and Microsoft
See also: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
Slide 70
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
Advantages of reuse
• Reuse greatly aids interoperability of your data
A value for dcterms:created, for example, should be a typed date such as "2013-02-21"^^xsd:date, which many machines can process immediately. If your schema encourages data publishers to use a different term and date format, such as ex:date "21 February 2013", data published using your schema will require further processing to make it the same as everyone else's.
• Reuse adds credibility to your schema
It shows that the schema has been published with care and professionalism, which in turn promotes its reuse.
• Reuse is easier and cheaper
Reusing classes and properties from well-defined and properly hosted vocabularies saves you from replicating that effort.
Slide 71
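The interoperability argument can be made tangible with a small Python sketch, in which the function names are hypothetical. A typed xsd:date literal is parsed with a single standard-library call, whereas the free-text form needs a format string that every consumer has to guess and maintain. The parser below handles only the plain YYYY-MM-DD form, without the optional time zone that xsd:date allows.

```python
from datetime import date, datetime


def parse_typed_literal(literal: str) -> date:
    """Parse a literal written as "YYYY-MM-DD"^^xsd:date."""
    lexical, _, datatype = literal.partition("^^")
    if datatype != "xsd:date":
        raise ValueError(f"unsupported datatype: {datatype!r}")
    return date.fromisoformat(lexical.strip('"'))


def parse_free_text(text: str) -> date:
    """Parse a date such as "21 February 2013".

    The format string is publisher-specific: another publisher might write
    "Feb 21, 2013" or "21/02/2013", and each variant needs its own rule.
    """
    return datetime.strptime(text, "%d %B %Y").date()


if __name__ == "__main__":
    print(parse_typed_literal('"2013-02-21"^^xsd:date'))
    print(parse_free_text("21 February 2013"))
```

Both calls yield the same date, but only the first works without knowing anything about the individual publisher's conventions.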
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
• http://joinup.ec.europa.eu
• http://lov.okfn.org
Slide 72
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists
Slide 73
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject
[Diagram: Amount class (attributes Currency, Figure, Type, Year) linked by "has Nomenclature" to the Nomenclature class (attributes Type, Heading, Introduction, Remark, Conditions)]
Slide 75
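The following Python sketch shows why sub-property declarations help consumers who only know the generic term. It works on plain tuples rather than a real RDF store. The `ex:` property names stand in for the EU Budget vocabulary terms and are hypothetical, as is the sample triple.

```python
# Hypothetical sub-property declarations, mirroring the two examples above.
SUB_PROPERTY_OF = {
    "ex:introduction": "dct:description",
    "ex:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add (s, super, o) for every (s, p, o) whose p has a declared super-property.

    Chains such as a -> b -> c are followed until no new triple appears.
    Each property is assumed to have at most one direct super-property.
    """
    inferred = set(triples)
    frontier = list(inferred)
    while frontier:
        subject, prop, obj = frontier.pop()
        parent = sub_property_of.get(prop)
        if parent is None:
            continue
        new_triple = (subject, parent, obj)
        if new_triple not in inferred:
            inferred.add(new_triple)
            frontier.append(new_triple)
    return inferred


if __name__ == "__main__":
    data = {("ex:heading1", "ex:introduction", "Text introducing the heading")}
    for triple in sorted(infer_super_properties(data)):
        print(triple)
```

A generic application that only understands dct:description can now display the introduction text, even though it has never seen the more specific property.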
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
• Classes begin with a capital letter and are always singular, e.g. skos:Concept
• Properties begin with a lower case letter, e.g. rdfs:label
• Object properties should be verbs, e.g. org:hasSite
• Data type properties should be nouns, e.g. dcterms:description
• Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
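The capitalisation and camel-case rules above are mechanical enough to check automatically. The Python sketch below does so with two regular expressions. The function name `check_term` is hypothetical, and the rules about singular nouns and verbs are not covered because they need human judgement.

```python
import re

# Classes: UpperCamelCase. Properties: lowerCamelCase.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def check_term(qname: str, kind: str) -> bool:
    """Return True if the local part of a prefixed name follows the convention.

    kind must be "class" or "property".
    """
    if kind not in ("class", "property"):
        raise ValueError(f"unknown kind: {kind!r}")
    # Keep only the part after the prefix, e.g. "Concept" in "skos:Concept".
    _, _, local = qname.rpartition(":")
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return bool(pattern.match(local))


if __name__ == "__main__":
    for name, kind in [
        ("skos:Concept", "class"),
        ("foaf:isPrimaryTopicOf", "property"),
        ("ex:political_category", "class"),
    ]:
        print(name, kind, check_term(name, kind))
```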
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “Amount” class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
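The example shown on this slide is not part of the transcript. As a rough stand-in, the Python sketch below prints what an RDFS/OWL class definition could look like in Turtle. The `ex:` prefix, the helper names and the wording of the label and comment are assumptions, not the EU Budget vocabulary's actual definition.

```python
def turtle_string(text: str) -> str:
    """Quote text as an English-tagged Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@en'


def define_class(qname: str, label: str, comment: str) -> str:
    """Return a Turtle snippet declaring qname as an RDFS and OWL class."""
    return (
        f"{qname} a rdfs:Class, owl:Class ;\n"
        f"    rdfs:label {turtle_string(label)} ;\n"
        f"    rdfs:comment {turtle_string(comment)} .\n"
    )


if __name__ == "__main__":
    # Hypothetical wording; the attributes come from the diagram on the slide.
    print(
        define_class(
            "ex:Amount",
            "Amount",
            "An amount described by its currency, figure, type and year.",
        )
    )
```

The output assumes that the rdfs:, owl: and ex: prefixes are declared elsewhere in the same Turtle file.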
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the “amount type” property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
• A range states that the values of a property are instances of one or more classes
• A domain states on which classes a given property can be used
[Diagram: hasCeiling property linking the Amount class (attributes Currency, Figure, Type, Year) and the Political category class (attributes Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
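In RDFS, domain and range declarations let a system infer the class of a resource from the properties used with it. The Python sketch below illustrates this on plain tuples. It assumes that hasCeiling points from a political category to an amount, a direction that cannot be confirmed from the transcript, and the `ex:` names are hypothetical.

```python
# Assumed declarations for the hasCeiling property (direction is a guess).
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples, domain=DOMAIN, range_=RANGE):
    """Return the (resource, class) pairs implied by domain and range.

    The subject of a property is an instance of the property's domain;
    the object is an instance of its range.
    """
    types = set()
    for subject, prop, obj in triples:
        if prop in domain:
            types.add((subject, domain[prop]))
        if prop in range_:
            types.add((obj, range_[prop]))
    return types


if __name__ == "__main__":
    data = [("ex:category1a", "ex:hasCeiling", "ex:amount42")]
    for pair in sorted(infer_types(data)):
        print(pair)
```

Because these declarations trigger inference rather than validation, a domain or range that is too narrow leads to wrong conclusions about the data instead of an error message, which is a reason to define them with care.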
DATASUPPORTOPEN
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
DATASUPPORTOPEN
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
Slide 82
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
“OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another” - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD 2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
DATASUPPORTOPEN
Step 2: Reuse existing terms and vocabularies
You can find reusable RDF vocabularies on:
- Joinup: http://joinup.ec.europa.eu
- Linked Open Vocabularies: http://lov.okfn.org
Slide 72
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
• RDF schemas and vocabularies often include terms that are very generic.
• By creating sub-class and sub-property relationships, systems that understand the super-property or super-class may be able to interpret the data even if the more specific terms are unknown.
• Do not create sub-classes and sub-properties simply to allow you to use your own term for something that already exists.
Slide 73
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "introduction" property as a sub-property of dct:description.
[Diagram: Nomenclature class with attributes Type, Heading, Introduction, Remark, Conditions]
Slide 74
DATASUPPORTOPEN
Step 3: Creation of sub-classes and sub-properties
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount class (attributes Currency, Figure, Type, Year) linked to the Nomenclature class (attributes Type, Heading, Introduction, Remark, Conditions) by the "has nomenclature" property]
Slide 75
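The benefit of sub-properties described on slides 73 to 75 can be illustrated with a minimal Python sketch. It assumes hypothetical prefixed names (`bud:introduction`, `bud:hasNomenclature`, `ex:...`) because the transcript does not give the actual identifiers used in the EU Budget vocabulary, and it models triples as plain tuples rather than using an RDF library. A consumer that only knows `dct:description` and `dct:subject` can still read the data once the sub-property relationships are applied.

```python
# Hypothetical sub-property declarations; the real EU Budget vocabulary
# identifiers may differ. Each property has at most one super-property here.
SUB_PROPERTY_OF = {
    "bud:introduction": "dct:description",
    "bud:hasNomenclature": "dct:subject",
}


def super_properties(prop, sub_property_of):
    """Return every (transitive) super-property of prop."""
    found = []
    seen = {prop}
    current = prop
    while current in sub_property_of:
        current = sub_property_of[current]
        if current in seen:  # guard against cyclic declarations
            break
        seen.add(current)
        found.append(current)
    return found


def infer(triples, sub_property_of):
    """Add the triples entailed by the sub-property declarations."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for sup in super_properties(predicate, sub_property_of):
            inferred.add((subject, sup, obj))
    return inferred


# Illustrative data only.
data = {
    ("ex:nomenclature1", "bud:introduction", "Introductory text"),
    ("ex:amount1", "bud:hasNomenclature", "ex:nomenclature1"),
}
entailed = infer(data, SUB_PROPERTY_OF)

# A generic client that only understands dct:description still finds a value.
descriptions = [o for s, p, o in entailed if p == "dct:description"]
print(descriptions)
```

The original triples are kept, so clients that do understand the more specific terms lose nothing.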
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
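Some of the naming conventions above are mechanical enough to check automatically. The following Python sketch is one possible way to do so; the function names and regular expressions are illustrative assumptions, not part of any tool mentioned in this training. It checks only the capitalisation and camel-case rules. Whether a term is singular, a verb or a noun still needs human review.

```python
import re

# Upper camel case for classes, lower camel case for properties.
CLASS_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_PATTERN = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(term):
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term):
    """True if the local name starts with a capital and has no separators."""
    return bool(CLASS_PATTERN.match(local_name(term)))


def is_valid_property_name(term):
    """True if the local name starts in lower case and has no separators."""
    return bool(PROPERTY_PATTERN.match(local_name(term)))


for term in ["skos:Concept", "ex:amount_type"]:
    print(term, is_valid_class_name(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:Has-site"]:
    print(term, is_valid_property_name(term))
```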
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount class with attributes Currency, Figure, Type, Year]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
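The definitions shown on the original slides did not survive in this transcript. As a rough illustration of what defining the "Amount" class and the "amount type" property with RDFS could look like, the Python sketch below generates Turtle text. The namespace `http://example.org/budget#`, the local name `amountType`, the comments and the helper `define_term` are all assumptions made for this example. The RDFS terms `rdfs:Class`, `rdf:Property`, `rdfs:label` and `rdfs:comment` are standard.

```python
NAMESPACE = "http://example.org/budget#"  # hypothetical namespace

PREFIXES = (
    f"@prefix ex: <{NAMESPACE}> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def define_term(local_name, rdf_type, label, comment):
    """Return a Turtle description of one vocabulary term.

    label and comment must not contain double quotes or backslashes,
    because this sketch does no escaping.
    """
    return (
        f"ex:{local_name} a {rdf_type} ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


vocabulary = "\n".join([
    PREFIXES,
    define_term("Amount", "rdfs:Class", "Amount",
                "A monetary amount recorded in the budget."),
    define_term("amountType", "rdf:Property", "amount type",
                "The type of an amount."),
])
print(vocabulary)
```

Note how the class name is upper camel case and the property name lower camel case, following the conventions on slide 76.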
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Diagram: the hasCeiling property links the Amount class (attributes Currency, Figure, Type, Year) and the Political category class (attributes Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
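In RDFS, domain and range declarations act as inference rules rather than validation constraints: when a property is used, its subject can be inferred to be an instance of the domain class and its value an instance of the range class. The Python sketch below illustrates this with the hasCeiling example. The `ex:` names are hypothetical, and the direction of the property (Political category as domain, Amount as range) is an assumption, since the transcript does not preserve the arrow in the diagram.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema; the direction of hasCeiling is assumed.
SCHEMA = {
    "ex:hasCeiling": {
        "domain": "ex:PoliticalCategory",
        "range": "ex:Amount",
    },
}


def infer_types(triples, schema):
    """Derive rdf:type triples from domain and range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        declaration = schema.get(predicate)
        if declaration is None:
            continue  # nothing is declared for this property
        if "domain" in declaration:
            inferred.add((subject, RDF_TYPE, declaration["domain"]))
        if "range" in declaration:
            inferred.add((obj, RDF_TYPE, declaration["range"]))
    return inferred


# Illustrative data only.
data = [("ex:heading1", "ex:hasCeiling", "ex:amount2015")]
types = infer_types(data, SCHEMA)
print(sorted(types))
```

Because these are inferences, an overly narrow domain or range does not reject data. It leads consumers to draw wrong conclusions, which is a reason to choose them with care.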
DATASUPPORTOPEN
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary.
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management.
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
DATASUPPORTOPEN
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Slide 82
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is OpenRefine?
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org
See also: OpenRefine website, http://openrefine.org
Slide 84
DATASUPPORTOPEN
What is the OpenRefine RDF extension?
The OpenRefine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (.xls and .xlsx)
- JSON
- XML
- RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 webinar on Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
DATASUPPORTOPEN
Using OpenRefine to model and publish open data: getting started
First:
1. Install OpenRefine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in OpenRefine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
DATASUPPORTOPEN
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
DATASUPPORTOPEN
Step 2: Create a project and upload it in OpenRefine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
DATASUPPORTOPEN
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
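In OpenRefine these clean-up steps are carried out interactively. To make the logic concrete, the following Python sketch performs comparable operations on a small in-memory CSV using only the standard library. The column names and figures are invented for illustration and are not taken from the Digital Agenda Scoreboard. The year filter stands in for selecting a subset of rows with a facet.

```python
import csv
import io

# Invented sample data, including a blank row and a footnote row.
RAW = """indicator,Country,yr,Value
Broadband coverage,BE,2013,98.5
,,,
Broadband coverage,FR,2013,97.1
Source: illustrative data only,,,
Broadband coverage,BE,2012,97.9
"""

# Harmonised column names (old name -> new name).
RENAMES = {
    "indicator": "indicator",
    "Country": "country",
    "yr": "year",
    "Value": "value",
}


def clean(text, renames, keep_year):
    """Rename columns, drop unnecessary rows and keep one year."""
    rows = []
    for raw_row in csv.DictReader(io.StringIO(text)):
        row = {renames[key]: value.strip() for key, value in raw_row.items()}
        if not row["country"] or not row["value"]:
            continue  # blank or footnote row: not an observation
        if row["year"] != keep_year:
            continue  # facet-like selection of the data to publish
        rows.append(row)
    return rows


cleaned = clean(RAW, RENAMES, keep_year="2013")
for row in cleaned:
    print(row)
```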
DATASUPPORTOPEN
Step 3: Clean up the data – prepare RDF
• Create URI representation for the involved object values
- via formula
- via reconciliation
Slide 91
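The two approaches can be contrasted in a short Python sketch. "Via formula" means deriving a URI from the cell value with a deterministic rule. "Via reconciliation" means looking the value up in an existing authoritative source. Here a small dictionary stands in for such a source. The base URI, the authority URI and the function names are hypothetical and chosen only for this example.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI

# Stand-in for a reconciliation service: label -> existing URI.
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/authority/country/BEL",  # hypothetical
}


def mint_uri(kind, value, base=BASE_URI):
    """Formula approach: build a URI from a normalised cell value."""
    slug = "-".join(value.strip().lower().split())
    return f"{base}{kind}/{quote(slug, safe='-')}"


def reconcile(value, known):
    """Reconciliation approach: reuse an existing URI if one is known."""
    return known.get(value.strip().lower())


def resolve(value, kind, known):
    """Prefer a reconciled URI and fall back to a minted one."""
    return reconcile(value, known) or mint_uri(kind, value)


print(resolve("Belgium", "country", KNOWN_COUNTRIES))
print(resolve(" Czech Republic ", "country", KNOWN_COUNTRIES))
```

Preferring a reconciled URI keeps the data linked to identifiers that others already use. Minting is the fallback when no match exists.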
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
DATASUPPORTOPEN
Step 4: Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either creating a new RDF skeleton or editing an existing one.
You can set the base URI for the data.
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
DATASUPPORTOPEN
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
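An RDF skeleton is essentially a template that is applied to every row of the table. The Python sketch below imitates that idea outside OpenRefine: a list of predicate and value templates is filled in per row and written out as Turtle. The base URI, the URI patterns and the sample row are assumptions for illustration. The terms `qb:Observation`, `qb:dataSet`, `sdmx-dimension:refArea`, `sdmx-dimension:refPeriod` and `sdmx-measure:obsValue` come from the RDF Data Cube Vocabulary and its companion SDMX vocabularies. A complete Data Cube publication would also describe the data set and its structure definition, which this sketch omits.

```python
BASE_URI = "http://example.org/scoreboard/"  # hypothetical base URI

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

# The skeleton: one subject template plus (predicate, value template) pairs.
SUBJECT_TEMPLATE = "<{base}observation/{country}-{year}>"
SKELETON = [
    ("a", "qb:Observation"),
    ("qb:dataSet", "<{base}dataset/example>"),
    ("sdmx-dimension:refArea", "<{base}country/{country}>"),
    ("sdmx-dimension:refPeriod", '"{year}"^^xsd:gYear'),
    ("sdmx-measure:obsValue", '"{value}"^^xsd:decimal'),
]


def row_to_turtle(row, base=BASE_URI):
    """Apply the skeleton to one cleaned row and return Turtle text."""
    subject = SUBJECT_TEMPLATE.format(base=base, **row)
    pairs = [
        f"{predicate} {template.format(base=base, **row)}"
        for predicate, template in SKELETON
    ]
    return subject + " " + " ;\n    ".join(pairs) + " ."


def export_turtle(rows, base=BASE_URI):
    """Serialise all rows as one Turtle document."""
    body = "\n\n".join(row_to_turtle(row, base) for row in rows)
    return PREFIXES + "\n" + body + "\n"


# Invented sample row; values are kept as strings as read from a CSV file.
rows = [{"country": "BE", "year": "2013", "value": "98.5"}]
turtle = export_turtle(rows)
print(turtle)
```

Keeping the skeleton as data rather than code mirrors the way a skeleton can be copied, pasted and edited in the graphical interface.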
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
[Diagram: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume]
Slide 96
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure. ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology. W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels. W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas. ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID. Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data. Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook. W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data. EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction. The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework. W3C. http://www.w3.org/RDF/
• Semantic Web Stack. W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF. W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
DATASUPPORTOPEN
Further reading
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
Slide 102
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 74
3
The EU Budget vocabulary defines the introduction property as a sub-property of dctdescription
Nomenclature
TypeHeadingIntroductionRemarkConditions
DATASUPPORTOPEN
Creation of sub-classes and sub-properties
Slide 75
3
The EU Budget vocabulary defines the has nomenclature property as a sub-property of dctsubject
Amount
CurrencyFigureTypeYear
Nomenclature
TypeHeadingIntroductionRemarkConditions
has
Nomenclature
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
Classes begin with a capital letter and are always singular eg skosConcept
Properties begin with a lower case letter eg rdfslabel
Object properties should be verbs eg orghasSite
Data type properties should be nouns eg dctermsdescription
Use camel case if a term has more than one word eg foafisPrimaryTopicOf
Slide 76
4
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5 ★ Open Data. http://5stardata.info
• ADMS Brochure, ISA Programme. https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C. http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C. http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme. https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme. https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee. http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID. http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation. http://okfn.org/opendata/
• Open Refine. https://github.com/OpenRefine
• RDF Extension. http://refine.deri.ie
• Resource Description Framework, W3C. http://www.w3.org/RDF/
• Semantic Web Stack, W3C. http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C. http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Find us on, join us on, follow us, contact us:
Open Data Support: http://www.slideshare.net/OpenDataSupport
Open Data Support: http://goo.gl/y9ZZI
http://www.opendatasupport.eu
OpenDataSupport
contact@opendatasupport.eu
Slide 102
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Creation of sub-classes and sub-properties
Step 3
The EU Budget vocabulary defines the "has nomenclature" property as a sub-property of dct:subject.
[Diagram: Amount (Currency, Figure, Type, Year) "has nomenclature" Nomenclature (Type, Heading, Introduction, Remark, Conditions)]
Slide 75
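What a sub-property declaration buys you can be sketched in a few lines of Python. Under RDFS semantics, every statement made with a sub-property also holds for its super-property, so data published with the specialised term stays visible to consumers who only know dct:subject. The `ex:` identifiers and the sample triple below are hypothetical and stand in for terms of the EU Budget vocabulary.

```python
# Minimal sketch of the rdfs:subPropertyOf entailment.
# The ex: names and the sample triple are hypothetical placeholders.
SUB_PROPERTY_OF = {
    "ex:hasNomenclature": "dct:subject",
}


def infer_super_properties(triples):
    """Add (s, super, o) for every (s, p, o) whose p has a super-property."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            parent = SUB_PROPERTY_OF.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


data = {("ex:amount1", "ex:hasNomenclature", "ex:nomenclature42")}
result = infer_super_properties(data)
for triple in sorted(result):
    print(triple)
```

A query for `dct:subject` over `result` now also finds the statement that was originally made with the more specific property.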
Where new terms are required, create them following commonly agreed best practices
Step 4
Classes begin with a capital letter and are always singular, e.g. skos:Concept
Properties begin with a lower-case letter, e.g. rdfs:label
Object properties should be verbs, e.g. org:hasSite
Data type properties should be nouns, e.g. dcterms:description
Use camel case if a term has more than one word, e.g. foaf:isPrimaryTopicOf
Slide 76
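The capitalisation and camel-case conventions above are mechanical enough to check automatically. The following Python sketch is one possible way to do that; the function names are invented for illustration, and the check deliberately ignores the conventions that need human judgement (singular form, verb versus noun).

```python
import re

# UpperCamelCase for classes, lowerCamelCase for properties.
UPPER_CAMEL = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def local_name(term: str) -> str:
    """Strip the prefix from a prefixed name such as 'skos:Concept'."""
    return term.split(":", 1)[-1]


def is_valid_class_name(term: str) -> bool:
    """True if the local name starts with a capital and is camel case."""
    return bool(UPPER_CAMEL.match(local_name(term)))


def is_valid_property_name(term: str) -> bool:
    """True if the local name starts in lower case and is camel case."""
    return bool(LOWER_CAMEL.match(local_name(term)))


for term in ["skos:Concept", "ex:political_category"]:
    print(term, is_valid_class_name(term))
for term in ["rdfs:label", "foaf:isPrimaryTopicOf", "ex:HasSite"]:
    print(term, is_valid_property_name(term))
```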
Where new terms are required, create them following commonly agreed best practices
Step 4
If there is no suitable authoritative reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "Amount" class
[Diagram: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 77
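The slide's own example is an image that did not survive in this transcript, so the following is only a sketch of what an RDFS class definition for "Amount" could look like, generated as Turtle text from Python. The `ex:` namespace, the label and the comment are assumptions and are not taken from the EU Budget vocabulary.

```python
# Sketch: build an RDFS class definition as Turtle text.
# The ex: namespace, the label and the comment are hypothetical.
PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/vocab#> .\n"
)


def define_class(curie: str, label: str, comment: str) -> str:
    """Return a Turtle snippet declaring `curie` as an rdfs:Class."""
    return (
        f"{curie} a rdfs:Class ;\n"
        f"    rdfs:label \"{label}\"@en ;\n"
        f"    rdfs:comment \"{comment}\"@en .\n"
    )


amount_class = PREFIXES + "\n" + define_class(
    "ex:Amount", "Amount", "An amount of money in a budget."
)
print(amount_class)
```

An OWL-based vocabulary would declare the term as `owl:Class` instead; the label and comment would stay the same.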
Where new terms are required, create them following commonly agreed best practices
Step 4
If there is no suitable authoritative reusable vocabulary for describing your data, use established conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
[Diagram: Amount (Currency, Figure, Type, Year)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 78
Where new terms are required, create them following commonly agreed best practices
Step 4
When defining new properties, consider defining their domain and range.
A range states that the values of a property are instances of one or more classes.
A domain states on which classes a given property can be used.
[Diagram: hasCeiling property linking Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
Slide 79
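In RDFS, domain and range are not validation rules. They license inferences: a resource that uses the property is inferred to be an instance of the domain class, and the value is inferred to be an instance of the range class. The Python sketch below illustrates this. The direction chosen for `ex:hasCeiling` (domain `ex:PoliticalCategory`, range `ex:Amount`) is an assumption, since the diagram's arrow is not recoverable from the transcript, and the sample triple is made up.

```python
RDF_TYPE = "rdf:type"

# Assumed schema: which way hasCeiling points is a guess for illustration.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # subject gets the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))   # object gets the range class
    return inferred


data = [("ex:heading1", "ex:hasCeiling", "ex:amount2014")]
types = infer_types(data)
for triple in sorted(types):
    print(triple)
```

One practical consequence is that an overly broad or wrong domain silently assigns wrong types to data, which is why it is worth choosing domain and range with care.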
Publish within a highly stable environment designed to be persistent
Step 5
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
Slide 80
Publicise the RDF vocabulary by registering it with relevant services
Step 6
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage, and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
[Diagram labels: Analyse, Model, Publish]
Slide 82
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
"OpenRefine is a powerful tool for working with messy data, cleaning it, transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
- CSV
- Excel (xls and xlsx)
- JSON
- XML
- RDF/XML
You can then define the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension: http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Describe your data in a spreadsheet
Step 1
Download the tabular data
Slide 88
Create a project and upload it in Open Refine
Step 2
Upload the spreadsheet
Select relevant tabs
Create the project
Slide 89
Clean up the data - table harmonisation
Step 3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
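For readers who want to see the harmonisation steps spelled out, here is a small Python sketch that mimics them outside Open Refine: it drops empty rows, renames columns, and keeps only the rows selected by a filter (the role a facet plays in the tool). The column names and values are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
def harmonise(rows, rename, keep):
    """Drop empty rows, rename columns, and keep rows accepted by `keep`."""
    cleaned = []
    for row in rows:
        if not any(str(v).strip() for v in row.values()):
            continue  # an unnecessary (empty) row
        renamed = {rename.get(k, k): v for k, v in row.items()}
        if keep(renamed):
            cleaned.append(renamed)
    return cleaned


# Made-up sample table with untidy headers and an empty row.
raw = [
    {"Country ": "BE", "Yr": "2013", "Val": "78.5"},
    {"Country ": "", "Yr": "", "Val": ""},
    {"Country ": "FR", "Yr": "2012", "Val": "80.1"},
]

clean = harmonise(
    raw,
    rename={"Country ": "country", "Yr": "year", "Val": "value"},
    keep=lambda r: r["year"] == "2013",  # comparable to selecting one facet value
)
print(clean)
```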
Clean up the data - prepare RDF
Step 3
• Create URI representation for the involved object values
o via formula
o via reconciliation
Slide 91
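The "via formula" route amounts to deriving a URI from a cell value by a fixed rule. The sketch below shows one such rule in Python rather than in Open Refine's own expression language; the base URI `http://example.org/id/` and the slug rule (trim, lower-case, hyphenate, percent-encode) are assumptions chosen for illustration. Reconciliation, which looks values up against an existing dataset instead, is not covered here.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI


def mint_uri(kind: str, value: str) -> str:
    """Build a URI for a cell value: trim, lower-case, hyphenate, encode."""
    slug = "-".join(value.strip().lower().split())
    return f"{BASE_URI}{kind}/{quote(slug, safe='-')}"


uri = mint_uri("country", " United Kingdom ")
print(uri)
```

Because the rule is deterministic, the same cell value always yields the same URI, which is what allows rows from different sheets to link to the same resource.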
Map your data to appropriate RDF classes & properties (model your data)
Step 4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoAmountrdquo class
Slide 77
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
If there is no suitable authoritative reusable vocabulary for describing your data use conventions for describing your own vocabulary
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example defining the ldquoamount typerdquo property
Slide 78
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
Amount
CurrencyFigureTypeYear
DATASUPPORTOPEN
Where new terms are required create them following commonly agreed best practices
When defining new properties consider to define their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
4
See alsohttpwwwslidesharenetOpenDataSupportmodel-your-
data-metadata
hasCeiling
Amount
CurrencyFigureTypeYear
Political category
CodeDescription
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
[Figure: OpenRefine, UnifiedViews and Cellar plotted against flexibility and volume]
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
Slide 98
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on, join us on, follow us, contact us:
• Open Data Support: http://www.slideshare.net/OpenDataSupport
• http://www.opendatasupport.eu
• Open Data Support: http://goo.gl/y9ZZI
• OpenDataSupport
• contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
If there is no suitable, authoritative, reusable vocabulary for describing your data, use conventions for describing your own vocabulary:
- RDF Schema (RDFS)
- Web Ontology Language (OWL)
Example: defining the "amount type" property
Slide 78
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: class Amount with attributes Currency, Figure, Type, Year]
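As an illustration of what such a definition can look like, the sketch below prints an RDFS description of an "amount type" property in Turtle. The `rdf:` and `rdfs:` terms are standard; the `ex:` namespace, the local name `amountType`, the comment text and the `define_property` helper are assumptions and do not come from the slide.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/vocab#> .\n"   # hypothetical namespace
)


def define_property(local_name, label, comment):
    """Return a Turtle description of a new property with a label and a comment."""
    return (
        f"ex:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f'    rdfs:comment "{comment}"@en .\n'
    )


turtle = PREFIXES + "\n" + define_property(
    "amountType", "amount type", "The type of an amount."
)
print(turtle)
```

A human-readable label and comment are the minimum that lets others understand, and therefore reuse, the new term.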
DATASUPPORTOPEN
Step 4: Where new terms are required, create them following commonly agreed best practices
When defining new properties, consider defining their domain and range
A range states that the values of a property are instances of one or more classes
A domain states on which classes a given property can be used
Slide 79
See also: http://www.slideshare.net/OpenDataSupport/model-your-data-metadata
[Figure: property hasCeiling linking the classes Amount (Currency, Figure, Type, Year) and Political category (Code, Description)]
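In RDFS, domain and range work as inference rules: a statement that uses the property lets you conclude the types of its subject and object. The minimal sketch below applies that rule to a single triple. The direction of `hasCeiling` (from a political category to an amount) and all `ex:` names are assumptions made for illustration; the slide does not confirm them.

```python
# Hypothetical declarations: hasCeiling goes from a political category to an amount.
DOMAIN = {"ex:hasCeiling": "ex:PoliticalCategory"}
RANGE = {"ex:hasCeiling": "ex:Amount"}


def infer_types(triples):
    """Apply the RDFS domain and range rules to (subject, predicate, object) triples."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:     # the subject is an instance of the domain class
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:      # the object is an instance of the range class
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred


types = infer_types([("ex:category1", "ex:hasCeiling", "ex:amount1")])
print(sorted(types))
```

Because these are inferences rather than validation constraints, an overly narrow domain or range does not reject data; it leads to unintended type conclusions, which is a reason to choose them carefully.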
DATASUPPORTOPEN
Step 5: Publish within a highly stable environment designed to be persistent
• Choose a stable namespace for your RDF vocabulary
Example: http://data.europa.eu/bud
• Use good practices on the publication of persistent Uniform Resource Identifier (URI) sets, both in terms of format, design rules and management
Examples:
o http://www.w3.org/ns/adms
o http://purl.org/dc/elements/1.1/
Slide 80
See also:
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
http://www.slideshare.net/OpenDataSupport/design-and-manage-persitent-uris
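One design rule often quoted for persistent URIs is to follow a fixed pattern such as `http://{domain}/{type}/{concept}/{reference}`, with a small set of allowed type values. The sketch below checks URIs against that pattern. Both the pattern as written here and the type values `id`, `doc`, `def` and `set` are assumptions recalled from general guidance on persistent URIs, not something stated on this slide; the example URIs are hypothetical.

```python
import re

# Assumed pattern: http://{domain}/{type}/{concept}/{reference}
PATTERN = re.compile(
    r"^http://(?P<domain>[^/]+)/(?P<type>id|doc|def|set)/"
    r"(?P<concept>[^/]+)/(?P<reference>[^/]+)$"
)


def parse_uri(uri):
    """Return the parts of a URI that follows the pattern, or None if it does not."""
    match = PATTERN.match(uri)
    return match.groupdict() if match else None


print(parse_uri("http://example.org/id/country/BE"))
print(parse_uri("http://example.org/page.php?id=7"))
```

A check like this can run before publication to catch identifiers that leak technology details, such as file extensions or query strings, into the URI.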
DATASUPPORTOPEN
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
DATASUPPORTOPEN
Conclusions
Slide 82
1. Start with a robust Domain Model developed following a structured process and methodology
2. Research existing terms and their usage and maximise reuse of those terms
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent
6. Publicise the RDF vocabulary by registering it with relevant services
Phases: Analyse, Model, Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine?
Slide 84
"OpenRefine is a powerful tool for working with messy data: cleaning it, transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
DATASUPPORTOPEN
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML, and
• RDF/XML
and then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
DATASUPPORTOPEN
Publish within a highly stable environment designed to be persistent
bull Choose a stable namespace for your RDF vocabulary
Example httpdataeuropaeubud
bull Use good practices on the publication of persistent Uniform Resource Identifiers (URI) sets both in terms of format design rules and management
Examples
o httpwwww3orgnsadms
o httppurlorgdcelements11
Slide 80
5
See alsohttpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemashttpwwwslidesharenetOpenDataSupportdesign-and-manage-persitent-uris
DATASUPPORTOPEN
Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published you will want people to know about it To reach a wider audience register it on Joinup and Linked Open Vocabularies
Slide 81
6
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
Further reading
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Slide 99
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the working ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Find us on, contact us, join us on, follow us:
Open Data Support, http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support, http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu
Slide 102
Presentation metadata
This presentation has been created by PwC.
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 103
Step 6: Publicise the RDF vocabulary by registering it with relevant services
Once your RDF vocabulary is published, you will want people to know about it. To reach a wider audience, register it on Joinup and Linked Open Vocabularies.
Slide 81
Conclusions
1. Start with a robust Domain Model developed following a structured process and methodology.
2. Research existing terms and their usage, and maximise reuse of those terms.
3. Where new terms can be seen as specialisations of existing terms, create sub-classes and sub-properties as appropriate.
4. Where new terms are required, create them following commonly agreed best practice in terms of naming conventions etc.
5. Publish within a highly stable environment designed to be persistent.
6. Publicise the RDF vocabulary by registering it with relevant services.
Phases: Analyse, Model, Publish
Slide 82
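As an illustration of creating sub-classes and sub-properties of existing terms, here is a minimal Python sketch that writes the declarations out as Turtle. The `ex:` namespace and the terms `ex:CivilServant` and `ex:officialName` are hypothetical and are not part of the training material. `foaf:Person` and `foaf:name` stand in for whichever existing terms you decide to reuse.

```python
# Hypothetical vocabulary namespace, used for illustration only.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/vocab#",
}


def specialise(new_term, existing_term, kind="class"):
    """Return one Turtle statement declaring new_term as a specialisation."""
    predicate = "rdfs:subClassOf" if kind == "class" else "rdfs:subPropertyOf"
    return f"{new_term} {predicate} {existing_term} ."


def vocabulary_as_turtle(statements):
    """Prepend the prefix declarations to a list of Turtle statements."""
    header = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    return "\n".join(header + [""] + statements) + "\n"


turtle = vocabulary_as_turtle([
    specialise("ex:CivilServant", "foaf:Person"),                  # new class
    specialise("ex:officialName", "foaf:name", kind="property"),  # new property
])
print(turtle)
```

Declaring the link to the existing term keeps the new term usable by consumers that only know the term being reused.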
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
What is Open Refine?
"OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another" - openrefine.org
See also: Open Refine website, http://openrefine.org
Slide 84
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (.xls and .xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
See also: LOD2 Webinar – Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
Slide 85
Using Open Refine to model and publish open data: Getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
Step 1: Describe your data in a spreadsheet
Step 2: Create a project and upload it in Open Refine
Step 3: Clean up the data
Step 4: Map your data to appropriate RDF classes & properties
Step 5: Export the data in RDF
Slide 86
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
• Upload the spreadsheet
• Select relevant tabs
• Create the project
Slide 89
Step 3: Clean up the data – table harmonisation
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
Slide 90
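For readers who prefer to see the harmonisation actions outside the Open Refine interface, the following Python sketch does the equivalent with the standard `csv` module: it skips rows above the header, drops empty rows, renames columns, and keeps only the rows matching a filter (roughly what a facet selection does). The sample table, its values, and the target column names are invented for illustration and are not taken from the Digital Agenda Scoreboard.

```python
import csv
import io

# Made-up raw export: a title row, a blank row, then the real table.
RAW = """Scoreboard extract,,
,,
Country,Year,Value
BE,2013,0.78
FR,2013,0.75
BE,2012,0.74
"""

# Hypothetical target column names.
RENAMES = {"Country": "ref_area", "Year": "time_period", "Value": "obs_value"}


def harmonise(raw_text, header_row_index, renames, keep):
    """Return cleaned records as dictionaries keyed by the renamed columns."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = [renames.get(name, name) for name in rows[header_row_index]]
    records = []
    for row in rows[header_row_index + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # remove unnecessary (empty) rows
        record = dict(zip(header, row))
        if keep(record):  # facet-like selection of the data to publish
            records.append(record)
    return records


records = harmonise(RAW, 2, RENAMES, keep=lambda r: r["time_period"] == "2013")
for record in records:
    print(record)
```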
Step 3: Clean up the data – prepare RDF
• Create URI representation for the involved object values
  • via formula
  • via reconciliation
Slide 91
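The two ways of obtaining URIs named above can be sketched in Python as follows. This is not the expression language used inside Open Refine. It only illustrates the idea: a formula builds a URI from the cell value, while reconciliation looks the value up in an existing reference set and reuses the URI found there. The base URI and the reference set are hypothetical.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/id/"  # hypothetical base URI

# Hypothetical reference set mapping normalised labels to existing URIs.
KNOWN_COUNTRIES = {
    "belgium": "http://example.org/authority/country/BEL",
}


def mint_uri(base, kind, label):
    """Formula approach: derive a URI from the cell value itself."""
    slug = "-".join(label.strip().lower().split())
    return f"{base}{kind}/{quote(slug, safe='')}"


def reconcile(label, known):
    """Reconciliation approach: reuse an existing URI, or None if no match."""
    return known.get(label.strip().lower())


def uri_for_country(label):
    """Prefer a reconciled URI and fall back to a minted one."""
    return reconcile(label, KNOWN_COUNTRIES) or mint_uri(BASE_URI, "country", label)


print(uri_for_country("Belgium"))
print(uri_for_country(" United  Kingdom "))
```

Reconciling first and minting only as a fallback favours reuse of identifiers that other datasets already link to.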
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
Slide 92
Step 4: Map your data to appropriate RDF classes & properties (model your data)
Define a skeleton to transform your spreadsheet data to RDF
Slide 93
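To make the idea of a skeleton concrete, the sketch below applies one fixed pattern of triples to each spreadsheet row, using terms from the RDF Data Cube Vocabulary (`qb:Observation`, `qb:dataSet`) and the SDMX dimension and measure properties commonly used with it. The base URI, the column names, and the sample row are assumptions made for illustration. The skeleton built in the Open Refine interface may be structured differently.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/scoreboard/"  # hypothetical base URI


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def row_to_triples(row):
    """Apply the same triple pattern (the skeleton) to one row."""
    obs = iri(f"{BASE}observation/{row['indicator']}/{row['country']}/{row['year']}")
    value = row["value"]
    return [
        (obs, iri(RDF_TYPE), iri(QB + "Observation")),
        (obs, iri(QB + "dataSet"), iri(BASE + "dataset/" + row["indicator"])),
        (obs, iri(SDMX_DIM + "refArea"), iri(BASE + "country/" + row["country"])),
        (obs, iri(SDMX_DIM + "refPeriod"), iri(BASE + "year/" + row["year"])),
        (obs, iri(SDMX_MEASURE + "obsValue"), f'"{value}"^^{iri(XSD_DECIMAL)}'),
    ]


def to_ntriples(triples):
    """Serialise triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples) + "\n"


# Invented sample row; column names are hypothetical.
sample_row = {"indicator": "bb_penet", "country": "BE", "year": "2013", "value": "0.78"}
triples = row_to_triples(sample_row)
print(to_ntriples(triples))
```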
Step 4: Map your data to appropriate RDF classes & properties (model your data)
• You can map the data to the ontology using a simple graphical interface to create a new RDF skeleton or edit an existing one
• You can set the base URI for the data
[Screenshots: graphical interface to copy/paste an existing RDF skeleton; graphical interface to edit an RDF skeleton]
Slide 94
Step 5: Export your data to RDF/XML or Turtle
[Screenshot: export of the data in Turtle]
Slide 95
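To give an impression of what a Turtle export looks like, this Python sketch serialises one resource with prefixed names, grouping its properties with ";". It is a hand-written illustration under stated assumptions, not the exporter used by the RDF extension. The `ex:` namespace, the resource name, and the value are invented.

```python
# Prefix declarations; the "ex" namespace is hypothetical.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "ex": "http://example.org/scoreboard/",
}


def to_turtle(prefixes, subjects):
    """Serialise {subject: [(predicate, object), ...]} as Turtle text."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in prefixes.items()]
    for subject, pairs in subjects.items():
        lines.append("")
        lines.append(subject)
        body = [f"    {predicate} {obj}" for predicate, obj in pairs]
        lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines) + "\n"


turtle = to_turtle(PREFIXES, {
    "ex:obs1": [
        ("a", "qb:Observation"),            # "a" is Turtle shorthand for rdf:type
        ("sdmx-measure:obsValue", "0.78"),  # a bare decimal literal in Turtle
    ],
})
print(turtle)
```

Turtle is usually easier to read and review than RDF/XML, which makes it a convenient format for checking an export by eye.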
Production pipelines
From desk to automated pipeline
[Figure labels: flexibility, volume; tools shown: OpenRefine, UnifiedViews, Cellar]
Slide 96
Thank you for your attention
...and now YOUR questions?
Slide 97
DATASUPPORTOPEN
Conclusions
Slide 82
Start with a robust Domain Model developed following a structured process and methodology
Research existing terms and their usage and maximise reuse of those terms
Where new terms can be seen as specialisations of existing terms create sub class and sub properties as appropriate
Where new terms are required create them following commonly agreed best practice in terms of naming conventions etc
Publish within a highly stable environment designed to be persistent
Publicise the RDF vocabulary by registering it with relevant services
Analyse
Model
Publish
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative
2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice
DATASUPPORTOPEN
Example
Using Open Refine for RDF to publish tabular data as Linked Data
Slide 83
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
bull 5 Open Data http5stardatainfo
bull ADMS Brochure ISA Programme httpsjoinupeceuropaeuelibrarydocumentadms-brochure
bull An organization ontology W3C httpwwww3orgTRvocab-org W3C
bull Case study on how Linked Data is transforming eGovernment ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcase-study-how-linked-data-transforming-egovernment
bull Common Vocabularies Ontologies Micromodels W3C httpwwww3orgwikiTaskForcesCommunityProjectsLinkingOpenDataCommonVocabularies
bull Cookbook for translating Data Models to RDF Schemas ISA Programme httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
bull D713 - Study on persistent URIs with identification of best practices and recommendations on the topic for the MSs and the EC ISA Programme httpsjoinupeceuropaeusitesdefaultfilesD71320-20Study20on20persistent20URIspdf
bull EUCLID Course 1 Introduction and Application Scenarios httpwwweuclid-projecteumodulescourse1
bull Linked Data Tim Berners-Lee httpwwww3orgDesignIssuesLinkedDatahtml
Slide 98
bull Linked Data Cookbook W3C httpwwww3org2011gldwikiLinked_Data_Cookbook
bull Linking Open Data cloud diagram by Richard Cyganiak and AnjaJentzsch httplod-cloudnet
bull Module 2 Querying Linked Data EUCLID httpwwweuclid-projecteumodulescourse2
bull Open Data ndash An Introduction The Open Knowledge Foundation httpokfnorgopendata
bull Open Refine httpsgithubcomOpenRefine
bull RDF Extension httprefinederiie
bull Resource Description Framework W3C httpwwww3orgRDF
bull Semantic Web Stack W3C httpwwww3orgDesignIssuesdiagramssweb-stack2006apng
bull SPARQL Query Language for RDF W3C httpwwww3orgTRrdf-sparql-query
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA Process and methodology for developing semantic agreements httpsjoinupeceuropaeucommunitycore_vocabulariesdocumentprocess-and-methodology-developing-semantic-agreements
EC ISA Cookbook for translating Data Models to RDF Schemas httpsjoinupeceuropaeucommunitysemicdocumentcookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1 Introduction and Application Scenarios
httpwwweuclid-projecteumodulescourse1
EUCLID - Course 2 Querying Linked Data
httpwwweuclid-projecteumodulescourse2
Learning SPARQL Bob DuCharme
httpwwwlearningsparqlcom
Linked Data Cookbook W3C Government Linked Data Working Group
httpwwww3org2011gldwikiLinked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data Evolving the Web into a Global Data Space Tom Heath and Christian Bizer
httplinkeddatabookcomeditions10
Linked Open Data The Essentials Florian Bauer Martin Kaltenboumlck
httpwwwsemantic-webatLOD-TheEssentialspdf
Linked Open Government Data Li Ding Qualcomm VassiliosPeristeras and Michael Hausenblas
httpieeexploreieeeorgstampstampjsptp=amparnumber=6237454
Semantic Web for the working ontologist Dean Allemang Jim Hendler
httpworkingontologistorg
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data SupporthttpwwwslidesharenetOpenDataSupport
httpwwwopendatasupporteuOpen Data Supporthttpgoogly9ZZI
OpenDataSupport contactopendatasupporteu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors Michiel De Keyzer Nikolaos Loutas Jana Makedonska Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 20120107 lsquoLot 2 Provision of services for the Publication Access and Reuse of Open Public Data across the European Union through existing open data portalsrsquo(Contract No 30-CE-053096500-17)
copy 2015 European Commission
Disclaimers
1 The views expressed in this presentation are purely those of the authors and may not in anycircumstances be interpreted as stating an official position of the European CommissionThe European Commission does not guarantee the accuracy of the information included in thispresentation nor does it accept any responsibility for any use thereofReference herein to any specific products specifications process or service by trade nametrademark manufacturer or otherwise does not necessarily constitute or imply its endorsementrecommendation or favouring by the European CommissionAll care has been taken by the author to ensure that she has obtained where necessarypermission to use any parts of manuscripts including illustrations maps and graphs on whichintellectual property rights already exist from the titular holder(s) of such rights or from herhisor their legal representative
2 This presentation has been carefully compiled by PwC but no representation is made orwarranty given (either express or implied) as to the completeness or accuracy of the information itcontains PwC is not liable for the information in this presentation or any decision orconsequence based on the use of it PwC will not be liable for any damages arising from the use ofthe information contained in this presentation The information contained in this presentation isof a general nature and is solely for guidance on matters of general interest This presentation isnot a substitute for professional advice on any particular matter No reader should act on the basisof any matter contained in this publication without considering appropriate professional advice
DATASUPPORTOPEN
What is Open Refine
Slide 84
ldquoOpenRefine is a powerful tool for working with messy data cleaning it transforming it from one format into another rdquo- openRefineorg
See alsoOpen Refine website
httpopenrefineorg
DATASUPPORTOPEN
What is Open Refine RDF extension
Open Refine RDF extension allows you to easily import data in different formats such as
CSV
Excel(xls and xlsx)
JSON
XML and
RDFXML
And then determine the intended structure of an RDF dataset by drawing a template graph
Slide 85
See alsoLOD 2 Webinar ndash Open Refine httpwwwyoutubecomwatchv=4Ve93C238gI
DATASUPPORTOPEN
Using Open Refine to model and publish open data Getting started
1 Install Open Refine from httpsgithubcomOpenRefine
2 Install the RDF extension httprefinederiie
And then
Describe your data in a spreadsheet
Create a project and upload it in Open Refine
Clean up the data
Map your data to appropriate RDF classes amp properties
Export the data in RDF
Slide 86
1
2
3
4
5
DATASUPPORTOPEN
Example situationPublish statistical data as RDF according to RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
DATASUPPORTOPEN
Describe your data in a spreadsheet
Download the tabular data
Slide 88
1
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data ndash table harmonisation
Slide 90
3
bull Star amp remove unnessary rows
bull Rename columns
bull Use facets to select the data to be published
DATASUPPORTOPEN
Clean up the data ndash prepare RDF
Slide 91
3
bull Create URI representation for the involved object values
bull via formula
bull via reconsiliation
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 92
4
Understand the target vocabulary eg W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
DATASUPPORTOPEN
Map your data to appropriate RDF classes amp properties (model your data)
You can map the data to the ontology using a simple graphical interface to create or edit an existing RDF skeleton
You can set the base URI for the data
Slide 94
Graphical interface to copypaste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
DATASUPPORTOPEN
Export your data to RDFXML or Turtle
Slide 95
5
Export of the data in Turtle
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
flexibility
volume
OpenRefine
UnifiedViews
Cellar
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
References
• 5-star Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
Slide 98
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the working ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data Support: http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu
Open Data Support: http://goo.gl/y9ZZI
OpenDataSupport
contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
What is the Open Refine RDF extension?
The Open Refine RDF extension allows you to easily import data in different formats, such as:
• CSV
• Excel (xls and xlsx)
• JSON
• XML
• RDF/XML
You can then determine the intended structure of an RDF dataset by drawing a template graph.
Slide 85
See also: LOD2 Webinar - Open Refine, http://www.youtube.com/watch?v=4Ve93C238gI
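The value of accepting several input formats is that they all end up as the same table of rows before any RDF modelling starts. The sketch below illustrates that idea in plain Python for two of the formats listed above, CSV and JSON; it does not use Open Refine, and the sample data and the function names are assumptions for this example.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list:
    """Parse CSV text with a header line into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text: str) -> list:
    """Parse a JSON array of flat objects into the same row shape, with string values."""
    return [
        {key: str(value) for key, value in record.items()}
        for record in json.loads(text)
    ]


# Made-up sample data describing the same observation in two formats.
csv_text = "country,year,value\nBE,2013,78.5\n"
json_text = '[{"country": "BE", "year": 2013, "value": 78.5}]'

print(rows_from_csv(csv_text))
print(rows_from_json(json_text))
```

Once both sources are reduced to the same row shape, a single template graph can be applied to either of them.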
Using Open Refine to model and publish open data: getting started
1. Install Open Refine from https://github.com/OpenRefine
2. Install the RDF extension from http://refine.deri.ie
And then:
1. Describe your data in a spreadsheet
2. Create a project and upload it in Open Refine
3. Clean up the data
4. Map your data to appropriate RDF classes & properties
5. Export the data in RDF
Slide 86
Example situation: Publish statistical data as RDF according to the RDF Data Cube Vocabulary
Digital Agenda Scoreboard
Slide 87
Step 1: Describe your data in a spreadsheet
Download the tabular data
Slide 88
Step 2: Create a project and upload it in Open Refine
Slide 89
Upload the spreadsheet
Select relevant tabs
Create the project
Step 3: Clean up the data - table harmonisation
Slide 90
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
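The three clean-up actions above can be sketched on a small in-memory table. This is a Python illustration of the logic only, not of the Open Refine interface: a simple filter stands in for starring and removing rows, and a value count stands in for a facet. The column names and sample rows are made up.

```python
from collections import Counter

# Made-up rows as they might look after import; column names are hypothetical.
raw_rows = [
    {"Ctry": "BE", "Yr": "2013", "Ind": "broadband", "Val": "78.5"},
    {"Ctry": "", "Yr": "", "Ind": "", "Val": ""},  # empty spacer row
    {"Ctry": "Source: example", "Yr": "", "Ind": "", "Val": ""},  # footnote row
    {"Ctry": "FR", "Yr": "2013", "Ind": "mobile", "Val": "64.0"},
]
RENAMES = {"Ctry": "country", "Yr": "year", "Ind": "indicator", "Val": "value"}

# 1. Remove unnecessary rows: keep only rows that carry a value.
kept = [row for row in raw_rows if row["Val"].strip()]

# 2. Rename columns to clearer names.
renamed = [{RENAMES[key]: value for key, value in row.items()} for row in kept]

# 3. Facet on a column (count its distinct values), then select what to publish.
facet = Counter(row["indicator"] for row in renamed)
selected = [row for row in renamed if row["indicator"] == "broadband"]

print(facet)
print(selected)
```

Harmonising the table first keeps the later RDF mapping simple, because every remaining row then has the same meaning and the same columns.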
DATASUPPORTOPEN
Create a project and upload it in Open Refine
Slide 89
2
Upload the spreadsheet
Select relevant tabs
Create the project
DATASUPPORTOPEN
Clean up the data – table harmonisation
Slide 90
3
• Star & remove unnecessary rows
• Rename columns
• Use facets to select the data to be published
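In OpenRefine these clean-up steps are done interactively. As a rough illustration only, the same three operations can be sketched in plain Python. The sample rows, the column names and the `harmonise` helper are hypothetical and do not come from the training material.

```python
import csv
import io

# Hypothetical spreadsheet export; column names and values are illustrative only.
RAW = """Country name,Yr,Val
Belgium,2013,11.2
,,
Total,2013,100.0
Greece,2013,10.9
"""

# Hypothetical mapping from the original to the harmonised column names.
RENAME = {"Country name": "country", "Yr": "year", "Val": "value"}


def harmonise(text, keep_countries):
    """Drop unnecessary rows, rename columns and keep only the selected rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove unnecessary rows: blank rows and the summary row.
        if not row["Country name"] or row["Country name"] == "Total":
            continue
        # Rename the columns.
        clean = {RENAME[key]: value for key, value in row.items()}
        # Facet-like selection: keep only the rows that should be published.
        if clean["country"] in keep_countries:
            rows.append(clean)
    return rows


cleaned = harmonise(RAW, keep_countries={"Belgium", "Greece"})
print(cleaned)
```

The filter on `keep_countries` only imitates what a text facet does in the OpenRefine interface; it is not OpenRefine's own API.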
DATASUPPORTOPEN
Clean up the data – prepare RDF
Slide 91
3
• Create URI representations for the involved object values
  • via formula
  • via reconciliation
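The two options above can be sketched as follows, assuming a plain Python setting rather than OpenRefine's own expression language. The base URI, the `mint_uri` and `reconcile` helpers and the small reference table are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical base URI for minted identifiers.
BASE = "http://example.org/id/country/"

# Hypothetical reference list used for reconciliation (label -> existing URI).
REFERENCE = {
    "belgium": "http://dbpedia.org/resource/Belgium",
    "greece": "http://dbpedia.org/resource/Greece",
}


def mint_uri(label, base=BASE):
    """'Via formula': derive a URI from the cell value itself."""
    # Strip accents, lower-case, and replace runs of other characters by '-'.
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")
    return base + slug


def reconcile(label, reference=REFERENCE):
    """'Via reconciliation': look the value up in an existing reference list."""
    return reference.get(label.strip().lower())


print(mint_uri("Czech Republic"))
print(reconcile(" Belgium "))
```

A formula always yields a URI but only within your own namespace; reconciliation reuses identifiers that others already publish, at the cost of unmatched values (here `reconcile` returns `None` for them).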
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 92
4
Understand the target vocabulary, e.g. the W3C RDF Data Cube Vocabulary
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
Slide 93
4
Define a skeleton to transform your spreadsheet data to RDF
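A skeleton can be thought of as a template that is applied to every row of the spreadsheet. The sketch below shows one possible shape of such a template in Python, using terms from the RDF Data Cube and SDMX vocabularies. The `example.org` URIs, the column names and the `apply_skeleton` helper are hypothetical; the actual skeleton is defined in the OpenRefine RDF extension.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI for the published data.
BASE = "http://example.org/"

# The skeleton: (predicate, kind of object, template for the object value).
# Placeholders in braces refer to (hypothetical) spreadsheet columns.
SKELETON = [
    (RDF_TYPE, "uri", QB + "Observation"),
    (QB + "dataSet", "uri", BASE + "dataset/example"),
    (SDMX_DIMENSION + "refArea", "uri", BASE + "id/country/{country}"),
    (SDMX_DIMENSION + "refPeriod", "literal", "{year}"),
    (SDMX_MEASURE + "obsValue", "literal", "{value}"),
]


def apply_skeleton(row):
    """Turn one spreadsheet row into a list of (subject, predicate, object)."""
    subject = BASE + "observation/{country}-{year}".format(**row)
    return [
        (subject, predicate, (kind, template.format(**row)))
        for predicate, kind, template in SKELETON
    ]


triples = apply_skeleton({"country": "belgium", "year": "2013", "value": "11.2"})
for triple in triples:
    print(triple)
```

Each row becomes one `qb:Observation` attached to a dataset, with one triple per dimension or measure column.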
DATASUPPORTOPEN
Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology using a simple graphical interface, either by creating a new RDF skeleton or by editing an existing one
You can set the base URI for the data
Slide 94
Graphical interface to copy/paste an existing RDF skeleton
Graphical interface to edit an RDF skeleton
4
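The effect of the base URI can be illustrated with standard URI resolution: relative identifiers in the skeleton are resolved against it. This is a minimal sketch with Python's standard library; the base URI and the relative paths are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URI; note the trailing slash.
BASE_URI = "http://example.org/data/"


def resolve(relative, base=BASE_URI):
    """Resolve a relative identifier against the base URI."""
    return urljoin(base, relative)


with_slash = resolve("observation/be-2013")
# Without the trailing slash the last path segment of the base is replaced.
without_slash = resolve("observation/be-2013", base="http://example.org/data")

print(with_slash)
print(without_slash)
```

Choosing the base URI with care matters because every identifier in the exported data depends on it.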
DATASUPPORTOPEN
Export your data to RDF/XML or Turtle
Slide 95
5
Export of the data in Turtle
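To give an idea of what an export produces, the sketch below writes a few triples in the simplest form that Turtle allows: one statement per line with full URIs, without prefixes or abbreviations. The sample triples and the `to_turtle` helper are hypothetical, and the output of a real export from OpenRefine will be more compact than this.

```python
def term(kind, value):
    """Format an object as a URI reference or as a plain string literal."""
    if kind == "uri":
        return f"<{value}>"
    # Escape backslashes, quotes and newlines inside literals.
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_turtle(triples):
    """Serialise (subject, predicate, (kind, value)) triples, one per line."""
    lines = [
        f"<{subject}> <{predicate}> {term(kind, value)} ."
        for subject, predicate, (kind, value) in triples
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
SAMPLE = [
    (
        "http://example.org/observation/belgium-2013",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        ("uri", "http://purl.org/linked-data/cube#Observation"),
    ),
    (
        "http://example.org/observation/belgium-2013",
        "http://purl.org/linked-data/sdmx/2009/measure#obsValue",
        ("literal", "11.2"),
    ),
]

turtle = to_turtle(SAMPLE)
print(turtle)
```

The sketch does not handle datatypes, language tags or blank nodes; it only shows the basic subject, predicate, object pattern.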
DATASUPPORTOPEN
Production pipelines
From desk to automated pipeline
Slide 96
Figure: OpenRefine, UnifiedViews and Cellar positioned along two axes, flexibility and volume
DATASUPPORTOPEN
Thank you for your attention
and now YOUR questions
Slide 97
DATASUPPORTOPEN
References
Slide 98
• 5 ★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data – An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata/
• Open Refine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF/
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query/
DATASUPPORTOPEN
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements, https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
DATASUPPORTOPEN
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
DATASUPPORTOPEN
Be part of our team
Slide 102
Find us on
Contact us
Join us on
Follow us
Open Data Support http://www.slideshare.net/OpenDataSupport
http://www.opendatasupport.eu Open Data Support http://goo.gl/y9ZZI
OpenDataSupport contact@opendatasupport.eu
DATASUPPORTOPEN
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17)
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
4. Map your data to appropriate RDF classes & properties (model your data)
You can map the data to the ontology through a simple graphical interface, either by creating a new RDF skeleton or by editing an existing one
You can set the base URI for the data
Slide 94
Figure: Graphical interface to copy/paste an existing RDF skeleton
Figure: Graphical interface to edit an RDF skeleton
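The mapping step above can be illustrated outside the graphical interface with a minimal Python sketch. It is not the RDF skeleton format used by the tool: the base URI, the column names (`id`, `name`) and the choice of `org:Organization` and `skos:prefLabel` are assumptions made only for this example.

```python
from urllib.parse import quote

# Hypothetical base URI and vocabulary terms, chosen only for illustration.
BASE_URI = "http://data.example.org/organisation/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORG_ORGANIZATION = "http://www.w3.org/ns/org#Organization"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def row_to_triples(row, base_uri=BASE_URI):
    """Map one tabular row to (subject, predicate, object) triples."""
    # Mint the subject URI from the base URI plus a URL-safe local identifier.
    subject = base_uri + quote(row["id"], safe="")
    return [
        (subject, RDF_TYPE, ORG_ORGANIZATION),    # the class the row is mapped to
        (subject, SKOS_PREF_LABEL, row["name"]),  # a column mapped to a property
    ]


# Example input: one row of a hypothetical spreadsheet.
rows = [{"id": "dg connect", "name": "DG CONNECT"}]
triples = [t for row in rows for t in row_to_triples(row)]

for s, p, o in triples:
    print(s, p, o)
```

Changing `BASE_URI` changes every minted subject URI at once, which is why the base URI is set once for the whole dataset rather than per row.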
5. Export your data to RDF/XML or Turtle
Slide 95
Figure: Export of the data in Turtle
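To give an idea of what a Turtle export looks like, here is a minimal hand-written serializer in Python. It is only a sketch under stated assumptions: the function names, the example resource URI and the input structure are hypothetical, and a real export would come from the tool or from an RDF library rather than from string formatting.

```python
def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_turtle(prefixes, resources):
    """Serialize {subject_uri: [(predicate, object, is_literal), ...]} as Turtle."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in prefixes.items()]
    lines.append("")
    for subject, statements in resources.items():
        parts = []
        for predicate, obj, is_literal in statements:
            rendered = turtle_literal(obj) if is_literal else obj
            parts.append(f"    {predicate} {rendered}")
        lines.append(f"<{subject}>")
        # Statements about the same subject are separated by ';' and closed by '.'.
        lines.append(" ;\n".join(parts) + " .")
    return "\n".join(lines) + "\n"


# Hypothetical data: one organisation with a type and a label.
prefixes = {
    "org": "http://www.w3.org/ns/org#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
resources = {
    "http://data.example.org/organisation/1": [
        ("a", "org:Organization", False),
        ("skos:prefLabel", "DG CONNECT", True),
    ]
}
turtle = to_turtle(prefixes, resources)
print(turtle)
```

The same triples could equally be written as RDF/XML; Turtle is used here because it is the more compact and readable of the two export formats named on the slide.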
Production pipelines
From desk to automated pipeline
Slide 96
Figure: axes "flexibility" and "volume"; tools shown: OpenRefine, UnifiedViews, Cellar
Thank you for your attention
... and now YOUR questions?
Slide 97
References
• 5 ★ Open Data, http://5stardata.info
• ADMS Brochure, ISA Programme, https://joinup.ec.europa.eu/elibrary/document/adms-brochure
• An organization ontology, W3C, http://www.w3.org/TR/vocab-org/
• Case study on how Linked Data is transforming eGovernment, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment
• Common Vocabularies / Ontologies / Micromodels, W3C, http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Cookbook for translating Data Models to RDF Schemas, ISA Programme, https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
• D7.1.3 - Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC, ISA Programme, https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs.pdf
• EUCLID, Course 1: Introduction and Application Scenarios, http://www.euclid-project.eu/modules/course1
• Linked Data, Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html
• Linked Data Cookbook, W3C, http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
• Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net
• Module 2: Querying Linked Data, EUCLID, http://www.euclid-project.eu/modules/course2
• Open Data - An Introduction, The Open Knowledge Foundation, http://okfn.org/opendata
• OpenRefine, https://github.com/OpenRefine
• RDF Extension, http://refine.deri.ie
• Resource Description Framework, W3C, http://www.w3.org/RDF
• Semantic Web Stack, W3C, http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a.png
• SPARQL Query Language for RDF, W3C, http://www.w3.org/TR/rdf-sparql-query
Slide 98
Further reading
Slide 99
EC ISA, Process and methodology for developing semantic agreements
https://joinup.ec.europa.eu/community/core_vocabularies/document/process-and-methodology-developing-semantic-agreements
EC ISA, Cookbook for translating Data Models to RDF Schemas
https://joinup.ec.europa.eu/community/semic/document/cookbook-translating-data-models-rdf-schemas
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103
Open Data Support is funded by the European Commission under SMART 2012/0107 'Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals' (Contract No. 30-CE-0530965/00-17).
© 2015 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
DATASUPPORTOPEN
Further reading
EUCLID - Course 1: Introduction and Application Scenarios
http://www.euclid-project.eu/modules/course1
EUCLID - Course 2: Querying Linked Data
http://www.euclid-project.eu/modules/course2
Learning SPARQL, Bob DuCharme
http://www.learningsparql.com
Linked Data Cookbook, W3C Government Linked Data Working Group
http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
Slide 100
Further reading
Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer
http://linkeddatabook.com/editions/1.0/
Linked Open Data: The Essentials, Florian Bauer, Martin Kaltenböck
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Open Government Data, Li Ding, Qualcomm, Vassilios Peristeras and Michael Hausenblas
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
Semantic Web for the Working Ontologist, Dean Allemang, Jim Hendler
http://workingontologist.org
Slide 101
Be part of our team
Slide 102
Find us on: Open Data Support, http://www.slideshare.net/OpenDataSupport and http://www.opendatasupport.eu
Join us on: Open Data Support, http://goo.gl/y9ZZI
Follow us: @OpenDataSupport
Contact us: contact@opendatasupport.eu
This presentation has been created by PwC
Authors: Michiel De Keyzer, Nikolaos Loutas, Jana Makedonska, Brecht Wyns
Presentation metadata
Slide 103