

Page 1: Semantic web-and-public-data - en

TenForce – project: LOD2 1

Linked (Open) Data in e-Government and Commercial Publishing

EU FP7 project LOD2, partner TenForce (BE)

Johan De Smedt

2014-01-17

Page 2: Semantic web-and-public-data - en

Introduction

Page 3: Semantic web-and-public-data - en

Internet and HTTP - example (1/2)

http://www.gfii.fr/fr/

Page 4: Semantic web-and-public-data - en

Internet and HTTP - example (2/2)

• The internet as it is familiar now:
  – text, photos, video, ...
  – hyperlinks
• URL format: http://{domain}/{path}
• Hyperlinked delivery over the HTTP protocol
  – With an immense infrastructure (servers for DNS, proxy, cache management, DHCP, ...)
  – Supporting HTTP parameters and content negotiation (format/MIME type, language, ...)
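The URL format and the content negotiation parameters can be illustrated with a minimal Python sketch. It only uses the standard library; the helper names `split_url` and `accept_header` and the chosen media types and weights are assumptions made for this illustration, not part of the presentation.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the {domain} and {path} parts of the URL format."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "domain": parts.netloc, "path": parts.path}


def accept_header(preferences: list) -> str:
    """Build an Accept-style header value from (value, weight) pairs.

    A weight of 1.0 is the default and is left implicit; lower weights are
    written as q-values, which is how HTTP expresses preference order.
    """
    items = []
    for value, weight in preferences:
        items.append(value if weight == 1.0 else f"{value};q={weight:g}")
    return ", ".join(items)


# The example site from the previous slide.
url = split_url("http://www.gfii.fr/fr/")

# Illustrative preferences: HTML for humans first, RDF/XML second; French first.
headers = {
    "Accept": accept_header([("text/html", 1.0), ("application/rdf+xml", 0.8)]),
    "Accept-Language": accept_header([("fr", 1.0), ("en", 0.5)]),
}
```

A client sends such headers with its request, and the server picks the representation (format, language) that best matches them.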

Page 5: Semantic web-and-public-data - en

Categories of Internet Users (1/3)

• Categories of users
  – Humans
  – Applications (software)
• Information handling
  – Consumers
  – Publishers
  – Aggregators

Page 6: Semantic web-and-public-data - en

Categories of Internet Users (2/3)

• Examples of non-human users ...
  – Index and search robots
  – Mobile applications
  – Browsers
  – Information aggregators and suppliers
    • Portals – scientific publishers (and others)
    • Weather forecast
    • Traffic
    • News
    • e-Government
    • Hotel and travel booking
    • ...

Page 7: Semantic web-and-public-data - en

Categories of Internet Users (3/3)

• ... at the service of humans
  – economic activities
  – curiosity
  – control (processing procedures, security, ...)
  – implementation of policies and directives
  – traffic control and guidance
  – ...

Page 8: Semantic web-and-public-data - en

The objective of the semantic web

• Provide the tools (semantic language) to enable communication between Internet users (especially between applications)
  – Manipulating raw data to produce value-added information is a key element of the knowledge service industry
• Establish
  – "Common understanding"
  – "Interoperability"
  – "Collaboration"

Page 9: Semantic web-and-public-data - en

Key elements for building a "common understanding"

• Publish knowledge models for specific domains
  – Taxonomy, classification, thesaurus, subject register, Named Authority Lists, ...
  – About general publications, the labor market, legislation, geo-location, sports, politics, ...
• Publish vocabularies to express relationships, dependencies, data values - knowledge base schema (ontology)
  – Works of art, rights, licenses, trade, ...
  – Establish a framework to build and publish (update and maintain) the above publications
  – Help make the Internet a growing collection of related databases
  – Use standard or reference ontologies and taxonomies
• Publishing in a semantic format:
  – content (HTML/human) AND metadata (RDF/application)
• Reliable publishers of quality data add value
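To make "metadata (RDF/application)" concrete, here is a minimal Python sketch that writes a few metadata statements about a web page as N-Triples, one of the standard RDF serializations. The page and publisher addresses under `example.org` are hypothetical, and the helper names are assumptions of this sketch; only the Dublin Core terms (`dcterms:title`, `dcterms:publisher`) are existing vocabulary.

```python
from typing import Optional

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def iri(value: str) -> str:
    """Write an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Write a plain or language-tagged literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical page: the HTML is the content for humans,
# the triples below are the metadata for applications.
page = "http://example.org/publications/42"
triples = [
    (iri(page), iri(DCTERMS + "title"), literal("Annual report", "en")),
    (iri(page), iri(DCTERMS + "publisher"), iri("http://example.org/org/publisher-1")),
]
metadata = to_ntriples(triples)
```

Because every subject and predicate is a URL, an application can follow them like hyperlinks, which is what makes the published data "linked".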

Page 10: Semantic web-and-public-data - en

eGovernment

Page 11: Semantic web-and-public-data - en

The Demo Application: CELLAR - LOD2

• What is CELLAR
  – Owner: The Publications Office of the European Union
  – On-line publications:
    • EU legislation - content and metadata
    • Coming soon: EU and national jurisprudence and case law
• What is LOD2
  – LOD: Linked Open Data
  – links = hypertext links (HTTP)
  – A research project of the 7th EU Framework Programme
  – Participants: industry, publishers, universities, ICT enterprises
• The demo application
  – Use CELLAR as the original source provider of content in privately published content
    • (example publisher: Wolters Kluwer – Germany [WKD])

Page 12: Semantic web-and-public-data - en

Demo Use Case (1/3)

• Legislation related products or tools used by:
  – editorial staff of commercial publishers,
  – their customers,
  – their customers' customers and
  – the general public

... are getting direct access to linked EU primary source content and metadata to:

  – improve information quality
  – reduce editorial work
  – broaden content and metadata product offering

Page 13: Semantic web-and-public-data - en

Products – without LOD 2/5

Cloud products: 1 source

Unique source of content and metadata in the product

Page 14: Semantic web-and-public-data - en

Products – without LOD 3/5

• Without LOD
  – access is via EUR-Lex, which is not the primary information source but a publication in its own right
    • delay, availability, not the raw content or metadata
  – Scraped information is reviewed and stored locally
    • task for WKD editorial staff
  – WKD products need to be complete and self-contained, with limited linking to the available online original source

Page 15: Semantic web-and-public-data - en

Products – with LOD 4/5

Cloud products: 3 sources

1) original source of raw content and metadata
  – access by REST API
2) content and metadata sources - human interface
3) enriched content and enriched metadata sources

Page 16: Semantic web-and-public-data - en

Products – with LOD 5/5

• With LOD there is:
  – Direct access to the primary information source
    • content and metadata
  – Application assistance for linking with and reusing content and metadata from the original source
  – WKD product offering is completed with the available online original source by exposing the origins

Page 17: Semantic web-and-public-data - en

The Demo

• Advanced search (SPARQL) in web databases
  – uses the DCAT vocabulary: schema of the catalog of datasets
• License information is added to datasets using linked data (LD)
• Retrieve CELLAR stored content and metadata via LD
• Integrate with EUROVOC using LD
• Reuse CELLAR metadata in WKD content and add provenance (PROV) referring to the original source
• Go to the public URL
  – http://212.71.25.157:8080/wp9IntAppEx-1.0/

Page 18: Semantic web-and-public-data - en

Demo (1/.)

• Demo in @en and @de, could be in 20+ languages
• Combined search on CELLAR WP7 LOD DCAT
  – Full text = “Agrarstruktur Griechenland”
  – Title = “Kommission”
  – Issue date = “[ 1986-07-05 , 2000-01-15 [”
  – Theme = “Besteuerung”
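A combined search of this kind could be expressed as a SPARQL query over a DCAT catalog. The Python sketch below builds such a query string for the title, issue date and theme criteria. It is an illustration under assumptions, not the query the demo actually runs: it assumes datasets carry `dct:title`, `dct:issued` (as `xsd:date` values the store can compare) and `dcat:theme` pointing at concepts with a `skos:prefLabel`. The full text criterion is left out because full-text search in SPARQL is typically store-specific.

```python
from string import Template

# Query skeleton; $title, $start, $end and $theme are filled in by build_query.
QUERY = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?dataset ?title ?issued WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dct:issued ?issued ;
           dcat:theme ?theme .
  ?theme skos:prefLabel ?themeLabel .
  FILTER(CONTAINS(LCASE(STR(?title)), LCASE("$title")))
  FILTER(?issued >= "$start"^^xsd:date && ?issued < "$end"^^xsd:date)
  FILTER(LCASE(STR(?themeLabel)) = LCASE("$theme"))
}
""")


def escape(text: str) -> str:
    """Escape backslashes and double quotes so user input stays inside a SPARQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(title: str, start: str, end: str, theme: str) -> str:
    """Fill the skeleton; the date interval is half-open: start included, end excluded."""
    return QUERY.substitute(
        title=escape(title), start=escape(start), end=escape(end), theme=escape(theme)
    )


# The criteria shown on the slide (without the full text part).
query = build_query("Kommission", "1986-07-05", "2000-01-15", "Besteuerung")
```

The half-open interval `[ 1986-07-05 , 2000-01-15 [` maps to `>=` on the start date and `<` on the end date. The resulting string would be sent to a SPARQL endpoint, which is outside the scope of this sketch.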

Page 19: Semantic web-and-public-data - en

Demo (1.1/.)

• full text = Agrarstruktur Griechenland
  – score/rank

Page 20: Semantic web-and-public-data - en

Demo (1.2/.)

• full text = Agrarstruktur Griechenland
• title = Kommission

Page 21: Semantic web-and-public-data - en

Demo (1.3/.)

• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [

Page 22: Semantic web-and-public-data - en

Demo (1.4/.)

• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [
• theme = Besteuerung

Page 23: Semantic web-and-public-data - en

Demo (2/.)

• License information
  – Should be available in the original source
  – Can be merged into the source by a download service, addressed via DCAT distribution information
  – License reference provides
    • Work title
    • Publications Office publisher
    • License statement
    • Primary source content

Page 24: Semantic web-and-public-data - en

Demo (2.1/.)

License reference with primary source title (from DCAT register)

Page 25: Semantic web-and-public-data - en

Demo (2.2/.)

Publisher found in DCAT as linked data in license reference

Page 26: Semantic web-and-public-data - en

Demo (2.3/.)

• License statement as linked data from license reference

Page 27: Semantic web-and-public-data - en

Demo (2.4/.)

Primary source document as linked data from license reference

Page 28: Semantic web-and-public-data - en

Demo (3/.)

• Retrieve document from CELLAR
  – any available format
    • Demo uses: html, xhtml, pdf, pdfa1a, pdfa1b
• Retrieve metadata from CELLAR
  – ELI metadata (RDF/XML format)
  – CELLAR metadata (RDF/XML format)
  – "Notice" metadata (proprietary XML format)
• ELI
  – “European Legislation Identifier”@en
  – http://publications.europa.eu/resource/oj/JOC_2012_325_R_0003_01.FRA.xhtml
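Retrieving one resource in several formats is commonly done with HTTP content negotiation. The sketch below prepares (but does not send) such requests with Python's standard `urllib.request.Request`. The resource address under `example.org` is hypothetical, and the mapping from format names to media types is an assumption of this sketch; it covers only formats with well-known media types and leaves out pdfa1a and pdfa1b. Whether CELLAR selects formats through exactly these headers is not stated in the slides.

```python
from urllib.request import Request

# Well-known media types for some of the formats named in the demo (assumed mapping).
MIME_TYPES = {
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "pdf": "application/pdf",
    "rdf": "application/rdf+xml",
}


def build_request(resource_uri: str, fmt: str, language: str = "en") -> Request:
    """Prepare a GET request asking for one representation of a resource."""
    return Request(
        resource_uri,
        headers={"Accept": MIME_TYPES[fmt], "Accept-Language": language},
    )


# Hypothetical resource address; the same address serves content and metadata.
resource = "http://example.org/resource/doc-1"
document_request = build_request(resource, "pdf", "fr")
metadata_request = build_request(resource, "rdf")
```

Passing one of these objects to `urllib.request.urlopen` would perform the actual retrieval.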

Page 29: Semantic web-and-public-data - en

Demo (3.1/.)

Primary source document retrieval options

Page 30: Semantic web-and-public-data - en

Demo (3.2/.)

Retrieval of primary source documents

Page 31: Semantic web-and-public-data - en

Demo (3.3/.)

• Primary source metadata retrieval options
  – ELI (RDF/XML)
  – raw RDF (RDF/XML)
  – proprietary “notice” XML

Page 32: Semantic web-and-public-data - en

Demo (3.4/.)

Retrieve primary source metadata

Note: Requires proper browser XML and RDF viewing options

Page 33: Semantic web-and-public-data - en

Demo (4/.)

• EUROVOC integration

Page 34: Semantic web-and-public-data - en

Demo (5/.)

Establish reuse - drag and drop the CELLAR item over the WK item

Page 35: Semantic web-and-public-data - en

Demo (5.1/.)

Add primary source reference as linked data
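A primary source reference can be recorded as a single provenance statement. The minimal Python sketch below writes one N-Triples line using the W3C PROV property `prov:wasDerivedFrom`. Both item addresses are hypothetical, and the choice of this particular PROV property is an assumption; the slides only state that PROV is used.

```python
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"


def provenance_triple(reusing_item: str, primary_source: str) -> str:
    """Return one N-Triples statement: reusing_item prov:wasDerivedFrom primary_source."""
    for value in (reusing_item, primary_source):
        # Reject characters that would break the <...> IRI syntax.
        if any(ch in value for ch in '<> "'):
            raise ValueError(f"not a usable IRI: {value!r}")
    return f"<{reusing_item}> <{PROV_WAS_DERIVED_FROM}> <{primary_source}> ."


# Hypothetical addresses for a publisher item and the primary source it reuses.
statement = provenance_triple(
    "http://example.org/wkd/item/1",
    "http://example.org/cellar/work/1",
)
```

Since the object of the statement is the address of the primary source, any consumer of the publisher's item can follow it back to the original.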

Page 36: Semantic web-and-public-data - en

Demo (5.2/5)

Access primary source reference as linked data

Page 37: Semantic web-and-public-data - en

Examples of related use cases

Page 38: Semantic web-and-public-data - en

Scenario 1 – Employment

Use Case: An SME in the Aachen area has a job vacancy for a Java programmer

Background: It is getting harder to find good software developers, especially outside urban centres. Applicants in areas close to national borders need very practical information about mobility, which is currently hardly available

Eurovoc topics covered: Labour, Labour Market, Job Mobility, Job Vacancy

Sources involved: European Legislation, Eurostat, destat, ESCO, Open Street Map, Public transport Aachen, European Agency for Safety and Health at Work

Solution: EC contributes core ingredients for a central hub for transnational job mobility challenges

Page 39: Semantic web-and-public-data - en

Scenario 2 – Environment

Use Case: A German supermarket chain wants to start an image campaign on seafood that is not in danger of overfishing in the coming years

Background: In Germany, the market for organic food is growing rapidly, as is the support for sustainability. Unfortunately, the information on sustainability is so scattered that there is no way, e.g. for the advertising industry, to respond properly and seriously to this consumer trend

Eurovoc topics covered: Nature reserve, environmental politics, management of resources, Fishing industry, fresh fish, catch quota

Sources involved: European legislation, Eurostat, destat, FAO, World Bank, European Environment Agency

Solution: EC contributes core ingredients for a central hub for environmental protection

Page 40: Semantic web-and-public-data - en

Scenario 3 – Energy

Use Case: A house owner in the Netherlands wants to install solar cells on his roof

Background: Due to the „Energiewende“ in Germany, a lot of knowledge on renewable energy, its impact, technologies and vendors has been created on a national level. This information is also relevant for other EU member states and their citizens

Eurovoc topics covered: Energy industry, solar energy, photovoltaic cell

Sources involved: European legislation, Eurostat, destat, Joint Research Center, Agency for the Cooperation of Energy Regulators, International Energy Agency, Stiftung Warentest

Solution: EC contributes core ingredients for transnational energy challenges

Page 41: Semantic web-and-public-data - en

Next for CELLAR (2014)

• Transform all published CELLAR legislation according to the ELI directive
• Publish case law according to the ECLI directive
• Publish the catalog of available legislation and case law (possibly using the W3C DCAT recommendation)
• Publish all taxonomies used by the EU following the LOD best practices.

Page 42: Semantic web-and-public-data - en

ESCO

Page 43: Semantic web-and-public-data - en

The ESCO Project

• ESCO
  – Project owner: DG-EMPL
  – ESCO
    • https://ec.europa.eu/esco/home (version 0)
    • European Skills, Competences, Qualifications and Occupations
    • The knowledge base details concepts in three pillars (taxonomies) and provides semantically rich relations between the concepts.
    • Re-uses several other taxonomies (Eurostat, Unesco, DG-EAC, PO of the EU)

Page 44: Semantic web-and-public-data - en

ESCO Data Model: Occupation Pillar

O [Occupation]: organized by economic activity sectors
  - Agriculture
  - Education
  - ...

• mapped to
  – ISCO xx (standard of ILO/UNO)
  – ROME (French labor market standard)
  – ...

[Diagram: Occupation linked to ISCO08, ISCO88 and ROME (relation labels: broaderMatch, exactMatch, correspondence) and to NACE (relation label: subject)]

Page 45: Semantic web-and-public-data - en

ESCO Data Model: Occupation Pillar

• Relation: description

[Diagram: a text document (unstructured or semi-structured) with the sections Occupation Description, Skills and Qualifications, linked to an Occupation by aboutOccupation]

Page 46: Semantic web-and-public-data - en

ESCO Data Model: Occupation Pillar

• Skills are
  – transversal (across activity sectors)
  – specific to an activity sector
• Types of skills
  – knowledge, skill, competence, ability
• Group of skills
  – Leaf group of skills
    • Skill (member of a skill group)
• Relation: occupation - skill

[Diagram: a text document (unstructured or semi-structured) with the sections Occupation Description, Skills and Qualifications, linked to an Occupation by aboutOccupation; the Occupation is linked to skills by skill relations labelled essential or desired]
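The occupation - skill relation, with its essential and desired variants, can be sketched as a small Python data model. The class and method names are assumptions of this sketch and not ESCO identifiers; the sample occupation and skills are made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relevance(Enum):
    """How strongly a skill is tied to an occupation."""
    ESSENTIAL = "essential"
    DESIRED = "desired"


@dataclass(frozen=True)
class Skill:
    label: str
    transversal: bool = False  # True: applies across activity sectors


@dataclass
class Occupation:
    label: str
    skills: dict = field(default_factory=dict)  # maps Skill -> Relevance

    def add_skill(self, skill: Skill, relevance: Relevance) -> None:
        self.skills[skill] = relevance

    def skills_by(self, relevance: Relevance) -> list:
        """Return the labels of all skills with the given relevance, sorted."""
        return sorted(s.label for s, r in self.skills.items() if r is relevance)


# Made-up sample data.
developer = Occupation("Java programmer")
developer.add_skill(Skill("Java programming"), Relevance.ESSENTIAL)
developer.add_skill(Skill("English", transversal=True), Relevance.DESIRED)

essential = developer.skills_by(Relevance.ESSENTIAL)
desired = developer.skills_by(Relevance.DESIRED)
```

Keeping the relevance on the relation rather than on the skill lets the same skill be essential for one occupation and merely desired for another.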

Page 47: Semantic web-and-public-data - en

ESCO Data Model

• Skill and Skill facet

1. Define the different aspects/dimensions of a concept:
   - main facet (0..1)
   - sub facets (0..n)
2. Define/specify the standard to use or give a good description of the concepts contained by each facet
3. For each list of values from step 2, a collection of concepts (Facet Group) is created.
4. Manage the members of the facet group

[Diagram: the concept "Foreign Language expertise" with a Language facet (main facet) and a Language usage facet (sub facet). Members of the Language facet: english, german, dutch, mapped with skos:exactMatch to oasis, LoC and EU-PO. Top members of the Language usage facet: understanding, Speaking, Writing; narrower members: Listening, Reading, Spoken interaction, Spoken production. The numbers (1) to (4) in the diagram refer to the steps above.]
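The four steps can be sketched as a small Python data model: a concept has at most one main facet and any number of sub facets, each facet is a group with a reference standard and managed members. The class names, the `combinations` helper and the standards named in the sample data (ISO 639 for languages, CEFR for language usage) are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional


@dataclass
class FacetGroup:
    name: str
    standard: str                                 # step 2: standard or description
    members: list = field(default_factory=list)   # step 3: the collection of concepts

    def add_member(self, member: str) -> None:
        """Step 4: manage the members; duplicates are ignored."""
        if member not in self.members:
            self.members.append(member)


@dataclass
class FacetedConcept:
    label: str
    main_facet: Optional[FacetGroup] = None          # step 1: main facet (0..1)
    sub_facets: list = field(default_factory=list)   # step 1: sub facets (0..n)

    def combinations(self) -> list:
        """List every specialization: one member from each facet."""
        groups = ([self.main_facet] if self.main_facet is not None else []) + self.sub_facets
        return list(product(*(g.members for g in groups)))


language = FacetGroup("Language", standard="ISO 639")
for name in ["english", "german", "dutch", "english"]:
    language.add_member(name)

usage = FacetGroup("Language usage", standard="CEFR",
                   members=["understanding", "speaking", "writing"])

expertise = FacetedConcept("Foreign language expertise", language, [usage])
specializations = expertise.combinations()
```

Facets avoid enumerating every specialization by hand: three languages and three kinds of usage yield nine specializations from six managed members.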

Page 48: Semantic web-and-public-data - en

ESCO Data Model: Qualification Pillar

• EQF, FoET, Awarding Body

[Diagram: ESCO Q-Pillar with Q-groups and Q-members, related to FoET, EQF, AwardingBody and Description; relation labels: exactMatch, tagging, hasAwardingBody, description]

Page 49: Semantic web-and-public-data - en

ESCO Data Model: Occupation Pillar (continued)

• Relation: description

[Diagram: a text document (unstructured or semi-structured) with the sections Occupation Description, Skills and Qualifications, linked to an Occupation by aboutOccupation]

Page 50: Semantic web-and-public-data - en

ESCO Data Model: Occupation Pillar (continued)

• Relationship: Occupation - Qualification

[Diagram: the same description document linked to an Occupation by aboutOccupation; the Occupation is linked to qualifications by a qualification relation]

Page 51: Semantic web-and-public-data - en

ESCO Data Model: Qualification Pillar

• Qualifications are maintained (direct) or included (indirect)
• Direct qualifications are maintained by DG-EMPL/ESCO. Inclusion is on an “as needed” basis
  – International qualification schemes (outside of the EU)
    • USA, China, ...
  – Qualifications awarded by enterprises
    • ORACLE, CISCO, Microsoft, ...
• Qualifications subject to indirect inclusion
  – Are maintained by national (EU member) organizations
  – Registered and structured by DG EAC (Education and Culture)
  – Transferred to DG EMPL using the XML schema of DG-EAC
  – Uploaded in ESCO by DG-EMPL/ESCO

Page 52: Semantic web-and-public-data - en

ESCO Data Model: Qualification Pillar

• Relationship description

[Diagram: an XML document plus occasional description, with the sections Qualification Description and Skills, linked to a qualification by aboutQualification; further labels: skill, competence, awarding body, hasAwardingBody]

Page 53: Semantic web-and-public-data - en

ESCO Data Model - summary

• ESCO consists of three pillars (a pillar is a class of concepts)
  – occupation
  – competence
  – qualification
• ESCO concepts are mapped to concepts of similar taxonomies. The mapping is expressed using SKOS mapping properties.
  – The correspondence between ESCO and ISCO (an ESCO occupation has as broader match an ISCO occupation group)
  – Planned: mapping ESCO to ROME (French occupation taxonomy) ... other mappings may be established as needed (O*NET)
• The ESCO semantics are expressed using standard support taxonomies
  – To tag ESCO pillar concepts (using DCMI property dcterms:subject)
  – To structure recurring specializations in the ESCO model (using facets, collections or groups of concepts)
  – Examples
    • Location (Eurostat: NUTS; ISO 3166)
    • Economic activity sectors (Eurostat: NACE)
    • European Qualifications Framework (EQF)
    • CEFR (Common European Framework of Reference for Languages)
    • UNESCO (UIS): FoET, ISCED
    • Languages (Publications Office of the EU, Library of Congress, OASIS-psi, ISO 639)
    • ...
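The mapping and tagging described in the summary can be written down as two kinds of statements. The Python sketch below emits them as N-Triples. The "broader match" relation corresponds to the SKOS mapping property `skos:broadMatch`, and tagging uses `dcterms:subject`; all concept addresses under `example.org` are hypothetical stand-ins, not real ESCO, ISCO or NACE identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Write one statement between three IRIs in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def map_occupation(esco_occupation: str, isco_group: str, nace_sector: str) -> list:
    """Map an occupation to a broader ISCO group and tag it with a NACE sector."""
    return [
        triple(esco_occupation, SKOS + "broadMatch", isco_group),
        triple(esco_occupation, DCTERMS + "subject", nace_sector),
    ]


# Hypothetical concept addresses used only for illustration.
statements = map_occupation(
    "http://example.org/esco/occupation/1",
    "http://example.org/isco/group/1",
    "http://example.org/nace/sector/1",
)
```

Expressing the mappings with standard SKOS properties means that generic SKOS tools can follow them without knowing anything specific about ESCO.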

Page 54: Semantic web-and-public-data - en

Tools for Linked Open Data

Page 55: Semantic web-and-public-data - en

A small list of tools for LOD

• SPARQL end-point – NoSQL database (RDF graph, column store)
  – Virtuoso, Oracle, AllegroGraph
• Frameworks integrating semantic libraries
  – Jena, Sesame
• Analyser
  – TopBraid, Protégé
• Alignment of knowledge bases
  – SILK:
    • http://lod2.eu/Project/Silk.html
    • http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
• LOD best practices
  – https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html

Page 56: Semantic web-and-public-data - en

TenForce References

• Semantic web projects
  – Eurovoc
  – Cellar
  – ESCO
  – LOD2 (R&D)
  – Wolters Kluwer
  – ODP (Open Data Portal)
  – ODS (Open Data Support)
• ISO 25964 (Thesaurus standardization)
• TenForce.com
• [email protected]