TenForce – project: LOD2 1
Linked (Open) Data in e-Government and Commercial Publishing
EU FP7 project LOD2, partner TenForce (BE)
Johan De Smedt
2014-01-17
Introduction
Internet and HTTP - example (1/2): http://www.gfii.fr/fr/
Internet and HTTP - example (2/2)
• The internet as it is familiar now:
  – text, photo, video, ...
  – hyperlinks
• URL format: http://{domain}/{path}
• Hyperlinked delivery over the HTTP protocol
  – With an immense infrastructure (servers for DNS, proxy, cache management, DHCP, ...)
  – Supporting HTTP parameters and content negotiation (format/MIME type, language, ...)
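A minimal Python sketch of the two ideas on this slide: splitting a URL into its domain and path, and attaching content-negotiation headers to a request. The helper names `describe_url` and `negotiated_request` are illustrative, not from the slides, and the request is only built, never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def describe_url(url):
    """Split a URL into the {domain} and {path} parts of the format above."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "domain": parts.netloc, "path": parts.path}


def negotiated_request(url, mime_type, language):
    """Build (but do not send) a GET request carrying content-negotiation headers."""
    return Request(url, headers={"Accept": mime_type, "Accept-Language": language})


parts = describe_url("http://www.gfii.fr/fr/")
request = negotiated_request("http://www.gfii.fr/fr/", "text/html", "fr")
print(parts)
print(request.header_items())
```

The same URL can be requested with a different `Accept` value (for example an RDF media type) when the consumer is an application rather than a browser.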
Categories of Internet Users (1/3)
• Categories of users
  – Humans
  – Applications (software)
• Information handling
  – Consumers
  – Publishers
  – Aggregators
Categories of Internet Users (2/3)
• Examples of non-human users ...
  – Index and search robots
  – Mobile applications
  – Browsers
  – Information aggregators and suppliers
    • Portals: scientific publishers (and others)
    • Weather forecast
    • Traffic
    • News
    • e-Government
    • Hotel and travel booking
    • ...
Categories of Internet Users (3/3)
• ... at the service of humans
  – economic activities
  – curiosity
  – control (processing procedures, security, ...)
  – implementation of policies and directives
  – traffic control and guidance
  – ...
The objective of the semantic web
• Provide the tools (semantic language) to enable communication between Internet users (especially between applications)
  – Manipulating raw data to produce value-added information is a key element of the knowledge service industry
• Establish
  – "Common understanding"
  – "Interoperability"
  – "Collaboration"
Key elements for building a "common understanding"
• Publish knowledge models for specific domains
  – Taxonomy, classification, thesaurus, subject register, named authority lists, ...
  – About general publications, the labor market, legislation, geo-location, sports, politics, ...
• Publish vocabularies to express relationships, dependencies, data values - knowledge base schema (ontology)
  – Works of art, rights, licenses, trade, ...
  – Establish a framework to build and publish (update and maintain) the above publications
  – Help make the Internet a growing collection of related databases
  – Use standard or reference ontologies and taxonomies
• Publishing in a semantic format:
  – content (HTML/human) AND metadata (RDF/application)
• Reliable publishers of quality data add value
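To make "metadata (RDF/application)" concrete, here is a small sketch that writes one taxonomy concept as N-Triples using the SKOS vocabulary. The `example.org` identifiers and the helper names are hypothetical. A real publication chain would normally use an RDF library instead of string formatting.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in the angle brackets N-Triples expects."""
    return f"<{value}>"


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_iri, labels, broader_iri=None):
    """Describe one SKOS concept: its type, preferred labels and optional broader concept."""
    lines = [f"{iri(concept_iri)} {iri(RDF_TYPE)} {iri(SKOS + 'Concept')} ."]
    for lang, text in sorted(labels.items()):
        lines.append(f"{iri(concept_iri)} {iri(SKOS + 'prefLabel')} {literal(text, lang)} .")
    if broader_iri:
        lines.append(f"{iri(concept_iri)} {iri(SKOS + 'broader')} {iri(broader_iri)} .")
    return lines


# Hypothetical identifiers, for illustration only.
lines = concept_triples(
    "http://example.org/taxonomy/solar-energy",
    {"en": "solar energy", "fr": "énergie solaire"},
    broader_iri="http://example.org/taxonomy/energy-industry",
)
print("\n".join(lines))
```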
eGovernment
The Demo Application: CELLAR - LOD2
• What is CELLAR
  – Owner: The Publications Office of the European Union
  – On-line publications:
    • EU legislation - content and metadata
    • Shortly: EU and national jurisprudence and case law
• What is LOD2
  – LOD: Linked Open Data
  – links = hypertext links (HTTP)
    • A research project of the 7th EU Framework Programme
    • Participants: industry, publishers, universities, ICT enterprises
• The demo application
  – Use CELLAR as the original source provider of content in privately published content
    • (example, the publisher: Wolters Kluwer – Germany [WKD])
Demo Use Case (1/5)
• Legislation-related products or tools used by:
  – editorial staff of commercial publishers,
  – their customers,
  – their customers’ customers and
  – the general public
... are getting direct access to linked EU primary source content and metadata to:
  – improve information quality
  – reduce editorial work
  – broaden content and metadata product offering
Products - without LOD (2/5)
Cloud products: 1 source
  – Unique source of content and metadata in the product
Products - without LOD (3/5)
• Without LOD
  – access is via Eur-Lex, which is not the primary information source but a publication in its own right
    • delay, availability, not the raw content or metadata
  – Scraped information is reviewed and stored locally
    • task for WKD editorial staff
  – WKD products need to be complete and self-contained, with limited linking to the available online original source
Products - with LOD (4/5)
Cloud products: 3 sources
1) original source of raw content and metadata
  – access by REST API
2) content and metadata sources - human interface
3) enriched content and enriched metadata sources
Products - with LOD (5/5)
• With LOD there is:
  – Direct access to the primary information source
    • content and metadata
  – Application assistance for linking with and reusing content and metadata from the original source
  – WKD product offering is completed with the available online original source by exposing the origins
The Demo
• Advanced search (SPARQL) in web databases
  – uses the vocabulary DCAT: schema of the catalog of datasets
• License information is added to datasets using linked data (LD)
• Retrieve CELLAR stored content and metadata via LD
• Integrate with EUROVOC using LD
• Reuse CELLAR metadata in WKD content and add provenance (PROV) referring to the original source
• Go to the public URL
  – http://212.71.25.157:8080/wp9IntAppEx-1.0/
Demo (1/.)
• Demo in @en and @de, could be in 20+ languages
• Combined search on CELLAR WP7 LOD DCAT
  – Full text = “Agrarstruktur Griechenland”
  – Title = “Kommission”
  – Issue date = “[ 1986-07-05 , 2000-01-15 [”
  – Theme = “Besteuerung”
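The combined search can be pictured as one SPARQL query over DCAT descriptions. The sketch below is an assumption about how such a query might look, not the query the demo actually runs. It uses the DCAT and Dublin Core terms `dcat:Dataset`, `dct:title`, `dct:issued` and `dcat:theme`, and assumes issue dates are typed as `xsd:date` and themes carry `skos:prefLabel` values. The full-text criterion is left out because full-text search syntax depends on the triple store. The issue-date interval is half-open: the start date is included, the end date is not.

```python
from datetime import date

PREFIXES = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"""


def in_half_open_interval(day, start, end):
    """True when start <= day < end, matching the [ start , end [ notation."""
    return start <= day < end


def build_search_query(title_word, start, end, theme_label, lang="de"):
    """Build the query text; inputs are trusted here (no escaping is attempted)."""
    return PREFIXES + f"""SELECT ?dataset ?title ?issued WHERE {{
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dct:issued ?issued ;
           dcat:theme ?theme .
  ?theme skos:prefLabel ?themeLabel .
  FILTER (CONTAINS(LCASE(STR(?title)), LCASE("{title_word}")))
  FILTER (?issued >= "{start.isoformat()}"^^xsd:date && ?issued < "{end.isoformat()}"^^xsd:date)
  FILTER (STR(?themeLabel) = "{theme_label}" && LANG(?themeLabel) = "{lang}")
}}"""


query = build_search_query("Kommission", date(1986, 7, 5), date(2000, 1, 15), "Besteuerung")
print(query)
```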
Demo (1.1/.)
• full text = Agrarstruktur Griechenland
  – score/rank
Demo (1.2/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
Demo (1.3/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [
Demo (1.4/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [
• theme = Besteuerung
Demo (2/.)
• License information
  – Should be available in the original source
  – Can be merged into the source by a download service, addressed via DCAT distribution information
  – License reference provides
    • Work title
    • Publisher (Publications Office)
    • License statement
    • Primary source content
Demo (2.1/.)
• License reference with primary source title (from DCAT register)
Demo (2.2/.)
• Publisher found in DCAT as linked data in license reference
Demo (2.3/.)
• License statement as linked data from license reference
Demo (2.4/.)
• Primary source document as linked data from license reference
Demo (3/.)
• Retrieve document from CELLAR
  – any available format
    • Demo uses: html, xhtml, pdf, pdfa1a, pdfa1b
• Retrieve metadata from CELLAR
  – ELI metadata (RDF/XML format)
  – CELLAR metadata (RDF/XML format)
  – "Notice" metadata (proprietary XML format)
• ELI
  – “European Legislation Identifier”@en
  – http://publications.europa.eu/resource/oj/JOC_2012_325_R_0003_01.FRA.xhtml
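Once metadata arrives as RDF/XML, an application has to turn it into statements it can use. The sketch below reads a tiny, hypothetical RDF/XML document (not real CELLAR or ELI output) with the standard library and lists its subject, property and value triples. It only handles the simple `rdf:Description` layout shown here. Real RDF/XML has many equivalent forms and needs a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample, for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/resource/doc-1">
    <dct:title xml:lang="en">Hypothetical title</dct:title>
    <dct:issued>2012-10-24</dct:issued>
    <dct:source rdf:resource="http://example.org/resource/doc-0"/>
  </rdf:Description>
</rdf:RDF>"""


def flat_triples(rdf_xml):
    """Return (subject, predicate, value) tuples for each property of each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join them into one IRI.
            predicate = prop.tag[1:].replace("}", "")
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, value))
    return triples


triples = flat_triples(SAMPLE)
for triple in triples:
    print(triple)
```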
Demo (3.1/.)
• Primary source document retrieval options
Demo (3.2/.)
• Retrieve primary source documents
Demo (3.3/.)
• Primary source metadata retrieval options
  – ELI (RDF/XML)
  – raw RDF (RDF/XML)
  – proprietary “notice” XML
Demo (3.4/.)
• Retrieve primary source metadata
Note: Requires proper browser XML and RDF viewing options
Demo (4/.)
• EUROVOC integration
Demo (5/.)
• Establish reuse - drag and drop the CELLAR item over the WK item
Demo (5.1/.)
• Add primary source reference as linked data
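One way to picture the primary source reference is a single provenance statement from the reusing document to the original. The sketch below writes that statement as an N-Triples line with the W3C PROV-O property `prov:hadPrimarySource` and reads it back. The slides only say that PROV is used, so the choice of this property and the WKD document identifier are assumptions.

```python
PROV = "http://www.w3.org/ns/prov#"


def primary_source_triple(reusing_iri, source_iri):
    """Return one N-Triples line: reusing_iri prov:hadPrimarySource source_iri."""
    return f"<{reusing_iri}> <{PROV}hadPrimarySource> <{source_iri}> ."


def primary_sources(ntriples_lines, reusing_iri):
    """List the primary sources recorded for reusing_iri in simple IRI-only triples."""
    found = []
    for line in ntriples_lines:
        subject, predicate, obj = line.rstrip(" .").split(" ", 2)
        if subject == f"<{reusing_iri}>" and predicate == f"<{PROV}hadPrimarySource>":
            found.append(obj.strip("<>"))
    return found


# The WKD identifier is hypothetical; the source URL is the ELI example from the demo.
WKD_DOC = "http://example.org/wkd/document/123"
CELLAR_DOC = "http://publications.europa.eu/resource/oj/JOC_2012_325_R_0003_01.FRA.xhtml"

store = [primary_source_triple(WKD_DOC, CELLAR_DOC)]
print(store[0])
print(primary_sources(store, WKD_DOC))
```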
Demo (5.2/5)
• Access primary source reference as linked data
Examples of related use cases
Scenario 1 – Employment
Use Case: An SME in the Aachen area has a job vacancy for a Java programmer
Background: It is getting harder to find good software developers, especially beyond urban centres. Applicants in areas close to national borders need very practical information about mobility, which is currently hardly available
Eurovoc topics covered: Labour, Labour Market, Job Mobility, Job Vacancy
Sources involved: European Legislation, Eurostat, destat, ESCO, Open Street Map, Public transport Aachen, European Agency for Safety and Health at Work
Solution: EC contributes core ingredients for a central hub for transnational job mobility challenges
Scenario 2 – Environment
Use Case: A German supermarket chain wants to start an image campaign on seafood that is not in danger of overfishing in the coming years
Background: In Germany, the market for organic food is growing rapidly, as is the support for sustainability. Unfortunately, the information on sustainability is so scattered that there is no way, e.g. for the advertising industry, to react properly and seriously to this consumer trend
Eurovoc topics covered: Nature reserve, environmental politics, management of resources, fishing industry, fresh fish, catch quota
Sources involved: European legislation, Eurostat, destat, FAO, World Bank, European Environment Agency
Solution: EC contributes core ingredients for a central hub for environmental protection
Scenario 3 – Energy
Use Case: A house owner in the Netherlands wants to install solar cells on his roof
Background: Due to the „Energiewende“ in Germany, a lot of knowledge on renewable energy, its impact, technologies and vendors has been created on a national level. This information is also relevant for other EU member states and their citizens
Eurovoc topics covered: Energy industry, solar energy, photovoltaic cell
Sources involved: European legislation, Eurostat, destat, Joint Research Center, Agency for the Cooperation of Energy Regulators, International Energy Agency, Stiftung Warentest
Solution: EC contributes core ingredients for transnational energy challenges
Next for CELLAR (2014)
• Transform all published CELLAR legislation according to the ELI directive
• Publish case law according to the ECLI directive
• Publish the catalog of available legislation and case law (possibly using the W3C DCAT recommendation)
• Publish all taxonomies used by the EU following the LOD best practices.
ESCO
The ESCO Project
• ESCO
  – Project owner: DG-EMPL
  – ESCO
    • https://ec.europa.eu/esco/home (version 0)
    • European Skills, Competences, Qualifications and Occupations
    • The knowledge base details concepts in three pillars (taxonomies) and provides semantically rich relations between the concepts.
    • Re-uses several other taxonomies (Eurostat, Unesco, DG-EAC, PO of the EU)
ESCO Data Model: Occupation Pillar
O [Occupation]
• Organized by economic activity sectors
  – Agriculture
  – Education
  – ...
• mapped to
  – ISCO xx (standard of ILO/UNO)
  – ROME (French labor market standard)
  – ...
[Diagram: mapping relations between ESCO occupations and ISCO08, ISCO88, ROME and NACE, labelled broaderMatch, exactMatch, correspondence and subject]
ESCO Data Model: Occupation Pillar
• Relation: description
[Diagram: a text document (unstructured or semi-structured) with Occupation Description, Skills and Qualifications sections is linked to an Occupation through aboutOccupation]
ESCO Data Model: Occupation Pillar
• Skills are
  – transversal (across activity sectors)
  – specific to an activity sector
• Types of skills
  – knowledge, skill, competence, ability
• Group of skills
  – Leaf group of skills
    • Skill (member of a skill group)
• Relation: occupation - skill
[Diagram: a text document (unstructured or semi-structured) with Occupation Description, Skills and Qualifications sections is linked to an Occupation through aboutOccupation; the Occupation is linked to skills through skill relations marked essential or desired]
ESCO Data Model
• Skill and skill facet
[Diagram: the skill "Foreign Language expertise" with a main facet (Language facet; members english, german, dutch; skos:exactMatch links to the OASIS, LoC and EU-PO language lists) and a sub facet (Language usage facet; top members understanding, speaking, writing; narrower members listening, reading, spoken interaction, spoken production)]
1. Define the different aspects/dimensions of a concept: - main facet (0..1) - sub facets (0..n)
2. Define/specify the standard to use or give a good description of the concepts contained by each facet
3. For each list of values from step 2. a collection of concepts (Facet Group) is created.
4. Manage the members of the facet group
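The four steps above can be sketched as a small data structure. This is one possible reading of the facet model, with hypothetical class names and with the language example taken from the slide. The "0..1 main facet, 0..n sub facets" cardinalities become an optional field and a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FacetGroup:
    """Step 3: the collection of concepts created for one facet."""
    name: str
    standard: str  # step 2: the standard or description the values come from
    members: List[str] = field(default_factory=list)

    def add_member(self, label):
        """Step 4: manage the members of the facet group (no duplicates)."""
        if label not in self.members:
            self.members.append(label)


@dataclass
class FacetedConcept:
    """Step 1: a concept with at most one main facet and any number of sub facets."""
    label: str
    main_facet: Optional[FacetGroup] = None
    sub_facets: List[FacetGroup] = field(default_factory=list)


# The standards named here are assumptions for illustration.
language = FacetGroup("Language", standard="ISO 639")
for name in ["english", "german", "dutch", "english"]:
    language.add_member(name)

usage = FacetGroup("Language usage", standard="CEFR")
for name in ["understanding", "speaking", "writing"]:
    usage.add_member(name)

concept = FacetedConcept("Foreign language expertise", main_facet=language, sub_facets=[usage])
print(concept)
```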
ESCO Data Model: Qualification Pillar
• EQF, FoET, Awarding Body
[Diagram: Q-groups and Q-members of the ESCO Q-pillar related to FoET, EQF and AwardingBody; relation labels shown: exactMatch, tagging, hasAwardingBody, description]
ESCO Data Model: Occupation Pillar (recap)
• Relation: description
[Diagram: a text document (unstructured or semi-structured) with Occupation Description, Skills and Qualifications sections is linked to an Occupation through aboutOccupation]
ESCO Data Model: Occupation Pillar (recap)
• Relationship: occupation - qualification
[Diagram: a text document (unstructured or semi-structured) with Occupation Description, Skills and Qualifications sections is linked to an Occupation through aboutOccupation; the Occupation is linked to a qualification through the qualification relation]
ESCO Data Model: Qualification Pillar
• Qualifications are maintained (direct) or included (indirect)
• Direct qualifications are maintained by DG-EMPL/ESCO. Inclusion is on an “as needed” basis
  – International qualification schemes (outside of the EU)
    • USA, China, ...
  – Qualifications awarded by enterprises
    • ORACLE, CISCO, Microsoft, ...
• Qualifications subject to indirect inclusion
  – Are maintained by national (EU member) organizations
  – Registered and structured by DG EAC (Education and Culture)
  – Transferred to DG EMPL using the XML schema of DG-EAC
  – Uploaded in ESCO by DG-EMPL/ESCO
ESCO Data Model: Qualification Pillar
• Relationship description
[Diagram: an XML document with an occasional description (Qualification Description and Skills sections) is linked to a qualification through aboutQualification; the qualification is related to skills (skill, competence) and to an awarding body through hasAwardingBody]
ESCO Data Model - summary
• ESCO consists of three pillars (a pillar is a class of concepts)
  – occupation
  – competence
  – qualification
• ESCO concepts are mapped to concepts of similar taxonomies. The mapping is expressed using SKOS mapping properties.
  – The correspondence between ESCO and ISCO (an ESCO occupation has an ISCO occupation group as broader match)
  – Planned: mapping ESCO to ROME (French occupation taxonomy) ... other mappings may be established as needed (O*NET)
• The ESCO semantics are expressed using standard support taxonomies
  – To tag ESCO pillar concepts (using DCMI property dcterms:subject)
  – To structure recurring specializations in the ESCO model (using facets, collections or groups of concepts)
  – Examples
    • Location (Eurostat: NUTS; ISO 3166)
    • Economic activity sectors (Eurostat: NACE)
    • European Qualifications Framework (EQF)
    • CEFR (Common European Framework of Reference for Languages)
    • UNESCO (UIS): FoET, ISCED
    • Languages (Publications Office of the EU, Library of Congress, OASIS-psi, ISO 639)
    • ...
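A short sketch of how such mappings might be held and queried. All identifiers below are hypothetical placeholders, not real ESCO, ISCO or NACE codes. Note that the SKOS property for a "broader match" is named `skos:broadMatch`; tagging uses `dcterms:subject` as stated above.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"

# (ESCO concept, property, target concept); every IRI here is a placeholder.
MAPPINGS = [
    ("http://example.org/esco/occupation/java-programmer", SKOS + "broadMatch",
     "http://example.org/isco08/group-x"),
    ("http://example.org/esco/occupation/java-programmer", DCTERMS + "subject",
     "http://example.org/nace/sector-y"),
    ("http://example.org/esco/occupation/baker", SKOS + "exactMatch",
     "http://example.org/rome/code-z"),
]


def targets(mappings, concept, prop):
    """Return the concepts that `concept` is linked to through property `prop`."""
    return [obj for subj, pred, obj in mappings if subj == concept and pred == prop]


def to_ntriples(mappings):
    """Serialize the IRI-only mappings as N-Triples lines."""
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in mappings]


java = "http://example.org/esco/occupation/java-programmer"
print(targets(MAPPINGS, java, SKOS + "broadMatch"))
print("\n".join(to_ntriples(MAPPINGS)))
```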
Tools for Linked Open Data
A small list of tools for LOD
• SPARQL end-point – NoSQL database (RDF graph, column)
  – Virtuoso, Oracle, AllegroGraph
• Frameworks integrating semantic libraries
  – Jena, Sesame
• Analyser
  – TopBraid, Protégé
• Alignment of knowledge bases
  – SILK:
    • http://lod2.eu/Project/Silk.html
    • http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
• LOD best practices
  – https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html
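Whatever store sits behind it, a SPARQL end-point is queried over plain HTTP. The sketch below builds a GET request in the style of the SPARQL 1.1 Protocol, with the query text in the `query` parameter and a JSON results media type in the `Accept` header. The end-point URL is a placeholder, and the request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder end-point; replace with the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def sparql_get_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. That step is omitted because it needs a reachable end-point.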
TenForce References
• Semantic web projects
  – Eurovoc
  – Cellar
  – ESCO
  – LOD2 (R&D)
  – Wolters Kluwer
  – ODP (Open Data Portal)
  – ODS (Open Data Support)
• ISO 25964 (Thesaurus standardization)
• TenForce.com