ESS Workshop on dissemination of official statistics as open data 18-19 January 2017, Malta Linked Statistical Data: Challenges & Tools Dr. Evangelos Kalampokis CERTH-ITI and University of Macedonia, Greece


Page 1: Linked Statistical Data: Challenges & Tools

ESS Workshop on dissemination of official statistics as open data

18-19 January 2017, Malta

Dr. Evangelos Kalampokis

CERTH-ITI and University of Macedonia, Greece

Page 2: Table of Contents

Evangelos Kalampokis ESS (Linked) Open Data Workshop, 18-19 January 2017, Malta

The Vision of Linked Data Cube Analytics

Software Tools

Challenges

Conclusion


Page 4: Open Statistical Data – Fragmented Views

Open Statistical Data are fragmented.

Searching data.gov.uk for “unemployment” datasets returns 122 results (links and files).

These results provide access to 56 files and 610 links.

These links lead to 18 other portals and, through them, to more than 2000 other files.

E. Kalampokis, E. Tambouris, A. Karamanou, K. Tarabanis (2016) Open Statistics: The Rise of a new Era for Open Data?, EGOV2016, LNCS 9820, pp.31-43, Springer.

Page 5: Open Statistical Data – Complementary Views

All these web portals provide complementary views of the unemployment data.

For example, focusing on the geo dimension: data about unemployment at different administrative levels in the UK.

ONS, NOMIS, NeSS and Open Data Communities provide data about the whole country.

Local government portals provide data for specific areas (e.g. Warwickshire, Cambridgeshire).

Level | Administrative unit | Portals
0 | UK | ONS
1 | Countries | ONS
2 | Regions | ONS, NOMIS, NeSS
3 | Counties | NOMIS
4 | Districts/Boroughs/Divisions | ODC
5 | Local Enterprise Partnership | ONS, NOMIS
6 | Local Authorities/Communities First Areas | ONS, NOMIS, NeSS
7 | Parliamentary Constituencies | ONS, NOMIS
8 | Wards | Warwickshire, Cambridgeshire
9 | Market Towns | Cambridgeshire
10 | Super Output Area | Warwickshire
11 | Super Output Area Middle Layer | NeSS
12 | Super Output Area Lower Layer | NeSS
13 | Output Area | NeSS
14 | Parishes | Cambridgeshire

Page 6: Linked Statistical Data

Linked Statistical Data (aka linked data cubes) have the potential to address data interoperability problems, facilitate data integration and thus provide unified access to multiple datasets across the Web.

http://lod-cloud.net
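
To make the term concrete: in a linked data cube, each statistical observation is described with RDF triples, typically using the W3C RDF Data Cube (QB) vocabulary. The Python sketch below is only an illustration. The qb: and sdmx: namespace URIs are the published ones, while the http://example.com/ URIs, the dataset name and the value 5.1 are made up for the example.

```python
# Namespaces of the W3C RDF Data Cube vocabulary and its SDMX extensions.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

EX = "http://example.com/"  # hypothetical publisher namespace


def observation_triples(obs_id, dataset, ref_area, ref_period, value):
    """Return N-Triples lines describing one observation of a cube."""
    subject = f"<{EX}obs/{obs_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{EX}dataset/{dataset}> .",
        # Dimension values are URIs, so other datasets can link to them.
        f"{subject} <{SDMX_DIM}refArea> <{EX}area/{ref_area}> .",
        f"{subject} <{SDMX_DIM}refPeriod> <{EX}year/{ref_period}> .",
        # The measured value is a typed literal.
        f'{subject} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD_DECIMAL}> .',
    ]


# Made-up observation, for illustration only.
for line in observation_triples("o1", "unemployment", "UK", "2016", "5.1"):
    print(line)
```

Because the dimension values are URIs rather than free-text labels, two publishers that reuse the same URIs for areas and periods produce observations that can be matched directly.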

Page 7: The Vision of Linked Data Cube Analytics

Performing analytics on top of multiple datasets.

Two categories of scenarios:

Easy performance of typical data analytics scenarios

Realization of innovative data analytics scenarios

A type of system that enables users to:

Discover data that make sense to integrate.

Combine data from multiple sources.

Perform various types of analysis on top of integrated data.

E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked Open Cube Analytics Systems: Potential and Challenges, IEEE Intelligent Systems, Vol. 31, No. 5, pp.89-92.


Page 9: Software tools

Software tools for publishing, integrating and exploiting Linked Open Statistical Data.

[Figure: the linked data cube process in three phases]

Create: Discover & Pre-process Raw Data, Define Structure & Create Cube, Publish Cube

Expand: Identify Compatible Cubes, Expand Cube

Exploit: Discover & Explore Cube, Analyse Cube, Communicate Results

[Other labels in the figure: Metadata, Processed raw data]

E. Tambouris, E. Kalampokis, K. Tarabanis (2015) Processing Linked Open Data Cubes, EGOV2015, LNCS 9248, pp.130-143, Springer.

A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked data cubes: Research results so far, SemStats2016, 17-21 October 2016, Kobe, Japan, CEUR-WS.

Page 10: Publishing Tools Overview

Technical formats of raw data (CSV, RDBMS, JSON, OLAP etc.)

User interface (GUI, CLI)

Structure of the outcome cube (e.g. pre-defined)

Evaluation (performance, ease of use)

Programming languages and environments

A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis, “Understanding the Use of Linked Open Data in Statistics”, Journal of Web Semantics [under review]

Page 11: Multiple technical formats - The OpenCube Toolkit

http://opencube-toolkit.eu

Page 12: LOD2 Statistical Workbench - CSV2DataCube

R. P. E. Salas, M. Martin, F. M. Da Mota, S. Auer, K. Breitman, M. A. Casanova, Publishing statistical data on the web, in: Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, IEEE, 2012, pp. 285–292.

Page 13: QBer

R. Hoekstra, A. Merono-Penuela, K. Dentler, A. Rijpma, R. Zijdeman, I. Zandhuis, An ecosystem for linked humanities data, in: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe 2016), ESWC, 2016.

Page 14: Command Line Interface - Grafter

Open source software framework for transforming tabular data (CSV or XLS) to RDF

Programming skills required (DSL to specify transformation pipelines)

Performs well with large datasets

Faster transformation
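
As an illustration of the kind of row-by-row pipeline a tabular-to-RDF tool runs, here is a minimal Python sketch. It is not Grafter (which has its own DSL) and does not reproduce its API. The column names, the http://example.com/ URIs and the sample rows are assumptions made up for the example.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/"  # hypothetical namespace

# Made-up input; a real pipeline would read a CSV file instead.
SAMPLE_CSV = """area,year,unemployment_rate
UK,2015,5.4
UK,2016,4.9
"""


def rows_to_triples(csv_text, dimensions, measure):
    """Yield (subject, predicate, object) triples, one observation per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        subject = f"{EX}obs/{index}"
        yield (subject, RDF_TYPE, f"{QB}Observation")
        for dim in dimensions:
            # Dimension values become URIs so that they can be linked to.
            yield (subject, f"{EX}dimension/{dim}", f"{EX}{dim}/{row[dim]}")
        # The measure stays a plain number.
        yield (subject, f"{EX}measure/{measure}", float(row[measure]))


triples = list(rows_to_triples(SAMPLE_CSV, ["area", "year"], "unemployment_rate"))
for triple in triples:
    print(triple)
```

Each row yields one type triple, one triple per dimension and one for the measure. Because the function is a generator, rows are processed one at a time rather than loaded into memory together, which is one common way such pipelines cope with large inputs.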

Page 15: Exploiting Software Tools

Types of data analysis (e.g. graphs, maps, OLAP, statistical analyses etc.)

Programming languages and development platforms

Tools, platforms, web applications etc.

Domain specific tools (tourism, health etc.)

Evaluation (performance, user friendliness)

Structure of the cube

A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis, “Understanding the Use of Linked Open Data in Statistics”, Journal of Web Semantics [under review]

Page 16: Exploiting Software Tools – Type of analysis

Type of analysis | Share of tools
Visualisation | 55%
Statistical Analysis | 24%
Browsing | 13%
OLAP | 8%

A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked data cubes: Research results so far, SemStats2016, 17-21 October 2016, Kobe, Japan, CEUR-WS.

Page 17: OpenCube MapView

http://opencube-toolkit.eu

Page 18: CODE Linked Data Query Wizard

A web application that enables querying, refining, and visualising linked data.

Several of the available datasets are Linked Statistical Data.

http://code.know-center.tugraz.at

Page 19: CODE Linked Data Query Wizard (Browser)

http://code.know-center.tugraz.at

Page 20: StatSpace Explorer

http://statspace.linkedwidgets.org/explorer

Page 21: StatSpace Explorer - Visualisation

http://statspace.linkedwidgets.org/explorer

Page 22: Integrating Software Tools

Discover Linked Data Cubes that make sense to join.

Discover Linked Data Cubes that are structurally compatible to join.

Integrate structurally compatible cubes.

E. Kalampokis, E. Tambouris, A. Karamanou, K. Tarabanis (2016) Open Statistics: The Rise of a new Era for Open Data?, EGOV2016, LNCS 9820, pp.31-43, Springer.
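
A structural compatibility check can be sketched as follows. The rule used here is an assumption chosen for illustration, not the definition used by the tools in this talk: two cubes count as compatible for adding a measure when they have the same dimensions, share at least one value on every dimension, and the second cube offers a measure the first lacks. The `CubeStructure` class and all URIs and names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CubeStructure:
    """Minimal stand-in for a cube's structure (hypothetical model)."""
    uri: str
    # Maps each dimension name to the set of values used in the cube.
    dimensions: dict = field(default_factory=dict)
    measures: set = field(default_factory=set)


def compatible_for_new_measure(a, b):
    """Assumed rule: same dimensions, overlapping values, and a new measure in b."""
    if set(a.dimensions) != set(b.dimensions):
        return False
    for dim, values in a.dimensions.items():
        # Without shared values there is nothing to join on.
        if not values & b.dimensions[dim]:
            return False
    return bool(b.measures - a.measures)


population = CubeStructure(
    uri="http://example.com/cube/population",
    dimensions={"refArea": {"area-1", "area-2"}, "refPeriod": {"2015", "2016"}},
    measures={"population"},
)
unemployment = CubeStructure(
    uri="http://example.com/cube/unemployment",
    dimensions={"refArea": {"area-2", "area-3"}, "refPeriod": {"2016"}},
    measures={"unemployment"},
)
unemployment_by_sex = CubeStructure(
    uri="http://example.com/cube/unemployment-by-sex",
    dimensions={"refArea": {"area-1"}, "sex": {"female", "male"}},
    measures={"unemployment"},
)

print(compatible_for_new_measure(population, unemployment))         # True
print(compatible_for_new_measure(population, unemployment_by_sex))  # False
```

The check only works if both cubes identify dimensions and values in the same way, which is why shared publishing practices matter for integration.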

Page 23: OpenCube OLAP Browser

An OLAP Browser that can perform OLAP operations on top of multiple datasets that reside in different portals.

http://opencube-toolkit.eu

Page 24: OpenCube OLAP Browser

The OLAP Browser enables performing OLAP operations on data from:

Flemish Government

Scottish Government

Page 25: OpenCube OLAP Browser

OLAP operations on top of an integrated view of multiple statistical datasets.

E. Kalampokis, E. Tambouris, D. Zeginis, K. Tarabanis, “Expanding Data Cubes for Enhanced OLAP Analytics on the Web of Linked Data”, IEEE Transactions on Knowledge and Data Engineering [under review]
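
For readers unfamiliar with OLAP operations, the sketch below shows two of them, roll-up and slice, on a small in-memory cube. It is a generic Python illustration, not the OpenCube implementation. The observations are made up, and the measure is assumed to be an additive count so that summing is meaningful.

```python
from collections import defaultdict

# Made-up observations of a cube with three dimensions and one measure.
observations = [
    {"area": "area-1", "year": 2015, "sex": "female", "count": 10},
    {"area": "area-1", "year": 2015, "sex": "male", "count": 12},
    {"area": "area-2", "year": 2015, "sex": "female", "count": 7},
    {"area": "area-2", "year": 2016, "sex": "male", "count": 9},
]


def roll_up(observations, keep, measure):
    """Aggregate the measure over every dimension that is not in `keep`."""
    totals = defaultdict(int)
    for obs in observations:
        key = tuple(obs[dim] for dim in keep)
        totals[key] += obs[measure]
    return dict(totals)


def slice_cube(observations, dimension, value):
    """Keep only the observations with a fixed value on one dimension."""
    return [obs for obs in observations if obs[dimension] == value]


print(roll_up(observations, ["area"], "count"))
print(len(slice_cube(observations, "year", 2015)))
```

Rates and percentages cannot be rolled up by summing, so a real browser has to know which aggregation is valid for each measure.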

Page 26: Compatibility Explorer & Expander

Given a cube in the local store, the Compatibility Explorer (a) searches the Linked Data Web and identifies cubes that are compatible to expand the initial cube, and (b) establishes typed links between the local and the compatible cubes.

The Expander creates a new expanded cube by merging two compatible ones.

The Expander implements the theoretical framework.

[Figure label: OpenCube OLAP Browser]
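
The merging step can be sketched in Python as below. This is an illustration under simplifying assumptions, not the Expander's actual algorithm: both cubes are plain lists of dictionaries, they use identical names and values for the shared dimensions, and expansion means adding the other cube's measures to matching observations. All data are made up.

```python
def expand_cube(base, other, dimensions):
    """Return a copy of `base` with measures from `other` added where the dimension values match."""
    # Index the second cube by its dimension values for constant-time lookup.
    index = {tuple(obs[dim] for dim in dimensions): obs for obs in other}
    expanded = []
    for obs in base:
        merged = dict(obs)
        match = index.get(tuple(obs[dim] for dim in dimensions))
        if match is not None:
            for name, value in match.items():
                # Copy only measures, and never overwrite an existing one.
                if name not in dimensions:
                    merged.setdefault(name, value)
        expanded.append(merged)
    return expanded


# Made-up cubes sharing the dimensions "area" and "year".
population = [
    {"area": "area-1", "year": 2016, "population": 1000},
    {"area": "area-2", "year": 2016, "population": 2000},
]
unemployed = [
    {"area": "area-1", "year": 2016, "unemployed": 50},
]

for row in expand_cube(population, unemployed, ["area", "year"]):
    print(row)
```

Observations without a counterpart in the second cube are kept unchanged, so the expanded cube never has fewer observations than the initial one.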

Page 27: StatSpace Explorer – Identify Relatable datasets

http://statspace.linkedwidgets.org/explorer

Page 28: StatSpace Explorer – Compare Relatable datasets


Page 30: Challenges - Data Silos

Linked Data use a set of standards (RDF, HTTP, vocabularies etc.) to enable unified access to data on the Web.

However, publishing Statistical Data using Linked Data principles may result in Linked Statistical Data silos:

Software tools cannot be reused across datasets

Creation of system silos

Image: http://www.flickr.com/photos/rachelrusinski/526260022

Page 31: Application Profile for Linked Data Cubes

A set of publishing practices for Linked Data Cubes.

In the first steps more than 10 academics and practitioners have been involved.

The results will be publicly available for further discussion.

http://OpenGovIntelligence.eu

@OpenGovInt

Page 32: Application Profile for Linked Data Cubes (example)

Page 33: Challenges - Technical complexity of Linked Data technologies

Developers and data scientists are not familiar with RDF, SPARQL etc.

We can hide the complexity from the end users.

We can use Linked Data at the back end to integrate and semantically enrich statistical data.

Page 34: OpenGovIntelligence JSON-QB API

Specification and implementation of a REST API that translates RDF data cubes to JSON data.

This API is designed to support developers in using Linked Statistical Data while assuming minimal knowledge of Linked Data.

Numerous existing software tools and libraries can then be used to exploit Linked Statistical Data.

Page 35: JSON-QB API - metadata

GET /dimensions
Parameters: dataset (required)
Sample result:

    [{"@id":"http://purl.org/linked-data/sdmx/2009/dimension#sex","label":"sex"},
     {"@id":"http://example.com#timePeriod","label":"Time Period"},
     {"@id":"http://example.com#refArea","label":"Reference Area"}]

GET /dimension-values
Parameters: dataset (required), dimension (required)
Sample result:

    {"dimension":{"URI":"http://example.com#timePeriod","label":"Time Period"},
     "values":[
      {"@id":"http://example.com/concept/year2004#id","label":"2004"},
      {"@id":"http://example.com/concept/year2005#id","label":"2005"},
      {"@id":"http://example.com/concept/year2006#id","label":"2006"},
      ...]}

https://github.com/OpenGovIntelligence/json-qb
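
A client for these two calls needs nothing beyond an HTTP library and a JSON parser. The Python sketch below uses only the standard library. The base URL and the dataset URI are hypothetical placeholders, the helper names are invented for the example, and it assumes that parameters are passed in the query string. The sample response is the /dimensions result shown above, so the example runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://example.com/json-qb"  # hypothetical deployment of the API


def endpoint_url(path, **params):
    """Build a request URL such as <base>/dimensions?dataset=..."""
    return f"{BASE_URL}/{path}?{urlencode(params)}"


def fetch_json(url):
    """Perform the GET request and decode the JSON body (not called below)."""
    with urlopen(url) as response:
        return json.load(response)


def labels_by_id(items):
    """Map each URI to its label, given a list of {"@id", "label"} objects."""
    return {item["@id"]: item["label"] for item in items}


# Sample /dimensions result from the slide, used instead of a live request.
SAMPLE_DIMENSIONS = json.loads("""
[{"@id": "http://purl.org/linked-data/sdmx/2009/dimension#sex", "label": "sex"},
 {"@id": "http://example.com#timePeriod", "label": "Time Period"},
 {"@id": "http://example.com#refArea", "label": "Reference Area"}]
""")

url = endpoint_url("dimensions", dataset="http://example.com/cube/unemployment")
print(url)
print(labels_by_id(SAMPLE_DIMENSIONS)["http://example.com#refArea"])
```

Against a live deployment, `fetch_json(url)` would replace the hard-coded sample. No RDF or SPARQL knowledge is needed on the client side.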

Page 36: Other Challenges

Performance is an issue in:

Web-based applications

Large datasets

Federated queries


Page 38: Conclusion

Statistical data are fragmented.

Linked data provide the technological foundation to integrate statistical data and thus enable performing analytics on top of multiple datasets.

Numerous software tools have been developed towards this end.

Different publishing practices hamper the reuse of the tools across datasets and portals.

The technical complexity of Linked Data technologies does not allow the wide exploitation of Linked Statistical Data.

We need to work further towards addressing these challenges.

Page 39: Thank you for your attention!!

http://kalampok.is

@kalampokis