ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta
Linked Statistical Data: Challenges & Tools
Dr. Evangelos Kalampokis
CERTH-ITI and University of Macedonia, Greece
Evangelos Kalampokis ESS (Linked) Open Data Workshop, 18-19 January 2017, Malta 2
The Vision of Linked Data Cube Analytics
Software Tools
Challenges
Conclusion
Table of Contents
Open Statistical Data are fragmented
Searching data.gov.uk for “unemployment” datasets:
122 results (links and files)
These results provide access to 56 files and 610 links
These links lead to 18 other portals
Through them to more than 2000 other files
Open Statistical Data – Fragmented Views
E. Kalampokis, E. Tambouris, A. Karamanou, K. Tarabanis (2016) Open Statistics: The Rise of a new Era for Open Data?, EGOV2016, LNCS 9820, pp.31-43, Springer.
All these web portals provide complementary views of the unemployment data.
For example, along the geographical dimension, the portals offer unemployment data at different administrative levels of the UK:
ONS, NOMIS, NeSS and Open Data Communities provide data about the whole country.
Local government portals provide data for specific areas (e.g. Warwickshire, Cambridgeshire)
Open Statistical Data – Complementary Views
Level  Administrative level                        Portals
0      UK                                          ONS
1      Countries                                   ONS
2      Regions                                     ONS, NOMIS, NeSS
3      Counties                                    NOMIS
4      Districts/Boroughs/Divisions                ODC
5      Local Enterprise Partnership                ONS, NOMIS
6      Local Authorities/Communities First Areas   ONS, NOMIS, NeSS
7      Parliamentary Constituencies                ONS, NOMIS
8      Wards                                       Warwickshire, Cambridgeshire
9      Market Towns                                Cambridgeshire
10     Super Output Area                           Warwickshire
11     Super Output Area Middle Layer              NeSS
12     Super Output Area Lower Layer               NeSS
13     Output Area                                 NeSS
14     Parishes                                    Cambridgeshire
Linked Statistical Data (aka linked data cubes) have the potential to solve data interoperability problems, facilitate data integration and thus provide unified access to multiple datasets across the Web.
Linked Statistical Data
http://lod-cloud.net
Performing analytics on top of multiple datasets.
Two categories of scenarios:
Easy performance of typical data analytics scenarios
Realization of innovative data analytics scenarios
A type of system that enables users to:
Discover data that make sense to integrate.
Combine data from multiple sources.
Perform various types of analysis on top of integrated data.
The Vision of Linked Data Cube Analytics
E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked Open Cube Analytics Systems: Potential and Challenges, IEEE Intelligent Systems, Vol. 31, No. 5, pp. 89-92.
Software tools for publishing, integrating and exploiting Linked Open Statistical Data.
Software tools
[Figure: the linked data cube lifecycle in three phases. Create: Discover & Pre-process Raw Data; Define Structure & Create Cube; Publish Cube. Expand: Identify Compatible Cubes; Expand Cube. Exploit: Discover & Explore Cube; Analyse Cube; Communicate Results. Other labels in the figure: Metadata; Processed raw data.]
E. Tambouris, E. Kalampokis, K. Tarabanis (2015) Processing Linked Open Data Cubes, EGOV2015, LNCS 9248, pp.130-143, Springer.
A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked data cubes: Research results so far, SemStats2016, 17-21 October 2016, Kobe, Japan, CEUR-WS
Publishing Tools Overview
Technical formats of raw data (CSV, RDBMS, JSON, OLAP etc.)
User interface (GUI, CLI)
Structure of the outcome cube (e.g. pre-defined)
Evaluation (performance, ease of use)
Programming languages and environments
A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis, “Understanding the Use of Linked Open Data in Statistics”, Journal of Web Semantics [under review]
Multiple technical formats - The OpenCube Toolkit
http://opencube-toolkit.eu
LOD2 Statistical Workbench - CSV2DataCube
R. P. E. Salas, M. Martin, F. M. Da Mota, S. Auer, K. Breitman, M. A. Casanova, Publishing statistical data on the web, in: Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, IEEE, 2012, pp. 285–292.
QBer
R. Hoekstra, A. Merono-Penuela, K. Dentler, A. Rijpma, R. Zijdeman, I. Zandhuis, An ecosystem for linked humanities data, in: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe 2016), ESWC, 2016.
Open source software framework for transforming tabular data (CSV or XLS) to RDF
Programming skills required (DSL to specify transformation pipelines)
Performs well with large datasets
Faster transformation
Command Line Interface - Grafter
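To make the tabular-to-RDF idea concrete, here is a minimal Python sketch. It is not Grafter (which uses its own DSL) and does not reproduce any tool from these slides; it only illustrates how each CSV row could become one qb:Observation of the W3C RDF Data Cube vocabulary. The http://example.com/ namespace, the column names and the sample values are all invented for illustration, and a real cube would also need a qb:DataStructureDefinition.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.com/"  # hypothetical namespace, not from the slides


def csv_to_ntriples(csv_text, dataset_id, dimensions, measure):
    """Turn each CSV row into one qb:Observation, serialised as N-Triples lines."""
    dataset = f"<{BASE}dataset/{dataset_id}>"
    lines = [f"{dataset} <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        obs = f"<{BASE}dataset/{dataset_id}/obs/{i}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> {dataset} .")
        for dim in dimensions:
            # Dimension values become URIs so that other cubes can link to them.
            value = row[dim].strip().replace(" ", "_")
            lines.append(f"{obs} <{BASE}dimension/{dim}> <{BASE}concept/{dim}/{value}> .")
        lines.append(f'{obs} <{BASE}measure/{measure}> "{row[measure]}"^^<{XSD_DECIMAL}> .')
    return lines


# Invented sample rows, for illustration only.
sample = "refArea,timePeriod,unemployment\nWarwickshire,2015,4.1\nCambridgeshire,2015,3.2\n"
triples = csv_to_ntriples(sample, "unemployment", ["refArea", "timePeriod"], "unemployment")
print("\n".join(triples))
```

Minting URIs for dimension values, rather than keeping them as strings, is what later allows cubes from different portals to be matched on shared areas and time periods.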
Types of data analysis (e.g. graphs, maps, OLAP, statistical analyses etc.)
Programming languages and development platforms
Tools, platforms, web applications etc.
Domain specific tools (tourism, health etc.)
Evaluation (performance, user friendliness)
Structure of the cube
Exploiting Software Tools
A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis, “Understanding the Use of Linked Open Data in Statistics”, Journal of Web Semantics [under review]
Exploiting Software Tools – Type of analysis
Visualisation: 55%
Statistical Analysis: 24%
Browsing: 13%
OLAP: 8%
A. Karamanou, E. Kalampokis, E. Tambouris, K. Tarabanis (2016) Linked data cubes: Research results so far, SemStats2016, 17-21 October 2016, Kobe, Japan, CEUR-WS
OpenCube MapView
http://opencube-toolkit.eu
A web application that enables querying, refining, and visualising linked data.
Several of the available datasets are Linked Statistical Data
CODE Linked Data Query Wizard
http://code.know-center.tugraz.at
CODE Linked Data Query Wizard (Browser)
http://code.know-center.tugraz.at
StatSpace Explorer
http://statspace.linkedwidgets.org/explorer
StatSpace Explorer - Visualisation
http://statspace.linkedwidgets.org/explorer
Discover Linked Data Cubes that make sense to join.
Discover Linked Data Cubes that are structurally compatible to join.
Integrate structurally compatible cubes.
Integrating Software Tools
E. Kalampokis, E. Tambouris, A. Karamanou, K. Tarabanis (2016) Open Statistics: The Rise of a new Era for Open Data?, EGOV2016, LNCS 9820, pp.31-43, Springer.
An OLAP Browser that can perform OLAP operations on top of multiple datasets that reside in different portals
OpenCube OLAP Browser
http://opencube-toolkit.eu
OpenCube OLAP Browser
The OLAP Browser enables performing OLAP operations on data from:
Flemish Government
Scottish Government
OpenCube OLAP Browser
OLAP operations on top of integrated view of multiple statistical datasets.
E. Kalampokis, E. Tambouris, D. Zeginis, K. Tarabanis “Expanding Data Cubes for Enhanced OLAP Analytics on the Web of Linked Data” IEEE Transactions on Knowledge and Data Engineering [Under Review]
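As a rough illustration of what OLAP operations over an integrated view amount to, the following Python sketch applies a slice and a roll-up to a handful of observations drawn from two sources. It is not the OpenCube OLAP Browser implementation; the observation layout, the function names and all values are invented for illustration, and the roll-up assumes an additive measure.

```python
from collections import defaultdict

# Invented observations of one integrated cube (values are not real statistics).
observations = [
    {"refArea": "Flanders", "timePeriod": "2014", "sex": "F", "value": 10},
    {"refArea": "Flanders", "timePeriod": "2014", "sex": "M", "value": 12},
    {"refArea": "Scotland", "timePeriod": "2014", "sex": "F", "value": 7},
    {"refArea": "Scotland", "timePeriod": "2014", "sex": "M", "value": 9},
]


def slice_cube(obs, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [o for o in obs if o[dimension] == value]


def roll_up(obs, keep):
    """Roll-up: sum the measure over every dimension not listed in `keep`."""
    totals = defaultdict(int)
    for o in obs:
        key = tuple(o[d] for d in keep)
        totals[key] += o["value"]
    return dict(totals)


by_area = roll_up(observations, ["refArea"])
women_by_year = roll_up(slice_cube(observations, "sex", "F"), ["timePeriod"])
print(by_area)
print(women_by_year)
```

Summation is only one possible aggregate; rates or percentages would need a different aggregation function.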
Compatibility Explorer & Expander
OpenCube OLAP Browser
Given a cube in the local store, the Compatibility Explorer
(a) searches the Linked Data Web and identifies cubes that are compatible to expand the initial cube, and
(b) establishes typed links between the local and the compatible cubes.
The Expander creates a new expanded cube by merging two compatible ones.
The Expander implements the theoretical framework.
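The compatibility check and the merge step can be sketched as follows. This is one plausible reading, not the theoretical framework the Expander implements: here two cubes count as compatible when they share the same dimensions, the remote cube offers at least one new measure, and they have at least one cell in common. The Cube class, the function names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    dimensions: frozenset
    measures: frozenset
    # Maps a frozenset of (dimension, value) pairs to a {measure: number} dict.
    observations: dict = field(default_factory=dict)


def cell(**coords):
    """Build an observation key from dimension=value pairs."""
    return frozenset(coords.items())


def is_measure_compatible(local, remote):
    """Same dimensions, at least one new measure, at least one shared cell."""
    return (
        local.dimensions == remote.dimensions
        and bool(remote.measures - local.measures)
        and bool(local.observations.keys() & remote.observations.keys())
    )


def expand(local, remote):
    """Merge two compatible cubes into a new cube holding the measures of both."""
    if not is_measure_compatible(local, remote):
        raise ValueError("cubes are not compatible for expansion")
    shared = local.observations.keys() & remote.observations.keys()
    merged = {k: {**remote.observations[k], **local.observations[k]} for k in shared}
    return Cube(local.dimensions, local.measures | remote.measures, merged)


# Invented cubes, for illustration only.
dims = frozenset({"refArea", "timePeriod"})
unemployment = Cube(dims, frozenset({"unemployment"}), {
    cell(refArea="S1", timePeriod="2015"): {"unemployment": 5.0},
    cell(refArea="S2", timePeriod="2015"): {"unemployment": 6.0},
})
population = Cube(dims, frozenset({"population"}), {
    cell(refArea="S1", timePeriod="2015"): {"population": 1000},
})
expanded = expand(unemployment, population)
print(expanded.measures, expanded.observations)
```

The sketch keeps only cells present in both cubes; a tool could instead keep all cells and leave missing measures empty.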
StatSpace Explorer – Identify Relatable datasets
http://statspace.linkedwidgets.org/explorer
StatSpace Explorer – Compare Relatable datasets
Linked Data use a set of standards (RDF, HTTP, vocabularies etc.) to enable the unified access of data on the Web.
However, publishing Statistical Data using Linked Data principles may result in Linked Statistical Data silos.
Software tools cannot be reused across datasets
Creation of system silos
Challenges - Data Silos
http://www.flickr.com/photos/rachelrusinski/526260022
A set of publishing practices for Linked Data Cubes.
In the first steps more than 10 academics and practitioners have been involved.
The results will be publicly available for further discussion.
Application Profile for Linked Data Cubes
http://OpenGovIntelligence.eu
@OpenGovInt
Application Profile for Linked Data Cubes (example)
http://OpenGovIntelligence.eu
@OpenGovInt
Developers and data scientists are not familiar with RDF, SPARQL etc.
We can hide the complexity from the end users
We can use Linked Data at the back end to integrate and semantically enrich statistical data.
Challenges - Technical complexity of Linked Data technologies
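One way to hide this complexity is a thin helper that writes the SPARQL on the caller's behalf and hands back plain dictionaries. The Python sketch below assumes a cube published with the W3C RDF Data Cube vocabulary (qb:structure, qb:component, qb:dimension) and an endpoint returning standard SPARQL JSON results. The function names, the stub endpoint and the example.com URIs are hypothetical.

```python
def dimensions_query(dataset_uri):
    """Build a SPARQL query listing the dimensions of a qb:DataSet with labels."""
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?dimension ?label WHERE {{
  <{dataset_uri}> qb:structure ?dsd .
  ?dsd qb:component ?component .
  ?component qb:dimension ?dimension .
  OPTIONAL {{ ?dimension rdfs:label ?label }}
}}"""


def list_dimensions(dataset_uri, run_query):
    """Return plain dicts. `run_query` is any callable that executes a SPARQL
    query string and returns results in the SPARQL JSON results format."""
    results = run_query(dimensions_query(dataset_uri))
    return [
        {"@id": b["dimension"]["value"], "label": b.get("label", {}).get("value")}
        for b in results["results"]["bindings"]
    ]


def fake_endpoint(query):
    """Stub standing in for a real SPARQL endpoint (illustration only)."""
    return {"head": {"vars": ["dimension", "label"]}, "results": {"bindings": [
        {"dimension": {"type": "uri", "value": "http://example.com#timePeriod"},
         "label": {"type": "literal", "value": "Time Period"}},
    ]}}


dims = list_dimensions("http://example.com/dataset/unemployment", fake_endpoint)
print(dims)
```

The caller sees only a dataset identifier and a list of labelled dimensions; the vocabulary and query language stay at the back end.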
This API is designed to support developers to use Linked Statistical Data while assuming minimal knowledge of Linked Data.
Specification and implementation of a REST API that translates RDF data cubes to JSON data
Numerous existing software tools and libraries can be used to exploit Linked Statistical Data
OpenGovIntelligence JSON-QB API
http://OpenGovIntelligence.eu
@OpenGovInt
JSON-QB API - metadata
GET /dimensions
Parameters: dataset (required)
Sample result:
[{"@id":"http://purl.org/linked-data/sdmx/2009/dimension#sex","label":"sex"},
{"@id":"http://example.com#timePeriod","label":"Time Period"},
{"@id":"http://example.com#refArea","label":"Reference Area"}]
GET /dimension-values
Parameters: dataset (required), dimension (required)
Sample result:
{"dimension":{"URI":"http://example.com#timePeriod","label":"Time Period"},
"values":[
{"@id":"http://example.com/concept/year2004#id","label":"2004"},
{"@id":"http://example.com/concept/year2005#id","label":"2005"},
{"@id":"http://example.com/concept/year2006#id","label":"2006"},
...]}
http://OpenGovIntelligence.eu
@OpenGovInt
https://github.com/OpenGovIntelligence/json-qb
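A client for the two metadata calls on this slide needs nothing beyond a standard library. The Python sketch below builds the request URLs and reads a /dimensions result into a lookup table. The endpoint and parameter names come from the slide; the API_BASE address, the dataset URI and the helper names are hypothetical, and the sketch parses the slide's sample result offline rather than calling a live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://example.com/json-qb-api"  # hypothetical deployment URL


def build_url(endpoint, **params):
    """Build a request URL such as .../dimensions?dataset=..."""
    return f"{API_BASE}/{endpoint}?{urlencode(params)}"


def fetch_json(url):
    """Perform the GET request and decode the JSON body (needs a live API)."""
    with urlopen(url) as response:
        return json.load(response)


def labels_by_id(items):
    """Index a list of {"@id", "label"} objects by their identifier."""
    return {item["@id"]: item["label"] for item in items}


# Sample /dimensions result from the slide, parsed without a network call.
sample = """[
 {"@id": "http://purl.org/linked-data/sdmx/2009/dimension#sex", "label": "sex"},
 {"@id": "http://example.com#timePeriod", "label": "Time Period"},
 {"@id": "http://example.com#refArea", "label": "Reference Area"}
]"""
dimensions = labels_by_id(json.loads(sample))

url = build_url(
    "dimension-values",
    dataset="http://example.com/dataset/unemployment",  # hypothetical dataset URI
    dimension="http://example.com#timePeriod",
)
print(dimensions)
print(url)
```

Against a running deployment, fetch_json(url) would return the dimension values as ordinary JSON, ready for existing charting or data analysis libraries.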
Performance is an issue in
Web-based applications
Large datasets
Federated queries
Other Challenges
Statistical data are fragmented.
Linked data provide the technological foundation to integrate statistical data and thus enable performing analytics on top of multiple datasets.
Numerous software tools have been developed towards this end.
Different publishing practices hamper the reuse of the tools across datasets and portals.
The technical complexity of Linked Data technologies hinders the wide exploitation of Linked Statistical Data.
We need to further work towards addressing these challenges.
Conclusion
Thank you for your attention!!
http://kalampok.is
@kalampokis