The OpenCube Toolkit - Webinar 2

Andriy Nikolov, fluid Operations AG, Germany

2nd OpenCube Webinar

15 September 2015

The OpenCube Toolkit: Overview



Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Conclusions

Data Cube

Statistical data is often organized as data cubes, where each cell contains a measure described based on a number of dimensions.

OLAP operations: drill up/down, slicing, dicing, pivoting, etc. Data cubes are essential for Business Intelligence.

[Figure: a data cube with its dimensions, a dimension hierarchy and the measure]
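The OLAP operations named above can be illustrated on a toy cube. The sketch below is a minimal Python illustration, not OpenCube code: the cube, its made-up values, the region-to-country hierarchy and the function names are all hypothetical. Each cell maps a (region, year) pair to one measure value; drilling up replaces each region by its parent in the hierarchy and sums the measure, dicing keeps a sub-cube, and slicing fixes one dimension.

```python
from collections import defaultdict

# Made-up cube: each cell maps (region, year) to one measure value.
CUBE = {
    ("R1", 2014): 10, ("R2", 2014): 20, ("R3", 2014): 5,
    ("R1", 2015): 12, ("R2", 2015): 18, ("R3", 2015): 7,
}

# Hypothetical dimension hierarchy: region -> country (one level up).
REGION_TO_COUNTRY = {"R1": "C1", "R2": "C1", "R3": "C2"}


def drill_up(cube, parent_of):
    """Move one level up the region hierarchy, summing the measure."""
    result = defaultdict(int)
    for (region, year), value in cube.items():
        result[(parent_of[region], year)] += value
    return dict(result)


def dice(cube, regions, years):
    """Keep only the cells whose dimension values lie in the given sets."""
    return {
        (region, year): value
        for (region, year), value in cube.items()
        if region in regions and year in years
    }


def slice_by_year(cube, year):
    """Fix the year dimension; the result is indexed by region only."""
    return {region: value for (region, y), value in cube.items() if y == year}


print(drill_up(CUBE, REGION_TO_COUNTRY))
print(dice(CUBE, {"R1", "R3"}, {2015}))
print(slice_by_year(CUBE, 2014))
```

Summing is only appropriate for additive measures such as counts; averages or rates would need a different aggregation when drilling up.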

Linked Data

Linked Data has the potential to enable combining and performing analytics on top of disparate and previously isolated statistical data.

The RDF Data Cube Vocabulary has been proposed for modelling multi-dimensional data as RDF graphs.

However, tools for handling linked data cubes:

• are few and scattered

• have not been tested under real-life conditions

The potential of using LOD in statistical data analysis therefore remains unexploited.
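To make the RDF Data Cube idea concrete, the sketch below writes a single cube cell as a qb:Observation in Turtle. The vocabulary terms (qb:Observation, qb:dataSet) and the SDMX-RDF properties (sdmx-dimension:refArea, sdmx-dimension:refPeriod, sdmx-measure:obsValue) are real; the ex: namespace, the identifiers, the value and the helper function names are hypothetical. Plain string formatting is used only to keep the example dependency-free.

```python
# Real vocabulary namespaces: W3C RDF Data Cube (qb) and SDMX-RDF.
# The "ex" namespace and all identifiers/values are hypothetical.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "ex": "http://example.org/",
}


def prefix_header(prefixes):
    """Turtle @prefix declarations for the given namespace map."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items())


def observation_to_turtle(obs_id, dataset, dimensions, value):
    """Describe one cube cell as a qb:Observation in Turtle.

    `dimensions` maps prefixed dimension properties to prefixed values.
    """
    lines = [f"ex:{obs_id} a qb:Observation ;", f"    qb:dataSet ex:{dataset} ;"]
    for prop, dim_value in dimensions.items():
        lines.append(f"    {prop} {dim_value} ;")
    lines.append(f"    sdmx-measure:obsValue {value} .")
    return "\n".join(lines)


turtle = prefix_header(PREFIXES) + "\n\n" + observation_to_turtle(
    "obs1",
    "population",
    {"sdmx-dimension:refArea": "ex:R1", "sdmx-dimension:refPeriod": "ex:Y2014"},
    10,
)
print(turtle)
```

In a complete dataset, ex:population would also be declared as a qb:DataSet whose qb:structure points to a qb:DataStructureDefinition listing the dimensions and the measure; that part is omitted here for brevity.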

The OpenCube project

OpenCube is a 2-year project funded by the EU within FP7.

The project aims to develop and test processes and tools for managing statistical linked open data.

The results will:

• Help data publishers create linked data cubes from legacy formats

• Empower data users to browse, visualise, link, expand and analyse data cubes

• Enable analyses that were not possible before (merging data cubes at Web scale)

Linked Statistical Data Lifecycle

We propose a lifecycle for statistical linked data (LD).

The lifecycle is divided into three phases: create, expand and exploit (or consume).

The lifecycle prescribes the steps that raw data cubes* should go through in order to create value.

OpenCube also develops tools to support the whole lifecycle of linked statistical data.

E. Tambouris, E. Kalampokis, K. Tarabanis (2015). Processing Linked Open Data Cubes. In: Electronic Government, Lecture Notes in Computer Science, vol. 9248, pp. 130-143.

* We assume statistical data is organized as data cubes, where each cell contains a measure described based on a number of dimensions.

More on OpenCube…

For more information: http://opencube-project.eu and http://opencube-toolkit.eu

Check out our free webinars! 1st webinar: Project overview & OLAP browser

Slides: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015

Video: https://vimeo.com/138860345

Project coordinators: Konstantinos Tarabanis, Themis Tambouris

OpenCube consortium


OpenCube Toolkit

Creating components:

• TARQL extension
• D2RQ extension
• JSON-stat
• Grafter
• R2RML-cube extension (commercial offering only)

Expanding components:

• OpenCube Expander
• OpenCube Linker

Exploiting components:

• Data catalogue solution
• OpenCube Browser
• OpenCube MapView
• R Analytics Integration

The toolkit is developed using the open-source Information Workbench as the underlying linked data management platform.

License scheme: OpenCube components are provided under open-source licenses (check http://opencube-toolkit.eu), but commercial solutions are also offered by consortium members.

Base platform: Information Workbench

Platform for development of linked data applications based on Semantic Web data:

• Semantics- & Linked Data-based integration of enterprise and open data sources
• Intelligent data access and analytics: visual exploration, semantic search, dashboarding and reporting
• Collaboration and knowledge management platform: wiki-based curation & authoring of data, collaborative workflows

Source: http://www.fluidops.com/information-workbench/

Platform Architecture

• Data storage and management platform
• Reusable UI and data integration components
• Customized application solutions
• External resources to reuse data and create mashups

Ontology as a "Structural Backbone"

[Figure: the ontology (RDFS/OWL) defines the data structure of the RDF data graph, whose resources (e.g. #BarackObama, #WhiteHouse) are typed via rdf:type with classes such as foaf:Person and vcard:Address; UI templates attached to these classes (Template:foaf:Person, Template:vcard:Address) define the UI structure of each resource page.]

Data Management based on Sesame framework

• Open source, written in Java
• Layered architecture for semantic data management
• Easy to plug in new data management components on demand
• Most of the existing triple stores support the Sesame API

Data Storage & Access:

• Sesame Access API and SAIL API: stable (yet extensible) APIs for data access, manipulation, ...
• Stackable architecture of custom data management components, e.g. SAIL 1 (a query optimization layer) on top of SAIL 2 (a distributed query execution layer), over the underlying stores DB1, DB2, DB3
• Easy integration by implementing a generic API
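The stackable idea can be sketched in a few lines. This is a hypothetical Python analogue, not the Java Sesame SAIL API: every layer implements the same small interface (add and match) and delegates to the layer beneath it, so layers can be combined freely. A caching layer stands in for the query optimization layer on the slide, and a counting layer makes the delegation visible; all class names are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Store(ABC):
    """Hypothetical generic storage interface shared by every layer."""

    @abstractmethod
    def add(self, triple: Triple) -> None: ...

    @abstractmethod
    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]: ...


class MemoryStore(Store):
    """Bottom layer: keeps triples in memory (stands in for DB1, DB2, ...)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s, p, o):
        # None acts as a wildcard in the triple pattern.
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip((s, p, o), t))
        )


class Layer(Store):
    """A stackable layer: implements Store by delegating to the layer below."""

    def __init__(self, inner: Store) -> None:
        self.inner = inner

    def add(self, triple: Triple) -> None:
        self.inner.add(triple)

    def match(self, s, p, o):
        return self.inner.match(s, p, o)


class CountingLayer(Layer):
    """Counts how many pattern queries reach this level of the stack."""

    def __init__(self, inner: Store) -> None:
        super().__init__(inner)
        self.queries = 0

    def match(self, s, p, o):
        self.queries += 1
        return super().match(s, p, o)


class CachingLayer(Layer):
    """Toy optimization layer: caches query results, cleared on every write."""

    def __init__(self, inner: Store) -> None:
        super().__init__(inner)
        self._cache: Dict[Pattern, List[Triple]] = {}
        self.hits = 0

    def add(self, triple: Triple) -> None:
        self._cache.clear()
        super().add(triple)

    def match(self, s, p, o):
        key = (s, p, o)
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = super().match(s, p, o)
        return list(self._cache[key])


# Stack: caching layer -> counting layer -> in-memory store.
counter = CountingLayer(MemoryStore())
store = CachingLayer(counter)
store.add(("ex:R1", "rdf:type", "ex:Region"))
store.match(None, "rdf:type", None)
store.match(None, "rdf:type", None)  # answered from the cache
```

Because each layer only sees the generic interface, the counting layer could be removed or another layer inserted without changing the code above or below it, which is the point of the stackable design.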

Data Integration: Data Provider Concept

Data providers support the periodic extraction & integration of data from external data sources into a central repository:

• Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)
• Parametrizable (e.g. connection information, refresh interval, ...)
• Built-in UI for instantiating providers
• Intuitive interfaces and APIs for writing your own custom providers

Each provider goes through the same steps: connect to the data source, extract data from the source, convert the data into RDF, and store the RDF in the repository.

Examples: ScriptProvider, SOAP Provider, R2RML, XML2RDF, REST Provider

Data source concept

Data integration:

• Data Source: low-level data access
• Mapper: extract and manipulate data; translation into triples
• Post Processor (optional): reconciliation (merging); improve data quality
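A post processor that performs reconciliation might, for instance, merge resources declared equal with owl:sameAs. The sketch below is one possible approach, assuming triples are plain string tuples: it groups equivalent IRIs with a union-find structure and rewrites every triple to a single representative. The function names and the choice of the lexicographically smallest IRI as representative are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find lookup with path halving; unseen terms are their own root."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def reconcile(triples: Iterable[Triple]) -> Set[Triple]:
    """Merge resources linked by owl:sameAs into one representative IRI.

    The representative of each group is its lexicographically smallest
    member; the owl:sameAs triples themselves are dropped from the output.
    """
    triples = list(triples)
    parent: Dict[str, str] = {}
    for s, p, o in triples:
        if p == SAME_AS:
            a, b = sorted((find(parent, s), find(parent, o)))
            if a != b:
                parent[b] = a  # the smaller root becomes the group's root
    return {
        (find(parent, s), p, find(parent, o))
        for s, p, o in triples
        if p != SAME_AS
    }


data = [
    ("ex:A", "ex:pop", '"10"'),
    ("dbp:A", "ex:label", '"Region A"'),
    ("dbp:A", SAME_AS, "ex:A"),
    ("ex:B", "ex:pop", '"5"'),
]
print(sorted(reconcile(data)))
```

Union-find handles chains of equivalences (a sameAs b, b sameAs c) without repeatedly rescanning the data, which matters when many sources are merged.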

User Interface: One Page per URI

[Figure: each resource in the RDF graph is presented on its own resource page]

Wiki Concept

• Resource view is defined using the wiki-based UI

• Go to a new wiki page…/resource/Widget123Page

• Change to the Edit View


Configurable Widgets

Widget categories: Analytics and Reporting; Visualization and Exploration; Mashups with Social Media; Authoring and Content Creation.

Widgets are not static and can be integrated into the UI using a wiki-style syntax.

Instance Pages vs. Templates

Page content is composed based on a template concept. For a request for dbpedia:Barack_Obama, whose rdf:type is foaf:Person, the resource page combines:

• the template definitions for foaf:Person: the wiki template Template:foaf:Person, the table, graph and pivot view configurations for foaf:Person, and additional widget definitions for foaf:Person
• the instance definitions for dbpedia:Barack_Obama: its wiki, table, graph and pivot views, and additional widget definitions for dbpedia:Barack_Obama

The information from the template definition and from the specific instance is combined, giving the instance configuration priority.
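The combination rule (class template first, instance configuration overriding it) can be sketched with plain dictionaries. The configuration keys and values below are invented for illustration and do not reflect the Information Workbench's actual configuration format; the sketch also assumes that an instance setting simply replaces the template setting for the same key.

```python
from typing import Any, Dict, List

# Hypothetical configurations: per class (templates) and per resource (instances).
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "foaf:Person": {
        "wiki": "Template:foaf:Person",
        "table_view": {"columns": ["name", "birthDate"]},
        "graph_view": {"depth": 1},
        "widgets": ["Timeline"],
    }
}
INSTANCES: Dict[str, Dict[str, Any]] = {
    "dbpedia:Barack_Obama": {"graph_view": {"depth": 2}, "widgets": ["Map"]},
}
TYPES: Dict[str, List[str]] = {"dbpedia:Barack_Obama": ["foaf:Person"]}


def page_config(resource: str) -> Dict[str, Any]:
    """Combine class templates with the resource's own configuration.

    Template settings are applied first (one per rdf:type), then the
    instance settings, so the instance wins whenever a key is defined twice.
    """
    config: Dict[str, Any] = {}
    for cls in TYPES.get(resource, []):
        config.update(TEMPLATES.get(cls, {}))
    config.update(INSTANCES.get(resource, {}))
    return config


print(page_config("dbpedia:Barack_Obama"))
```

With this ordering, a single template serves every foaf:Person page, while individual pages only need to store the settings in which they differ.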

More information

Download the open-source Information Workbench Community Edition: http://www.fluidops.com/en/company/training/open_source

Detailed documentation: http://help.fluidops.com

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data
• Creating linked data cubes
• Exploiting statistical data

Conclusions


Data Creation Components

Implemented as custom data providers
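As an example of what a creation component has to do, the sketch below unpacks a JSON-stat dataset into one record per cube cell, which could then be written out as qb:Observation resources. It relies on the JSON-stat convention that the flat value array is stored in row-major order (the last dimension varies fastest). The function names and the sample document are hypothetical, only the array and object forms of the category index are handled, and this is not the OpenCube JSON-stat component itself.

```python
import itertools
import json
from typing import Dict, List

# Hypothetical JSON-stat 2.0 dataset with two dimensions (2 x 2 cells).
EXAMPLE = """{
  "version": "2.0", "class": "dataset",
  "id": ["geo", "time"], "size": [2, 2],
  "dimension": {
    "geo":  {"category": {"index": ["R1", "R2"]}},
    "time": {"category": {"index": {"2014": 0, "2015": 1}}}
  },
  "value": [10, 12, 20, 18]
}"""


def category_codes(dimension: dict) -> List[str]:
    """Category codes in index order; the index may be an array or an object."""
    index = dimension["category"]["index"]
    if isinstance(index, list):
        return list(index)
    return sorted(index, key=index.get)


def jsonstat_to_observations(doc: dict) -> List[Dict[str, object]]:
    """One record per non-missing cell: dimension codes plus the value."""
    dim_ids = doc["id"]
    codes = [category_codes(doc["dimension"][d]) for d in dim_ids]
    values = doc["value"]
    observations = []
    # itertools.product varies the last dimension fastest, matching the
    # row-major order of JSON-stat's flat "value" array.
    for position, combo in enumerate(itertools.product(*codes)):
        if isinstance(values, list):
            value = values[position]
        else:  # sparse form: {"position": value}
            value = values.get(str(position))
        if value is None:
            continue
        record: Dict[str, object] = dict(zip(dim_ids, combo))
        record["value"] = value
        observations.append(record)
    return observations


for obs in jsonstat_to_observations(json.loads(EXAMPLE)):
    print(obs)
```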


Data Catalogue Management Solution

Managing metadata catalogues. The solution allows the user to:

• search for specific datasets by keyword/category/catalogue
• explore pre-defined relations between datasets within the catalogue
• explore the available metadata descriptions of datasets (dataset structure)
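Searching by keyword, category and catalogue amounts to filtering dataset metadata records. The record fields and the search function below are assumptions for illustration (real catalogues typically describe datasets with DCAT metadata); the sketch only shows how the three filters combine and how pre-defined relations between datasets can be stored alongside the metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetEntry:
    """Hypothetical catalogue record describing one dataset."""
    title: str
    catalogue: str
    categories: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)  # titles of related datasets


def search(entries: List[DatasetEntry], keyword: Optional[str] = None,
           category: Optional[str] = None,
           catalogue: Optional[str] = None) -> List[DatasetEntry]:
    """Return entries matching every filter that is given (None = no filter)."""
    results = []
    for entry in entries:
        text = " ".join([entry.title, *entry.keywords]).lower()
        if keyword is not None and keyword.lower() not in text:
            continue
        if category is not None and category not in entry.categories:
            continue
        if catalogue is not None and entry.catalogue != catalogue:
            continue
        results.append(entry)
    return results


CATALOGUE = [
    DatasetEntry("Population by region", "demo", ["demography"], ["population"],
                 related=["Births by region"]),
    DatasetEntry("Births by region", "demo", ["demography"], ["births"]),
    DatasetEntry("Unemployment rate", "other", ["labour"], ["unemployment"]),
]
print([e.title for e in search(CATALOGUE, category="demography")])
```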

Exploring data: OpenCube Browser

The OpenCube Browser enables the exploration of an RDF data cube by presenting a two-dimensional slice of the cube as a table. The slice is created by setting a fixed value for each dimension that is not presented in the table.

The user can:

• change the axes of the table
• change the fixed values
• change the language
• summarize observations across a dimension (dimension reduction)

See our first webinar: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015
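The slicing rule above (every dimension not on a table axis gets a fixed value) and dimension reduction can be sketched on a small three-dimensional cube. The data, dimension names and function names are made up, and summing is assumed as the summary function; the OpenCube Browser itself works on RDF data cubes via SPARQL rather than on Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, Tuple

DIMS = ("geo", "time", "sex")
# Made-up observations: (geo, time, sex) -> value
CUBE: Dict[Tuple[str, ...], int] = {
    ("R1", "2014", "F"): 5, ("R1", "2014", "M"): 5,
    ("R1", "2015", "F"): 6, ("R1", "2015", "M"): 6,
    ("R2", "2014", "F"): 11, ("R2", "2014", "M"): 9,
    ("R2", "2015", "F"): 10, ("R2", "2015", "M"): 8,
}


def slice_table(cube, dims, rows, cols, fixed):
    """Two-dimensional slice shown as {row value: {column value: measure}}.

    Every dimension other than `rows` and `cols` must have a value in `fixed`.
    """
    if set(dims) != {rows, cols, *fixed}:
        raise ValueError("fix a value for each dimension not shown in the table")
    table = defaultdict(dict)
    for key, value in cube.items():
        point = dict(zip(dims, key))
        if all(point[d] == v for d, v in fixed.items()):
            table[point[rows]][point[cols]] = value
    return dict(table)


def reduce_dimension(cube, dims, drop, agg=sum):
    """Summarize observations across `drop`; returns (new cube, remaining dims)."""
    i = dims.index(drop)
    groups = defaultdict(list)
    for key, value in cube.items():
        groups[key[:i] + key[i + 1:]].append(value)
    kept = dims[:i] + dims[i + 1:]
    return {k: agg(v) for k, v in groups.items()}, kept


print(slice_table(CUBE, DIMS, "geo", "time", {"sex": "F"}))
reduced, kept_dims = reduce_dimension(CUBE, DIMS, "sex")
print(slice_table(reduced, kept_dims, "geo", "time", {}))
```

After summarizing across sex, the cube has only two dimensions left, so both can be placed on the table axes without fixing any value.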

Exploring data: OpenCube MapView

Visualizes RDF data cubes on a map.

Allows selecting the cube, dimensions and measures to display in an interactive way.

Supports:

• Markers
• Bubble maps
• Choropleth maps
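For a choropleth map, each area's measure value has to be mapped to a colour class; for a bubble map, to a symbol size. The sketch assumes equal-interval classification and area-proportional bubbles, two common cartographic choices; MapView's actual classification method, the palette and the function names are assumptions.

```python
import math
from typing import Dict, List

# Hypothetical sequential palette, light to dark (one colour per class).
PALETTE: List[str] = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def equal_interval_classes(values: Dict[str, float], n_classes: int) -> Dict[str, int]:
    """Assign each area a class 0..n_classes-1 using equal-width value ranges."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes or 1.0  # all values equal: everything in class 0
    return {area: min(int((v - lo) / width), n_classes - 1) for area, v in values.items()}


def choropleth_colours(values: Dict[str, float], palette: List[str] = PALETTE) -> Dict[str, str]:
    """Colour for each area (e.g. each refArea of a cube slice)."""
    classes = equal_interval_classes(values, len(palette))
    return {area: palette[c] for area, c in classes.items()}


def bubble_radius(value: float, max_value: float, max_radius: float = 30.0) -> float:
    """Bubble area proportional to the value, so the radius grows with its square root."""
    return max_radius * math.sqrt(value / max_value)


measure = {"R1": 0, "R2": 25, "R3": 50, "R4": 100}
print(choropleth_colours(measure))
print({area: bubble_radius(v, 100) for area, v in measure.items()})
```

Scaling the bubble area rather than its radius keeps large values from looking disproportionately big on the map.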

Analyzing data with R

Enables advanced data analysis tasks using the well-established R software:

• Passing input data retrieved from an RDF triple store using SPARQL
• Reusing the analysis results for visualization or integration with the original data

R Analysis Tasks

• The analysis task is edited using a web UI form
• Two types of input parameters: constants, interpreted as variables of basic types in R, and SPARQL query results, interpreted as data frames in R
• The script is executed on the R server, and the results are passed back to the OpenCube Toolkit
• Making use of the results: visualize them or store them as linked data

Visualisation of analysis results:

• as a table
• as a static chart built in R
• as an interactive stock chart

Reuse of analysis results, preserving R output as linked data: use the R output as a tabular data source to import data and convert it with R2RML.
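Passing SPARQL query results to R as data frames essentially means turning the row-oriented result bindings into named columns. The sketch below does this in Python for the standard SPARQL 1.1 JSON results format: unbound variables become None (NA in R) and literals with a numeric XSD datatype become floats. It illustrates the idea only; how the OpenCube Toolkit actually ships data to the R server is not described in the slides, and the function name and sample results are hypothetical.

```python
import json
from typing import Dict, List, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC_TYPES = {XSD + t for t in ("integer", "int", "long", "decimal", "double", "float")}


def bindings_to_columns(results_json: str) -> Dict[str, List[Optional[object]]]:
    """Turn SPARQL 1.1 JSON query results into named columns, like an R data frame."""
    doc = json.loads(results_json)
    columns: Dict[str, List[Optional[object]]] = {v: [] for v in doc["head"]["vars"]}
    for row in doc["results"]["bindings"]:
        for var, column in columns.items():
            term = row.get(var)
            if term is None:
                column.append(None)  # unbound variable: NA in R
            elif term.get("datatype") in NUMERIC_TYPES:
                column.append(float(term["value"]))
            else:
                column.append(term["value"])  # IRIs and other literals stay strings
    return columns


RESULTS = json.dumps({
    "head": {"vars": ["area", "value"]},
    "results": {"bindings": [
        {"area": {"type": "uri", "value": "http://example.org/R1"},
         "value": {"type": "literal", "datatype": XSD + "integer", "value": "10"}},
        {"area": {"type": "uri", "value": "http://example.org/R2"}},
    ]},
})
print(bindings_to_columns(RESULTS))
```

Constant parameters need no such conversion: each would simply be passed as a single value of a basic type alongside the columns.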

Demos

OpenCube public demo (http://data.fluidops.net):

• An instance of the developed platform hosted by fluidOps
• Contains metadata and a set of cubes from Eurostat
• Illustrates the data catalogue functionalities and data analysis using R

The Flemish Government:

• An instance of the developed platform has been deployed at the premises of the Flemish government
• The Flemish government had already opened up statistics by means of linked data cubes
• 11 cubes had been transformed to linked data according to the QB vocabulary and stored in a Virtuoso RDF store


Conclusions

The OpenCube project develops processes and tools for statistical data management.

The OpenCube Toolkit provides:

• A platform for building customized applications with linked data cubes
• A range of software components: tools for creating linked open statistical data, tools for expanding open statistical data, and tools for exploiting linked open statistical data


Acknowledgments

The work presented in this webinar is partially funded by the EU within FP7.

http://opencube-project.eu

@OpenCubeProject

Next webinar

PublishMyData for publishing governmental statistical data

Tuesday, September 22 at 06:00 PM CEST

http://opencube.enterthemeeting.com/m/VCAJFCJW