Data Cube Vocabulary Workshop, 26th May 2015, Luxembourg. Expanding Linked Data Cubes. Evangelos Kalampokis


Page 1: Expanding Linked Data Cubes

Data Cube Vocabulary Workshop, 26th May 2015, Luxembourg

Expanding Linked Data Cubes
Evangelos Kalampokis

Page 2: Expanding Linked Data Cubes

Eurostat Workshop

We assume that a cube can be expanded by increasing the size of one of the sets that define a cube.

Therefore, cube x can be expanded by adding one or more elements into: (a) the set of measures, (b) the set of objects of an attribute of a dimension, (c) the set of attributes of a dimension, or (d) the set of dimensions.

The prerequisite for this process is that cube y has some characteristics that allow for merging with cube x.

If cube y has these characteristics, we say that “y is compatible to expand x”.

These characteristics, however, differ for each of the four ways in which x can be expanded.


Expanding an RDF data cube

Page 3: Expanding Linked Data Cubes

OpenCube plenary meeting, 19-20 May 2014

LATC’s Eurostat Dataset

Dataset’s characteristics:
– 5,000 linked data cubes
– Size ranges from 94 KB to 22 GB
– Median size 3 MB
– Average size 118 MB

We want to identify how many of these cubes are compatible to merge.

The SPARQL endpoint does not provide access to the observations, only to the Data Structure Definitions.

It is difficult to identify the number of compatible cubes through a small number of SPARQL queries.

Page 4: Expanding Linked Data Cubes


We define a cluster of cubes as a set of cubes that have exactly the same dimensions (their URIs or code lists are identical).

Process:
Step 1: Access the Eurostat Data Structure Definitions through the SPARQL endpoint to identify clusters of cubes with the same dimensions.
Step 2: Analyze the identified clusters. We select 10 clusters to study.
Step 2.1: Import their dump files into an RDF store.
Step 2.2: Calculate the overlap of each dimension type.
Step 2.3: Calculate the overlap of each measure in the cluster.
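Step 1 can be sketched as follows. This is a minimal illustration, not the tooling used for the study: it assumes the (cube, dimension) pairs have already been fetched from the Data Structure Definitions, for example with a query like the one kept in DSD_QUERY (which uses the standard qb: terms of the RDF Data Cube vocabulary but is not executed here). The cube identifiers and dimension names are hypothetical.

```python
from collections import defaultdict

# Illustrative query over the Data Structure Definitions (not executed here).
DSD_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?cube ?dim WHERE {
  ?cube qb:structure ?dsd .
  ?dsd qb:component ?component .
  ?component qb:dimension ?dim .
}
"""

def cluster_by_dimensions(cube_dimensions):
    """Group cube ids whose sets of dimensions are identical."""
    clusters = defaultdict(list)
    for cube_id, dims in cube_dimensions.items():
        # frozenset makes the grouping independent of dimension order
        clusters[frozenset(dims)].append(cube_id)
    # a cluster is only interesting if it holds at least two cubes
    return [sorted(ids) for ids in clusters.values() if len(ids) > 1]

# Hypothetical result of the query, reshaped as cube -> dimensions
cube_dimensions = {
    "cube_a": {"timePeriod", "geo", "sex"},
    "cube_b": {"sex", "geo", "timePeriod"},
    "cube_c": {"timePeriod", "geo"},
}
clusters = cluster_by_dimensions(cube_dimensions)
print(clusters)
```

Grouping on the full set of dimensions mirrors the definition of a cluster given above; comparing code lists as well would need a richer key than the dimension URIs alone.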

The overlap for a dimension m between two cubes k and j is defined as $|V_{km} \cap V_{jm}| / |V_{km} \cup V_{jm}|$, where $V_{km}$ is the set of values of dimension m in cube k.
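A small sketch of this measure, reading the overlap as the size of the intersection of the two value sets divided by the size of their union. The function name and the sample year ranges are illustrative assumptions; returning 0.0 for two empty sets is also a choice made here, not something the slides specify.

```python
def overlap(values_k, values_j):
    """Overlap of one dimension between two cubes: |A ∩ B| / |A ∪ B|."""
    a, b = set(values_k), set(values_j)
    union = a | b
    if not union:
        # assumption: two empty value sets are treated as having no overlap
        return 0.0
    return len(a & b) / len(union)

# Hypothetical time dimensions of two cubes
years_k = range(2007, 2015)  # 2007..2014
years_j = range(1999, 2015)  # 1999..2014
result = overlap(years_k, years_j)
print(result)  # 8 shared years out of 16 distinct years -> 0.5
```

An overlap of 1 means both cubes use exactly the same values for the dimension, while a value below 1 means at least one cube has values the other lacks.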


Eurostat Cluster Identification

Page 5: Expanding Linked Data Cubes


Clusters of cubes with same dimensions

Page 6: Expanding Linked Data Cubes


Cluster  Size  Number of compatible cubes  Number of compatible cubes
               for adding measure          for adding value to dimension
1        69    1780                        45
2        45    433                         38
3        34    35                          13
4        15    20                          31
5        37    278                         13
6        20    14                          27
7        30    13                          150
8        21    364                         0
9        31    76                          0
10       16    0                           146


Compatible cubes in LATC’s Eurostat

Page 7: Expanding Linked Data Cubes

Eurostat scenario 1

Cluster of compatible Data Cubes

Energy and electricity consumption in other sectors and households

Energy and electricity consumption in the industrial sector

Energy and electricity consumption in the transport sector

Electricity production

Fuel consumption

Compatible for “Add value to dimension” on the dimension: Energy Indicator

Page 8: Expanding Linked Data Cubes

Eurostat scenario 2

Cluster of compatible Data Cubes

Death due to transport accidents, by sex - 2011

Death due to chronic liver disease, by sex - 2011

Death due to suicide, by sex - 2011

Death due to diseases of the nervous system, by sex - 2011

Death due to AIDS (HIV-disease), by sex - 2011

Death due to accidents, by sex - 2011

Compatible for “Add value to dimension” on the dimension: International Statistical Classification of Diseases and Related Health Problems

Page 9: Expanding Linked Data Cubes

Eurostat scenario 3

Cluster of compatible Data Cubes

Broadband and connectivity - persons employed

Digital single market - promoting e-commerce for businesses

Employees - level of internet access (NACE Rev. 2 activity)

Enterprises purchasing via internet and/or networks other than internet (NACE Rev. 2 activity)

Compatible for “Add value to dimension” on the dimension: Information society indicator

Page 10: Expanding Linked Data Cubes

Eurostat Scenario 4

Main economic variables (2006)

Dimensions:
– timePeriod (2006)
– Geopolitical entity
– Economical indicator for structural business statistics
– Classification of economic activities

Measures:
– ObsValue

Main economic variables (2007)

Dimensions:
– timePeriod (2007)
– Geopolitical entity
– Economical indicator for structural business statistics
– Classification of economic activities

Measures:
– ObsValue

Compatible for “Add value to dimension” on the dimension: timePeriod

Page 11: Expanding Linked Data Cubes

Scenario 1 – Add value to dimension

Initial cube: Job seekers per Time - Sex - Age group - Ref. area (2007 – 2014)

Dimensions:
– Time (2007 – 2014)
– Sex
– Age group
– Reference area

Measures:
– Total amount

Cube to use for expansion: Job seekers per Time - Sex - Age group - Ref. area (1999 – 2014)

Dimensions:
– Time (1999 – 2014)
– Sex
– Age group
– Reference area

Measures:
– Total amount
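The expansion in this scenario can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the Cube class, the compatibility test (identical dimensions and measures) and every observation value below are hypothetical and not taken from the slides or from the OpenCube tools.

```python
from dataclasses import dataclass, field

@dataclass
class Cube:
    dimensions: tuple      # ordered dimension names
    measures: frozenset    # measure names
    observations: dict = field(default_factory=dict)  # dimension values -> value

def add_dimension_values(x, y):
    """Expand x with the observations of y that x does not have yet."""
    # assumed compatibility rule: same dimensions and same measures
    if x.dimensions != y.dimensions or x.measures != y.measures:
        raise ValueError("y is not compatible to expand x")
    merged = dict(y.observations)
    merged.update(x.observations)  # on shared keys the value of x is kept
    return Cube(x.dimensions, x.measures, merged)

dims = ("time", "sex", "age_group", "ref_area")
measures = frozenset({"total_amount"})
# Invented sample observations, one per cube year of interest
x = Cube(dims, measures, {(2007, "F", "25-34", "area_1"): 10})
y = Cube(dims, measures, {(1999, "F", "25-34", "area_1"): 7,
                          (2007, "F", "25-34", "area_1"): 10})

expanded = add_dimension_values(x, y)
years = sorted({key[0] for key in expanded.observations})
print(years)  # [1999, 2007]
```

After the merge the Time dimension of the expanded cube covers the years of both inputs, which is the effect this scenario describes for the 1999 – 2014 cube.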

Page 12: Expanding Linked Data Cubes

Scenario 2 – Add measure

Initial cube: Social housing per Time – Ref. Area

Dimensions:
– Time
– Reference area

Measures:
– Total amount of apartments
– Total amount of duplex apartments
– Total amount of unknown residence type
– Total amount of family houses

Cube 1 to use for expansion: Building permits per Time – Ref. Area

Dimensions:
– Time
– Reference area

Measures:
– Total amount

Cube 2 to use for expansion: Job seekers per Time – Ref. Area

Dimensions:
– Time
– Reference area

Measures:
– Total amount
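Adding measures can be sketched in the same toy style. The assumptions here are mine: observations are keyed by (time, reference area), the cubes are taken to share exactly those dimensions, and the measures of the added cube are prefixed with a label because both expansion cubes call their measure "Total amount". All numbers are invented.

```python
def add_measures(x_obs, y_obs, prefix):
    """Attach the measures of y to the observations of x that share a key.

    Both arguments map (time, ref_area) -> {measure name: value}.
    """
    merged = {key: dict(row) for key, row in x_obs.items()}
    for key, row in y_obs.items():
        target = merged.setdefault(key, {})
        for name, value in row.items():
            # prefix avoids clashes between identically named measures
            target[f"{prefix}: {name}"] = value
    return merged

# Invented sample observations for one year and one area
social_housing = {(2014, "area_1"): {"Total amount of apartments": 120}}
building_permits = {(2014, "area_1"): {"Total amount": 30}}
job_seekers = {(2014, "area_1"): {"Total amount": 55}}

merged = add_measures(social_housing, building_permits, "Building permits")
merged = add_measures(merged, job_seekers, "Job seekers")
print(merged[(2014, "area_1")])
```

Each observation of the expanded cube then carries the original measures plus one extra measure per cube used for the expansion.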

Page 13: Expanding Linked Data Cubes

Scenario 3 – Add measure

Initial cube: Pref. scheme in health insurance per Sex – Time – Ref. Area

Dimensions:
– Time
– Sex
– Reference area

Measures:
– Total amount in scheme A
– Total amount in scheme B
– Total amount in scheme C
– Total amount in scheme D
– Total amount in scheme E

Cube 1 to use for expansion: Job seekers per Sex - Time - Ref. Area (1999-2014)

Dimensions:
– Time
– Sex
– Reference area

Measures:
– Total amount

Cube 2 to use for expansion: Job seekers per Sex - Time - Ref. Area (2007-2014)

Dimensions:
– Time
– Sex
– Reference area

Measures:
– Total amount

Page 14: Expanding Linked Data Cubes


http://195.251.218.39:8888/resource/selection


Demo