Data Cube Vocabulary Workshop, 26th May 2015, Luxembourg
Expanding Linked Data Cubes
Evangelos Kalampokis
Eurostat Workshop
Expanding an RDF data cube
We assume that a cube can be expanded by increasing the size of one of the sets that define it.
Therefore, cube x can be expanded by adding one or more elements to: (a) the set of measures, (b) the set of objects of an attribute of a dimension, (c) the set of attributes of a dimension, or (d) the set of dimensions.
The prerequisite for this process is that a second cube y has some characteristics that allow it to be merged with cube x.
We say that if cube y has these characteristics, then “y is compatible to expand x”.
These characteristics, however, are different in each of the four ways of expanding x.
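The four sets that define a cube can be pictured with a small toy model. The sketch below is illustrative only: the `Cube` class, its field names, and `expansion_options` are hypothetical, and the function merely lists what cube y holds that cube x lacks in each of the sets (a) to (d). It does not implement the compatibility characteristics themselves, which the slides do not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Cube:
    """Toy cube: a set of measures plus, per dimension and attribute, a set of values."""
    measures: Set[str] = field(default_factory=set)
    # dimension name -> attribute name -> set of values (objects)
    dimensions: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)


def expansion_options(x: Cube, y: Cube) -> dict:
    """List the elements of y that are missing from x, grouped by the set they would extend."""
    new_values = set()      # case (b): (dimension, attribute, value) triples
    new_attributes = set()  # case (c): (dimension, attribute) pairs
    for dim, attrs in y.dimensions.items():
        if dim not in x.dimensions:
            continue  # a whole new dimension is reported under case (d)
        for attr, values in attrs.items():
            if attr not in x.dimensions[dim]:
                new_attributes.add((dim, attr))
            else:
                for value in values - x.dimensions[dim][attr]:
                    new_values.add((dim, attr, value))
    return {
        "measures": y.measures - x.measures,                  # case (a)
        "values": new_values,                                 # case (b)
        "attributes": new_attributes,                         # case (c)
        "dimensions": set(y.dimensions) - set(x.dimensions),  # case (d)
    }
```

An empty result in all four entries would mean that y offers nothing new to x under this toy model.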
19-20 May 2014, OpenCube plenary meeting
LATC’s Eurostat Dataset: dataset characteristics
5,000 linked data cubes
Size ranges from 94 KB to 22 GB
Median size: 3 MB
Average size: 118 MB
We want to identify how many of these cubes are compatible to merge.
The SPARQL endpoint does not provide access to observations, only to the Data Structure Definitions.
It is difficult to identify the number of compatible cubes through a small number of SPARQL queries.
We introduce a cluster of cubes as a set of cubes having exactly the same dimensions (their URIs or code lists are identical).
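A minimal sketch of this grouping, assuming the dimension URIs of each cube have already been read from its Data Structure Definition. The function name and the input shape are hypothetical, and only URIs are compared here, not code lists.

```python
from collections import defaultdict


def cluster_by_dimensions(cube_dimensions):
    """Group cubes whose sets of dimension URIs are exactly the same.

    cube_dimensions: dict mapping a cube identifier to an iterable of dimension URIs.
    Returns a dict mapping each distinct dimension set to the sorted list of its cubes.
    """
    clusters = defaultdict(list)
    for cube, dims in cube_dimensions.items():
        # frozenset ignores ordering, so cubes listing the same dimensions
        # in a different order still land in the same cluster
        clusters[frozenset(dims)].append(cube)
    return {dims: sorted(cubes) for dims, cubes in clusters.items()}
```

Using a `frozenset` as the key makes "exactly the same dimensions" a plain equality test, so one extra or missing dimension puts a cube in a different cluster.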
Process:
Step 1: Access the Eurostat Data Structure Definitions through the SPARQL endpoint to identify clusters of cubes with the same dimensions.
Step 2: Analyze the identified clusters. We select 10 clusters to study.
Step 2.1: Import their dump files into an RDF store.
Step 2.2: Calculate the overlap of each dimension type.
Step 2.3: Calculate the overlap of each measure in the cluster.
The overlap for a dimension m between two cubes k and j is defined as $|V_{km} \cap V_{jm}| / |V_{km} \cup V_{jm}|$, where $V_{km}$ are the values of dimension m for cube k.
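The overlap calculation can be sketched in a few lines, taking the overlap to be the size of the intersection of the two value sets divided by the size of their union (the Jaccard index), which is an assumption about the intended set operators. The function name and the handling of two empty sets are also assumptions.

```python
def dimension_overlap(values_k, values_j):
    """Overlap of one dimension between cubes k and j: |Vk ∩ Vj| / |Vk ∪ Vj|."""
    vk, vj = set(values_k), set(values_j)
    union = vk | vj
    if not union:
        return 0.0  # assumption: two empty value sets count as no overlap
    return len(vk & vj) / len(union)
```

For example, the value sets {2007, 2008, 2009} and {2008, 2009, 2010} share two of four distinct values, giving an overlap of 0.5. Identical sets give 1.0 and disjoint sets give 0.0.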
Eurostat Cluster Identification
Clusters of cubes with same dimensions
Compatible cubes in LATC’s Eurostat

Cluster  Size  Number of compatible cubes  Number of compatible cubes
               for adding measure          for adding value to dimension
1        69    1780                        45
2        45    433                         38
3        34    35                          13
4        15    20                          31
5        37    278                         13
6        20    14                          27
7        30    13                          150
8        21    364                         0
9        31    76                          0
10       16    0                           146
Eurostat scenario 1
Cluster of compatible Data Cubes
Energy and electricity consumption in other sectors and households
Energy and electricity consumption in the industrial sector
Energy and electricity consumption in the transport sector
Electricity production
Fuel consumption
Compatible to “Add value to dimension”
Energy Indicator
Eurostat scenario 2
Cluster of compatible Data Cubes
Death due to transport accidents, by sex - 2011
Death due to chronic liver disease, by sex - 2011
Death due to suicide, by sex - 2011
Death due to diseases of the nervous system, by sex - 2011
Death due to AIDS (HIV-disease), by sex - 2011
Death due to accidents, by sex - 2011
Compatible to “Add value to dimension”
International Statistical Classification of Diseases and Related Health Problems
Eurostat scenario 3
Cluster of compatible Data Cubes
Broadband and connectivity - persons employed
Digital single market - promoting e-commerce for businesses
Employees - level of internet access (NACE Rev. 2 activity)
Enterprises purchasing via internet and/or networks other than internet (NACE Rev. 2 activity)
Compatible to “Add value to dimension”
Information society indicator
Eurostat scenario 4
Main economic variables (2006)
Dimensions: timePeriod (2006), Geopolitical entity, Economical indicator for structural business statistics, Classification of economic activities
Measures: ObsValue
Main economic variables (2007)
Dimensions: timePeriod (2007), Geopolitical entity, Economical indicator for structural business statistics, Classification of economic activities
Measures: ObsValue
Compatible to “Add value to dimension”
timePeriod
Scenario 1 – Add value to dimension
Initial cube: Job seekers per Time - Sex - Age group - Ref. area (2007 – 2014)
Dimensions: Time (2007 – 2014), Sex, Age group, Reference area
Measures: Total amount
Cube to use for expansion: Job seekers per Time - Sex - Age group - Ref. area (1999 – 2014)
Dimensions: Time (1999 – 2014), Sex, Age group, Reference area
Measures: Total amount
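As a rough illustration of this scenario, the sketch below treats each cube as a mapping from a (time, sex, age group, reference area) key to the value of the single measure, and extends the initial cube with the observations it lacks. The representation, the function name, and all numbers are made up for illustration and are not taken from the actual cubes.

```python
def add_dimension_values(initial, other):
    """Return a copy of `initial` extended with observations from `other`
    whose dimension keys are not already present in `initial`."""
    expanded = dict(initial)
    for key, value in other.items():
        expanded.setdefault(key, value)  # observations already in `initial` are kept
    return expanded


# Keys are (time, sex, age group, reference area); values are made-up toy numbers.
initial = {
    (2007, "F", "25-49", "Area A"): 120,
    (2008, "F", "25-49", "Area A"): 135,
}
other = {
    (1999, "F", "25-49", "Area A"): 210,
    (2007, "F", "25-49", "Area A"): 120,
}
expanded = add_dimension_values(initial, other)
print(sorted(key[0] for key in expanded))
```

The expanded cube now covers the years of both inputs, while observations that already existed in the initial cube are left unchanged.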
Scenario 2 – Add measure
Initial cube: Social housing per Time – Ref. Area
Dimensions: Time, Reference area
Measures: Total amount of apartments, Total amount of duplex apartments, Total amount of unknown residence type, Total amount of family houses
Cube 1 to use for expansion: Building permits per Time – Ref. Area
Dimensions: Time, Reference area
Measures: Total amount
Cube 2 to use for expansion: Job seekers per Time – Ref. Area
Dimensions: Time, Reference area
Measures: Total amount
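Adding a measure can be sketched as a join on the shared dimension keys. In the sketch below each cube maps a (time, reference area) key to a dictionary of measure values. Because both expansion cubes call their measure "Total amount", the added measures are prefixed with a label for the cube they come from. The function name, the prefixing scheme, and the numbers are assumptions made for illustration.

```python
def add_measures(initial, other, label):
    """Return a copy of `initial` in which each observation also carries the
    measures that `other` holds for the same dimension key."""
    expanded = {}
    for key, measures in initial.items():
        merged = dict(measures)
        # keys missing from `other` simply gain no new measure
        for name, value in other.get(key, {}).items():
            merged[f"{label}: {name}"] = value
        expanded[key] = merged
    return expanded


# Keys are (time, reference area); all values are made-up toy numbers.
social_housing = {(2010, "Area A"): {"Total amount of apartments": 40}}
building_permits = {(2010, "Area A"): {"Total amount": 7}}
job_seekers = {(2010, "Area A"): {"Total amount": 300}}

expanded = add_measures(social_housing, building_permits, "Building permits")
expanded = add_measures(expanded, job_seekers, "Job seekers")
print(expanded[(2010, "Area A")])
```

Only keys present in the initial cube are kept here, which is one possible design choice. A different policy could also add observations that exist only in the expansion cube.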
Scenario 3 – Add measure
Initial cube: Pref. scheme in health insurance per Sex – Time – Ref. Area
Dimensions: Time, Sex, Reference area
Measures: Total amount in scheme A, Total amount in scheme B, Total amount in scheme C, Total amount in scheme D, Total amount in scheme E
Cube 1 to use for expansion: Job seekers per Sex - Time - Ref. Area (1999-2014)
Dimensions: Time, Sex, Reference area
Measures: Total amount
Cube 2 to use for expansion: Job seekers per Sex - Time - Ref. Area (2007-2014)
Dimensions: Time, Sex, Reference area
Measures: Total amount
Demo
http://195.251.218.39:8888/resource/selection