a browser for linked data cubes
Data Cube Vocabulary Workshop, 26th May 2015, Luxembourg
Developing a Browser for Linked Data Cubes
Evangelos Kalampokis
Eurostat Workshop
The OpenCube Browser
The OpenCube Browser enables exploring an RDF Data Cube by presenting its values in a table.
It is currently based on the InformationWorkbench open source platform.
It requires the URI of a cube as input.
It presents a two-dimensional slice of an RDF cube in a table.
The user can:
• change the two axes of the table (for cubes with more than two dimensions)
• change the fixed values of the dimensions that are not included in the table
• change the language of the values
Evaluation of the OpenCube Browser
Feedback was received from employees of the Flemish Government.
It should be understood in the context of a department considering alternatives to an existing proprietary solution.
Overall, users were satisfied.
Main criticisms by some users:
• Usefulness: “We don’t see added value compared to other tools”
• Ease of use: the user interface is not clear to an average user
• Performance: response time should be better
OpenCube OLAP Browser
The OpenCube OLAP Browser enhances the previous version:
• New, clear and user-friendly interface (similar to an OLAP browser)
• Enables browsing multiple integrated RDF cubes
http://83.212.122.81:8888/resource/OpenCubeOLAPBrowser
Addressing the ease-of-use comments
We start with an empty canvas.
The user can add dimensions and measures.
Addressing the usefulness comments
Linked Data is a competitive advantage over existing tools: it enables the integration of RDF data cubes across the Web.
One can start with an initial RDF cube, identify compatible cubes from other sources, and browse the new expanded cube.
The vision of exploiting expanded Linked Data Cubes
The OpenCube OLAP Browser is a proof of concept of this vision.
LATC’s Eurostat example
Main economic variables (2006)
Dimensions: timePeriod (2006), Geopolitical entity, Economical indicator for structural business statistics, Classification of economic activities
Measures: ObsValue
Main economic variables (2007)
Dimensions: timePeriod (2007), Geopolitical entity, Economical indicator for structural business statistics, Classification of economic activities
Measures: ObsValue
Emphasizing cube integration
Discover linked data cubes that are compatible to join.
Establish typed links between cubes that are compatible to join.
Create expanded cubes by increasing the size of one of the sets that define a cube.
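The last step can be illustrated with a minimal sketch. It assumes each cube is a plain Python dict holding its dimension names, its measure names, and a mapping from dimension-value tuples to observed values. The helper `expand_along_dimension` and the sample figures are hypothetical and are not part of the OpenCube Toolkit.

```python
def expand_along_dimension(cube_a, cube_b):
    """Union two cubes that share the same dimensions and measures.

    Hypothetical helper: the expanded cube holds the observations of both
    inputs, so the set of values of at least one dimension grows.
    """
    if cube_a["dimensions"] != cube_b["dimensions"]:
        raise ValueError("cubes must share the same dimensions")
    if cube_a["measures"] != cube_b["measures"]:
        raise ValueError("cubes must share the same measures")
    overlap = cube_a["observations"].keys() & cube_b["observations"].keys()
    if overlap:
        raise ValueError(f"conflicting observations: {sorted(overlap)}")
    return {
        "dimensions": cube_a["dimensions"],
        "measures": cube_a["measures"],
        "observations": {**cube_a["observations"], **cube_b["observations"]},
    }


# Illustrative values only; they are not taken from any Eurostat dataset.
cube_2006 = {
    "dimensions": ("timePeriod", "geo"),
    "measures": ("obsValue",),
    "observations": {("2006", "EL"): 10.0, ("2006", "BE"): 12.0},
}
cube_2007 = {
    "dimensions": ("timePeriod", "geo"),
    "measures": ("obsValue",),
    "observations": {("2007", "EL"): 11.0, ("2007", "BE"): 13.0},
}

expanded = expand_along_dimension(cube_2006, cube_2007)
time_values = sorted({key[0] for key in expanded["observations"]})
```

Here the set of timePeriod values grows from one year to two, while the dimensions and measures stay the same.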
Tools and Functionalities
Some functionalities of the OpenCube OLAP Browser require the output of functionalities performed by other tools of the OpenCube Toolkit.
OpenCube OLAP Browser
• B1: Present 2D slice of a cube
• B2: Add/remove Measures
• B3: Support multiple languages
• B4: Add/remove Dimensions
• B5: Roll-up/drill-down
• B6: Integrated view of cubes
Aggregator
• A1: Compute aggregations across dimension
• A2: Compute aggregations across hierarchy
Compatibility Explorer
• CE1: Identify compatible cubes
Diagram: “requires” links connect Browser functionalities to the Aggregator and the Compatibility Explorer.
B1: Present a two-dimensional slice of a cube
B2: Add/remove measures
Adding one more measure to be presented
B3: Support multiple languages
Change the language of the available data
B4: Add/remove dimensions
Adding one more dimension to be presented
A1: Compute aggregations across dimension
Functionality B4 requires the OpenCube Aggregator tool, which pre-computes aggregations across all dimensions.
The Aggregator creates 2^n - 1 sub-cubes from a cube of n dimensions. We define this set of cubes as an Aggregation Set.
Figure: aggregation lattice for a cube with the dimensions Time, Geo and Sex
Three dimensions: Time-Geo-Sex
Two dimensions: Time-Geo, Time-Sex, Geo-Sex
One dimension: Time, Geo, Sex
No dimensions: Total
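The Aggregation Set can be sketched as follows. This is an illustration under stated assumptions, not the Aggregator's actual implementation: observations are a dict keyed by dimension-value tuples, values are summed (the right function depends on the measure), and the function name `aggregation_set` and the sample data are hypothetical.

```python
from itertools import combinations


def aggregation_set(dimensions, observations):
    """Return {kept dimensions: aggregated cube} for every proper subset.

    `observations` maps a tuple of dimension values (ordered as in
    `dimensions`) to a number. Summation is an assumption made here.
    """
    n = len(dimensions)
    result = {}
    for size in range(n):  # sizes 0 .. n-1, i.e. proper subsets only
        for kept in combinations(range(n), size):
            cube = {}
            for key, value in observations.items():
                sub_key = tuple(key[i] for i in kept)
                cube[sub_key] = cube.get(sub_key, 0) + value
            result[tuple(dimensions[i] for i in kept)] = cube
    return result


dims = ("Time", "Geo", "Sex")
obs = {  # illustrative values only
    ("2014", "BE", "F"): 3,
    ("2014", "BE", "M"): 4,
    ("2014", "EL", "F"): 5,
    ("2015", "EL", "M"): 6,
}
agg = aggregation_set(dims, obs)
```

For three dimensions the result holds 2^3 - 1 = 7 sub-cubes: three with two dimensions, three with one dimension, and the grand total stored under the empty tuple.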
B4: Add/remove dimension
If a user adds or removes a dimension, the OpenCube OLAP Browser picks and presents a new cube from the Aggregation Set based on the selected dimensions.
To improve performance, we establish typed links between the pre-computed cubes and the Aggregation Set.
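A minimal sketch of this lookup, assuming the typed links are materialised as an index from the set of selected dimensions to a cube URI. The index, the function `pick_cube` and the example.org URIs are hypothetical.

```python
def pick_cube(index, selected_dimensions):
    """Look up the pre-computed cube for the dimensions currently shown."""
    return index[frozenset(selected_dimensions)]


# Hypothetical cube URIs; the index plays the role of the typed links.
AGGREGATION_INDEX = {
    frozenset({"Time", "Geo", "Sex"}): "http://example.org/cube/time-geo-sex",
    frozenset({"Time", "Geo"}): "http://example.org/cube/time-geo",
    frozenset({"Time", "Sex"}): "http://example.org/cube/time-sex",
    frozenset({"Geo", "Sex"}): "http://example.org/cube/geo-sex",
    frozenset({"Time"}): "http://example.org/cube/time",
    frozenset({"Geo"}): "http://example.org/cube/geo",
    frozenset({"Sex"}): "http://example.org/cube/sex",
    frozenset(): "http://example.org/cube/total",
}

shown = {"Time", "Geo", "Sex"}
shown.discard("Sex")  # the user removes the Sex dimension
current = pick_cube(AGGREGATION_INDEX, shown)
```

Because the aggregated cubes already exist, removing a dimension becomes a constant-time lookup instead of a new aggregation query.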
Challenges related to the Aggregator
Selecting the aggregation function: this can be done based on the unit of measure.
Meaningless aggregations: user intervention is probably required.
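One way to picture the first challenge is a lookup from unit of measure to aggregation function. The mapping below is an assumption made for illustration; the slides do not say which function the Aggregator uses for which unit, and an unweighted mean of percentages is itself only a rough choice.

```python
from statistics import mean

# Assumed mapping from unit of measure to aggregation function.
AGGREGATION_BY_UNIT = {
    "persons": sum,
    "euro": sum,
    "percent": mean,
}


def aggregate(values, unit):
    """Aggregate values with the function registered for their unit."""
    try:
        func = AGGREGATION_BY_UNIT[unit]
    except KeyError:
        # No sensible default: this is where the user would have to decide.
        raise ValueError(f"no aggregation function for unit {unit!r}") from None
    return func(values)


total_population = aggregate([100, 250], "persons")
average_rate = aggregate([10.0, 20.0], "percent")
```

Units that are missing from the mapping raise an error, which mirrors the point that some aggregations need a human decision.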
B5: Roll-up/drill-down
This functionality is under development.
It requires pre-computing aggregations across a hierarchy using the OpenCube Aggregator tool.
A2: Compute aggregations across hierarchy
It enriches an existing cube with new observations by using a hierarchy.
Figure: a cube with the dimensions Time, Geo and Sex, whose Geo values are city1 to city4, is combined with a Geo hierarchy (country1 at the top, then region1 and region2, then city1 to city4). The result is a cube whose Geo dimension also contains region1, region2 and country1.
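The enrichment can be sketched as a roll-up over a child-to-parent map. This assumes summation, leaf-level input only, and a particular assignment of cities to regions that the slide does not specify; `roll_up` and `PARENT_OF` are hypothetical names.

```python
def roll_up(observations, parent_of):
    """Add one observation per ancestor by summing leaf-level values.

    Assumes `observations` holds leaf values only (here: cities).
    """
    enriched = dict(observations)
    for geo, value in observations.items():
        ancestor = parent_of.get(geo)
        while ancestor is not None:
            enriched[ancestor] = enriched.get(ancestor, 0) + value
            ancestor = parent_of.get(ancestor)
    return enriched


# Assumed hierarchy: which city belongs to which region is not in the slide.
PARENT_OF = {
    "city1": "region1",
    "city2": "region1",
    "city3": "region2",
    "city4": "region2",
    "region1": "country1",
    "region2": "country1",
}
city_values = {"city1": 1, "city2": 2, "city3": 3, "city4": 4}
enriched = roll_up(city_values, PARENT_OF)
```

The original city observations are kept, and the cube gains observations for region1, region2 and country1.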
B6: Integrated view of multiple cubes (1/2)
The user selects a cube and an operation:
• Add new measure
• Add new value to dimension
The tool presents all the available compatible cubes.
The user selects one of the cubes.
B6: Integrated view of multiple cubes (2/2)
The OpenCube OLAP Browser presents an integrated view of the two RDF data cubes.
Figure callout: Added values
CE1: Identify compatible cubes
The OpenCube Compatibility Explorer pre-identifies cubes that are compatible to merge and establishes typed links between them.
Two types of compatibility:
• Add new measure compatible
• Add new value to dimension compatible
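The two compatibility types can be sketched as set comparisons. The precise rules used by the Compatibility Explorer are not given in the slides, so the definitions below are assumptions: both types require the same dimensions; "add new measure" needs a measure the base cube lacks, and "add new value to dimension" needs the same measures plus at least one new dimension value. All names and URIs are hypothetical.

```python
def compatibility(base, candidate):
    """Return the compatibility types of `candidate` with respect to `base`."""
    kinds = set()
    if base["dimensions"].keys() != candidate["dimensions"].keys():
        return kinds  # assumed rule: both types need the same dimensions
    if candidate["measures"] - base["measures"]:
        kinds.add("add-measure")
    if candidate["measures"] == base["measures"]:
        for dim, values in candidate["dimensions"].items():
            if values - base["dimensions"][dim]:
                kinds.add("add-value-to-dimension")
                break
    return kinds


base = {
    "dimensions": {"timePeriod": {"2006"}, "geo": {"BE", "EL"}},
    "measures": {"ex:turnover"},
}
next_year = {
    "dimensions": {"timePeriod": {"2007"}, "geo": {"BE", "EL"}},
    "measures": {"ex:turnover"},
}
other_measure = {
    "dimensions": {"timePeriod": {"2006"}, "geo": {"BE", "EL"}},
    "measures": {"ex:employment"},
}

kinds_next_year = compatibility(base, next_year)
kinds_other_measure = compatibility(base, other_measure)
```

Running such a check once for every pair of cubes and storing the result as typed links is what lets the Browser list compatible cubes without recomputing anything.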
Challenges related to compatibility (1/2)
Identification of same dimensions:
• If the dimensions use code lists, the code list URI is used to determine equality.
• If no code lists exist, the dimension URI is used to determine equality.
Identification of equal measures:
• Two measures are considered equal if they have the same URI.
• The measure obsValue does not explain what is actually measured in the cube, so equality cannot be determined.
Identify and make available the key reference datasets that connect different statistical datasets, e.g. concerning geography, physical assets and areas of government policy.
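The equality rules above can be written down directly. The dict layout (`uri`, optional `code_list`) and the example.org URIs are assumptions made for this sketch.

```python
def dimension_key(dimension):
    """Return the URI used to compare a dimension with another one.

    The code list URI is used when present, otherwise the dimension URI.
    """
    return dimension.get("code_list") or dimension["uri"]


def same_dimension(a, b):
    return dimension_key(a) == dimension_key(b)


def same_measure(a, b):
    """Measures are considered equal only if their URIs are identical."""
    return a["uri"] == b["uri"]


# Hypothetical URIs for illustration.
geo_a = {"uri": "http://example.org/a/geo", "code_list": "http://example.org/codes/geo"}
geo_b = {"uri": "http://example.org/b/area", "code_list": "http://example.org/codes/geo"}
sex_a = {"uri": "http://example.org/a/sex"}
sex_b = {"uri": "http://example.org/b/sex"}

geo_match = same_dimension(geo_a, geo_b)
sex_match = same_dimension(sex_a, sex_b)
```

Two dimensions with different URIs still match when they share a code list, whereas dimensions without code lists match only on identical URIs.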
Challenges related to compatibility (2/2)
The approach of expanding data cubes requires small cubes, i.e. cubes that describe few measures.
But how should a cube be modelled? For example:
• LATC’s Eurostat: more than 5000 cubes with few measures per cube
• Irish Census 2011: 682 cubes with one measure per cube
• Digital Agenda: only 2 cubes with more than 100 measures per cube
We need a common understanding of how to conceptually model a cube.
Data level Operations per Functionality
Browser functionality | Data level operations
B1: Present 2D slice of a cube | Identify single cube measure (D1), multiple cube measures (D2); identify cube dimensions (D3); identify cube attributes (D4); identify dimension values (D5)
B2: Add/remove measures | D3
B3: Multilinguality | Data available in multiple languages (D6)
B4: Add/remove dimensions | D1, D2
B5: Roll-up/drill-down | Definition of a hierarchy for hierarchical data (D7)
B6: Integrated view of cubes | D1, D2, D3, D4, D5
Aggregator functionality | Data level operations
A1: Compute aggregations across dimension | Identify the unit of the cube’s single measure (D8), multiple measures (D9)
A2: Compute aggregations across hierarchy | D7, D8, D9
Compatibility Explorer functionality | Data level operations
CE1: Identify compatible cubes | D1, D2, D3, D4, D5, D7
D1: Identify single cube measure
According to the QB vocabulary, the qb:DataStructureDefinition must have a qb:MeasureProperty that defines the measure.
The Browser follows this approach.
However, other approaches are used:
• LATC’s Eurostat dataset defines sdmx-measure:obsValue as a qb:DimensionProperty
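A sketch of this lookup over a list of triples written as prefixed-name tuples. The triple lists, the `ex:` names and the function `measures_of` are made up for illustration, and the Browser's real queries are not shown in the slides.

```python
COMPONENT_LINKS = ("qb:measure", "qb:dimension", "qb:attribute", "qb:componentProperty")


def measures_of(dsd, triples):
    """Return properties attached to `dsd` that are typed qb:MeasureProperty."""
    components = {o for s, p, o in triples if s == dsd and p == "qb:component"}
    attached = {o for s, p, o in triples if s in components and p in COMPONENT_LINKS}
    typed = {s for s, p, o in triples if p == "rdf:type" and o == "qb:MeasureProperty"}
    return sorted(attached & typed)


# A structure that follows the QB vocabulary (hypothetical names).
TRIPLES_QB = [
    ("ex:dsd", "qb:component", "_:c1"),
    ("_:c1", "qb:dimension", "ex:refPeriod"),
    ("ex:dsd", "qb:component", "_:c2"),
    ("_:c2", "qb:measure", "ex:population"),
    ("ex:refPeriod", "rdf:type", "qb:DimensionProperty"),
    ("ex:population", "rdf:type", "qb:MeasureProperty"),
]

# A structure in the style described for LATC's Eurostat dataset.
TRIPLES_LATC_STYLE = [
    ("ex:dsd", "qb:component", "_:c1"),
    ("_:c1", "qb:dimension", "sdmx-measure:obsValue"),
    ("sdmx-measure:obsValue", "rdf:type", "qb:DimensionProperty"),
]

found_qb = measures_of("ex:dsd", TRIPLES_QB)
found_latc = measures_of("ex:dsd", TRIPLES_LATC_STYLE)
```

With the second structure no measure is found, which is why a tool that follows the vocabulary strictly cannot present such a cube without extra handling.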
D2: Identify multiple measures
The QB vocabulary offers two options:
Multi-measure observations
• Define multiple qb:MeasureProperty components, one for each measure, attached to the qb:DataStructureDefinition
• Use all qb:MeasureProperty components at each observation
Measure dimension
• Define multiple qb:MeasureProperty components, one for each measure, attached to the qb:DataStructureDefinition
• Define a special qb:DimensionProperty named qb:measureType
• At each observation use one qb:MeasureProperty. The dimension qb:measureType defines the qb:MeasureProperty used at the specific observation
The Browser follows the multi-measure observations approach.
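The difference between the two options can be shown by converting one multi-measure observation into its measure-dimension form. Observations are modelled as plain dicts here; the `ex:` properties and values are hypothetical.

```python
def to_measure_dimension(observation, measures):
    """Split one multi-measure observation into one observation per measure.

    Each result carries qb:measureType naming the single measure it holds.
    """
    dims = {k: v for k, v in observation.items() if k not in measures}
    return [
        {**dims, "qb:measureType": m, m: observation[m]}
        for m in measures
        if m in observation
    ]


obs = {  # illustrative values only
    "ex:refPeriod": "2014",
    "ex:refArea": "BE",
    "ex:population": 11.2,
    "ex:lifeExpectancy": 81.0,
}
split = to_measure_dimension(obs, ["ex:population", "ex:lifeExpectancy"])
```

The multi-measure form is more compact, while the measure-dimension form yields one observation per measure, each tagged with the measure it reports.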
D2: Identify multiple measures (Challenges)
The Open Data Communities datasets use the measure dimension approach.
The Irish 2011 Census dataset defines only one measure per cube.
However, other approaches are followed:
• LATC’s Eurostat uses a qb:DimensionProperty to encode multiple measures
• The same holds for Digital Agenda
Other challenges:
• In multi-measure observations, a missing value in one measure will invalidate the entire observation.
D3: Identify cube dimensions
The QB vocabulary defines that the qb:DataStructureDefinition must have a qb:DimensionProperty for every dimension.
The Browser assumes that qb:DimensionProperty defines ONLY dimensions of a cube.
Other approaches:
• In LATC’s Eurostat dataset, (a) attributes, e.g. sdmx-dimension:freq and property:unit, and (b) sdmx-measure:obsValue are declared as qb:DimensionProperty
• Digital Agenda defines the breakdown dimension, which is actually a “super-dimension” in which one can add all the values of dimensions other than time and geography.
D5: Identify dimension values
According to the QB vocabulary, a qb:DimensionProperty is connected to a skos:ConceptScheme with the values of the dimension.
The Browser gets the URIs of the dimension values from the observations and the labels from the Concept Scheme.
Some other approaches:
• Although the Irish CSO does not connect a qb:DimensionProperty with a qb:codeList property, it gets the dimension values from a Concept Scheme.
Other challenges:
• How to encode the order of dimension values
D6: Data available in multiple languages
The names of a qb:ComponentProperty can be:
• Directly defined, either as rdfs:label or skos:prefLabel
• Defined through a skos:Concept connected via qb:concept to the qb:ComponentProperty
The Browser uses both ways to get the names of the values.
Other approaches:
• LATC’s Eurostat uses rdfs:label attached to the qb:ComponentProperty
• Open Data Communities uses the first approach in some cubes and the second in others
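Trying both ways in turn can be sketched as follows. The data layout is an assumption: `labels` maps a URI to its (text, language) pairs, whether they come from rdfs:label or skos:prefLabel, and `concept_of` maps a property to the concept it is linked to via qb:concept. All names are hypothetical.

```python
def label_for(uri, lang, labels, concept_of):
    """Return the label of `uri` in `lang`, or None if there is none.

    Direct labels are tried first, then labels of the linked concept.
    """
    for subject in (uri, concept_of.get(uri)):
        for text, text_lang in labels.get(subject, []):
            if text_lang == lang:
                return text
    return None


LABELS = {
    "ex:refArea": [("Area", "en")],
    "ex:concept/sex": [("Sex", "en"), ("Geslacht", "nl")],
}
CONCEPT_OF = {"ex:sex": "ex:concept/sex"}

direct = label_for("ex:refArea", "en", LABELS, CONCEPT_OF)
via_concept = label_for("ex:sex", "nl", LABELS, CONCEPT_OF)
missing = label_for("ex:refArea", "nl", LABELS, CONCEPT_OF)
```

When a language is missing the function returns None, leaving the fallback (for example showing the URI) to the caller.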
D7: Define a hierarchy
According to the RDF Data Cube Vocabulary:
• Use qb:HierarchicalCodeList
• Use both the SKOS and XKOS vocabularies: SKOS defines the hierarchy (skos:broader, skos:inScheme, skos:member etc.); XKOS defines the classification levels (xkos:ClassificationLevel, xkos:numberOfLevels, xkos:depth etc.)
The Browser assumes that:
• The levels of the hierarchies used by the cube are defined using the xkos:ClassificationLevel concept.
• Each member of a level is defined as a skos:Concept, which is related to the xkos:ClassificationLevel by the skos:member property.
Challenges regarding hierarchies
Other practices:
• LATC’s Eurostat does not define hierarchies although hierarchical data exist in the same cube, e.g. city (Riga), region (South West Wales), country (Greece), set of countries (EU28) (aei_ps_alt.ttl)
• The Irish Census does not define hierarchies. It has data about 12 geographical levels, and it defines a different code list and a different cube per geo level.
• Open Data Communities re-uses URIs from the Spatial Relations Ontology defined by Ordnance Survey and thus re-uses the hierarchy.
• The Flemish Government’s dataset defines hierarchies based on SKOS and XKOS.
Other challenges:
• The semantics of xkos:isPartOf and xkos:generalises seem wrong
D8: Identify the unit of cube’s single measure
Unit attached to the dataset:
• Declare that sdmx-attribute:unitMeasure can be attached (qb:componentAttachment) to the qb:DataSet
• Use sdmx-attribute:unitMeasure on the qb:DataSet to define the measure’s unit
In case multiple units are used for the same measure, e.g. kilos and grams for the measure weight:
• Declare that sdmx-attribute:unitMeasure can be attached (qb:componentAttachment) to the qb:Observation
• Use sdmx-attribute:unitMeasure at each observation to define the measure’s unit
The Browser does not support multiple units for the same measure.
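A consumer that wants to cope with both attachment levels could resolve the unit as sketched below. The dict-based model and the `ex:unit/...` URIs are assumptions, and preferring the observation-level value is a design choice of this sketch rather than something the slides prescribe.

```python
UNIT = "sdmx-attribute:unitMeasure"


def unit_of(observation, dataset):
    """Prefer the unit on the observation, else the unit on the data set."""
    return observation.get(UNIT, dataset.get(UNIT))


dataset = {UNIT: "ex:unit/kilogram"}  # hypothetical unit URIs
obs_default = {"ex:weight": 2.0}
obs_override = {"ex:weight": 500.0, UNIT: "ex:unit/gram"}

unit_default = unit_of(obs_default, dataset)
unit_override = unit_of(obs_override, dataset)
```

A cube that mixes units this way still needs unit conversion before its values can be aggregated.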
Challenges regarding unit of measure
Which measurement units should be used? Is there an easy-to-use universal standard?
D9: Identify the unit of cube’s multiple measures
Challenges:
• “Note that one limitation of the multi-measure approach is that it is not possible to attach an attribute to a single observed value.”
• If one splits the observations and has one measure per observation, this again seems to contradict the vocabulary: “It is also possible to attach attributes to a qb:MeasureProperty in which case the attribute is intended to apply only to that property and not to the observations in which that property occurs.”
Identify the geospatial dimension of the cube
The Browser assumes that the geospatial dimension is sdmx:refArea or a subproperty of sdmx:refArea.
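This assumption amounts to following rdfs:subPropertyOf links upwards. The sketch assumes each property has at most one super-property, stored in a dict; the `ex:` properties are hypothetical.

```python
def is_geo_dimension(prop, sub_property_of, root="sdmx:refArea"):
    """True if `prop` is `root` or reaches it via rdfs:subPropertyOf links."""
    seen = set()
    while prop is not None and prop not in seen:  # `seen` guards against cycles
        if prop == root:
            return True
        seen.add(prop)
        prop = sub_property_of.get(prop)
    return False


# Hypothetical property hierarchy.
SUB_PROPERTY_OF = {
    "ex:municipality": "ex:area",
    "ex:area": "sdmx:refArea",
    "ex:sex": "ex:demographic",
}

municipality_is_geo = is_geo_dimension("ex:municipality", SUB_PROPERTY_OF)
sex_is_geo = is_geo_dimension("ex:sex", SUB_PROPERTY_OF)
```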
Identify the time dimension of the cube
Time can be represented either as a URI or as a literal. The Browser takes both into account.
Approaches in existing datasets:
• LATC’s Eurostat uses dcterms:date in the DSD and sdmx-dimension:timePeriod in the data. No URIs, only literals.
• The Irish Census data does not have a time dimension.
Challenges:
• Which is better: atomic values or identifiers, e.g. for a year?
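Handling both representations can be sketched as a normalisation step that extracts the year. It assumes that a time URI ends with the year in its last path segment, which holds for some identifier schemes but is not guaranteed in general; the function name and the sample values are illustrative.

```python
import re


def year_of(value):
    """Extract a four-digit year from a literal or from the end of a URI."""
    tail = re.split(r"[/#]", value.rstrip("/"))[-1]
    match = re.match(r"\d{4}", tail)
    return int(match.group()) if match else None


from_literal = year_of("2006")
from_date = year_of("2006-01-01")
from_uri = year_of("http://reference.data.gov.uk/id/year/2006")
unknown = year_of("http://example.org/period/first")
```

Normalising to a common value makes observations from cubes that use literals and cubes that use URIs comparable on the time axis.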