distributed data analytics thorsten papenbrock storage and … · 2017-11-08 · 0 tree; frequently...

Distributed Data Analytics

Storage and Retrieval Thorsten Papenbrock

G-3.1.09, Campus III

Hasso Plattner Institut

Introduction

Layering Data Models


1. Conceptual layer

Data structures, objects, modules, …

Application code

2. Logical layer

Relational tables, JSON, XML, graphs, …

Database management system (DBMS) or storage engine

3. Representation layer

Bytes in memory, on disk, on network, …

Database management system (DBMS) or storage engine

4. Physical layer

Electrical currents, pulses of light, magnetic fields, …

Operating system and hardware drivers

our focus now

Overview



Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views


Index Structures

A Tiny Database


Basic database tasks: (a) write given data, (b) read specific data

A tiny key-value store in two Bash functions:

It works:

$ db_set 1234 '{"name":"Berlin","type":"city"}'
$ db_get 1234
{"name":"Berlin","type":"city"}

#!/bin/bash

db_set () {
    echo "$1,$2" >> database
}

db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

db_set: concatenate the first two parameters with “,” and append them to the file named “database”

db_get: find all lines starting with the first parameter, remove the first parameter from these lines, and select the last line

Why?

Index Structures

A Tiny Database


Assume the following input-sequence:

$ db_set 1234 '{"name":"Berlin","type":"city"}'

$ db_set 42 '{"name":"Germany","type":"country"}'

$ db_set 42 '{"name":"Germany","type":"country","capital":"Berlin"}'

The corresponding “database” file:

$ cat database
1234,{"name":"Berlin","type":"city"}
42,{"name":"Germany","type":"country"}
42,{"name":"Germany","type":"country","capital":"Berlin"}

“database” is a Log file:

Append only, no removal of old values

only last entry for each key is valid

Fast writes ( O(1) ) but slow reads ( O(n) with n records in the log )

To speed-up reads: Indexes!

Index Structures

Indexes


Index

An additional data structure that locates data by some

search criterion, i.e., key (usually one or more attributes, or

value identifier)

Basically a key-value store, where values can be actual data

or pointers to relational records, documents, graph

nodes/edges, …

Improves data retrieval operations

Costs additional writes to index structure and storage space

Use indexes carefully (not too many)!

Different index implementations (data structures) have

different strengths

Choose the right index for your queries!


Hash Indexes

Hash Indexes


Definition

A hash index is a hash map (dictionary) that maps keys to the positions

(in memory, on disk, on the network, …) of their values/records

The hash map uses a hash function to calculate mapping of keys and

positions and is usually kept in memory

Uses

key-value stores, other indexes, data distribution (load balancing,

sharding, …)

Strength

Point queries: one index look up directly delivers a value’s position

Weaknesses

Range queries require looking up each key individually

Hash map must fit into main memory; hash maps on disk perform poorly

Hash Indexes

Example


Log-structured file on disk (each character is one byte):

byte 0:   1234,{"name":"Berlin","type":"city"}\n
byte 37:  42,{"name":"Germany","type":"country"}\n

In-memory hash map (keys are located via a hash function):

key    byte offset
1234   0
42     37
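The example can be tried out with a minimal Python sketch. The class `HashIndexedLog` and its method names are hypothetical; a plain `dict` stands in for the in-memory hash map, and the index is not rebuilt from an existing file.

```python
import os
import tempfile


class HashIndexedLog:
    """Append-only log file plus an in-memory dict: key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # hypothetical in-memory hash map
        open(path, "ab").close()  # create the file if it is missing

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # byte position where the record starts
            f.write(record)
        self.index[key] = offset  # a newer offset overwrites the older one

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump directly to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split(",", 1)[1]


path = os.path.join(tempfile.mkdtemp(), "database")
db = HashIndexedLog(path)
db.set("1234", '{"name":"Berlin","type":"city"}')
db.set("42", '{"name":"Germany","type":"country"}')
print(db.index)
print(db.get("42"))
```

The first record occupies 37 bytes including the newline, which is why the second key maps to offset 37.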

Hash Indexes

Hash Indexes


Controlling file growth

Indexed data, i.e., log is usually insert only (for good write performance)

Frequent updates make files unnecessarily large

Example: a store that maps products to stock-counts

Each purchase increments a stock-count → new record!

Each sale decrements a stock-count → new record!

But: collection of products is almost constant

Solution: consolidate/compact the log frequently freeing up disk space

How to do this on a running system?

Segmentation!

Hash Indexes

Hash Indexes


Segmentation

Break log into segments of fixed size

Whenever a segment reaches max size:

1. Close the segment and redirect writes to a new segment

2. Compact the old segment:

1. Read the old segment backwards

2. If a key is read for the first time:

Write the entry (key + value) into a compacted segment

3. Merge the compacted segment with older compacted segment(s):

1. Read older compacted segment

2. If a key is not present in the newer compacted segment:

Write the entry (key + value) into the compacted segment

4. Delete the old segment and old compacted segment
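The compaction and merge steps can be sketched in Python as follows. This is an illustration under simplifying assumptions: segments are in-memory lists of (key, value) pairs in write order, and the function names and sample data are made up.

```python
def compact(segment):
    """Keep only the most recent entry per key.

    `segment` is a list of (key, value) pairs in write order (oldest first).
    """
    seen = set()
    compacted = []
    for key, value in reversed(segment):  # read the segment backwards
        if key not in seen:               # key is read for the first time
            seen.add(key)
            compacted.append((key, value))
    return compacted


def merge(newer, older):
    """Merge a newer compacted segment with an older compacted segment."""
    merged = list(newer)
    present = {key for key, _ in newer}
    for key, value in older:
        if key not in present:            # the newer segment wins on conflicts
            merged.append((key, value))
    return merged


old_compacted = [("apple", 7), ("pear", 2)]
closed_segment = [("apple", 8), ("kiwi", 1), ("apple", 9)]
new_compacted = merge(compact(closed_segment), old_compacted)
print(new_compacted)
```

After the merge, the closed segment and the old compacted segment are no longer needed and could be deleted.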

[Figure: log segments labeled Compacted and Current; access labels Writes and Reads]

Hash Indexes

Hash Indexes


Segmentation

Compact:

Merge:


SSTables


Definition

Sorted String Tables (SSTables) are segment files that contain key-value

pairs sorted by their key; each key only appears once

The sort order and the uniqueness of keys rule out simple appends to the log

Merge

Given two (or more) SSTables, their merge is calculated similarly to the

merge-sort algorithm:

1. Read all SSTables simultaneously:

1. Copy the smallest key with its value to the merged SSTable and read the next key-value pair

2. If keys are equal: copy only the key-value pair from the most recent SSTable to the merged SSTable

[Figure: merging two SSTables with keys 2 4 5 7 and 1 3 6 7; merged output so far: 1 2 3 4 ?]
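A possible Python sketch of this merge uses `heapq.merge` from the standard library. The key sequences are the ones from the figure; the values and the choice of which table is the newer one are assumptions for illustration.

```python
import heapq


def merge_sstables(tables):
    """Merge sorted (key, value) lists; tables[0] is the most recent one."""
    # Tag each entry with the age of its table so that, for equal keys,
    # the entry from the most recent table is produced first.
    tagged = [
        [(key, age, value) for key, value in table]
        for age, table in enumerate(tables)
    ]
    merged = []
    for key, _age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an entry from a newer table was already copied
        merged.append((key, value))
    return merged


newer = [(2, "b2"), (4, "d2"), (5, "e2"), (7, "g2")]
older = [(1, "a1"), (3, "c1"), (6, "f1"), (7, "g1")]
print(merge_sstables([newer, older]))
```

Key 7 appears in both tables, so only the value from the newer table survives.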




SSTables


Index

Two options:

a) A dense index, i.e., each key-value pair in the SSTable is indexed

Allows direct look ups

b) A sparse index, i.e., first key-value pair in each block of an SSTable

is indexed

If a query key is not directly indexed:

find the largest indexed key that is smaller than the query key (binary search)

find the block of that key (look up)

search for the query key in the block (binary search, or linear scan if log entries have variable size)

Blocks can be compressed to reduce memory consumption and I/O bandwidth
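A sparse index look up might look like the following Python sketch. It assumes that the first key of each block is indexed, so the only block that can hold a query key is the one with the largest indexed key not greater than the query key. Block contents and names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical SSTable split into blocks of sorted (key, value) pairs.
blocks = [
    [("apple", 1), ("banana", 2), ("cherry", 3)],
    [("grape", 4), ("kiwi", 5), ("lemon", 6)],
    [("mango", 7), ("peach", 8), ("plum", 9)],
]
# Sparse index: only the first key of every block is indexed.
sparse_index = [block[0][0] for block in blocks]


def lookup(key):
    # Binary search for the last indexed key that is <= the query key.
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None  # key is smaller than every key in the table
    for k, v in blocks[block_no]:  # linear scan inside the block
        if k == key:
            return v
    return None


print(lookup("kiwi"), lookup("fig"))
```

Only one block has to be read per look up, and the index holds one entry per block instead of one per key.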


SSTables


Merge

Sparse index


LSM-Trees


Definition

Log-Structured Merge-Trees (LSM-Trees) are multilayered search trees for key-value log-data that use different data structures, each of which is optimized for its underlying storage medium

Example

First layer (C0 Tree): index structure that …

1) efficiently takes new key-value pairs in any order

2) outputs all contained key-value pairs in sorted order

B-trees, red-black trees, AVL trees, …

Second layer (C1 Tree): index structure that …

1) is able to merge with sorted key-value pair lists

2) effectively compacts/compresses contained key-value pairs

SSTables

Patrick O'Neil et al. “The log-structured merge-tree (LSM-tree)”, Acta Informatica, volume 33, number 4, pages 351-385, 1996

memtable


LSM-Trees


Intuition

Sorted trees are fast in-memory indexes but they outgrow main memory

SSTables are indexable and compact but don’t support random inserts

Insert: add new key-value pairs to the C0 Tree; frequently merge trees down the hierarchy (C0 → C1 → C2 → …) to free memory

Read: search for the key in every layer, or in chronological order (newest first) if only the most recent value is needed
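The insert and read paths can be illustrated with a toy Python sketch. It is heavily simplified and not how any of the listed systems work internally: a `dict` stands in for the C0 Tree, sorted in-memory lists stand in for SSTables, flushed tables are stacked instead of merged, and the class name `TinyLSM` is made up.

```python
class TinyLSM:
    """Toy LSM-Tree: a dict as C0 and sorted lists standing in for SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # C0: accepts keys in any order
        self.sstables = []   # newest first; each one is a sorted list
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush C0 in sorted order to free memory.
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:      # newest data first
            return self.memtable[key]
        for table in self.sstables:   # then from newer to older tables
            for k, v in table:
                if k == key:
                    return v
        return None


db = TinyLSM()
db.put("b", 1)
db.put("a", 2)   # limit reached: C0 is flushed as a sorted table
db.put("b", 3)   # newer value for "b" lives in C0
print(db.get("b"), db.get("a"), db.get("z"))
```

Because reads go from the newest layer to the oldest, the first hit is the most recent value.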

Data stores with LSM-Tree implementations: LevelDB, RocksDB, Cassandra, HBase, Lucene (in Elasticsearch and Solr), …


LSM-Trees


Optimizations

Querying non-existent values is expensive (check all layers)

Catch most of these queries with a Bloom filter

Size-tiered compaction

Merge newer, smaller SSTables successively into older, larger SSTables
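To illustrate the Bloom filter optimization, here is a minimal Python sketch. The sizes, the number of hash functions, and the use of salted SHA-256 digests are arbitrary choices for illustration, not a recommendation.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits >> pos & 1 for pos in self._positions(key))


bf = BloomFilter()
for key in ("1234", "42"):
    bf.add(key)
print(bf.might_contain("1234"), bf.might_contain("42"))
```

A negative answer is always correct, so a read for a non-existent key can skip all layers; a positive answer may be a false positive and still requires the regular look up.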


B-Tree

Definition and Structure


Definition

A self-balancing, tree-based data structure, that stores values sorted by key and

allows read/write access in logarithmic time

A generalization of a binary search tree as nodes can have more than two

children

Structure

Blocks:

Nodes in the tree that contain key-value pairs and pointers to other blocks

Correspond to physical, fixed sized disk blocks/pages that are addressed

and read as single units

Pointers:

Edges in the tree that connect blocks in tree structure

Correspond to physical block/page addresses

R. Bayer and E. McCreight. “Organization and maintenance of large ordered indices”, In Proceedings of SIGFIDET (now SIGMOD) Workshop, pages 107-141, 1970

B-Tree

Constraints


Constraints

Balanced:

Same distance from root-node to all leaf-nodes

Depth of the tree is in O(log(n)) (= key-look-up complexity)

If tree grows unbalanced, re-balancing required

Block-Content:

A block contains n keys and n+1 pointers in alternating order

Pointers left to a key point to blocks containing smaller keys

Pointers right to a key point to blocks containing larger/equal keys

All values in the tree are sorted by their keys!

Block-Size:

Typically 4096 Byte per block; 4 Byte per key; 8 Byte per pointer

4n + 8(n+1) ≤ 4096 => n = 340
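The arithmetic can be checked with a few lines of Python. The constants are the ones from the slide; the capacity estimate per tree height is an added illustration that assumes completely filled blocks.

```python
BLOCK_SIZE = 4096   # bytes per block
KEY_SIZE = 4        # bytes per key
POINTER_SIZE = 8    # bytes per pointer

# n keys and n + 1 pointers must fit into one block:
# KEY_SIZE * n + POINTER_SIZE * (n + 1) <= BLOCK_SIZE
n = (BLOCK_SIZE - POINTER_SIZE) // (KEY_SIZE + POINTER_SIZE)
print(n)

# With n + 1 pointers per block, a tree of height h can address
# up to (n + 1) ** h blocks on its lowest level.
for height in (1, 2, 3, 4):
    print(height, (n + 1) ** height)
```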

B-Tree

Constraints


Constraints

Root Node:

Points to underlying nodes (and values)

At least 2 pointers used

Inner Node:

Points to underlying nodes (and values)

At least (𝑛 + 1)/2 pointers used

Leaf Node:

Points to right neighbor leaf and values

At least 1 neighbor pointer (if present) and (𝑛 + 1)/2 value pointers used

Uses

Any data store: most widely used index structure for DBMSs

Sorted, dense, and sparse indexes

B-Tree vs. B+-Tree

B-Trees store keys and values in both internal and leaf nodes;

B+-Trees store values only in leaf nodes.

The following examples show B+-Trees

B-Tree

Example


Example

[Figure: example B+-Tree with root, inner nodes, leaves, and values; legend: block, key, pointer, free space]

B-Tree vs. SSTable: always rewrite entire blocks vs. always append and merge frequently

B-Tree

More Examples



Neighbor pointer

Look up scenario

Grow scenario

With data

B-Tree

Insert and Delete

Delete:

1. Find leaf node of the key

2. Delete key and value-pointer

3. Underflow? If no: done

4. Redistribute with neighbors? If yes (Redistribute): adjust the keys of all parent nodes; done

5. If no (Merge; the steps depend on whether N is a leaf): move all keys of the right neighbor into the left neighbor node and delete the (empty) right node; move split-key into parent node of left neighbor node

Insert:

1. Find corresponding leaf node for key

2. Insert key and value-pointer

3. Overflow? If no: done

4. If yes (Split): split node N into nodes N1, N2 and distribute keys evenly

If N is a leaf: copy smallest key in N2 upwards

If N is not a leaf: move largest key in N1 upwards

5. If N is the root: create new root node (with only one single key); done

Otherwise: insert new key-pointer pair into parent node and check the parent for overflow (step 3)

Split and Merge operations guarantee that the B-Tree is always balanced and the blocks are filled correctly.
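The difference between splitting a leaf and splitting an inner node can be sketched in Python. Only the keys are modeled here; pointers and the surrounding tree are left out, and both function names are made up.

```python
def split_leaf(keys):
    """Split an overflowing leaf; the separator key is copied upwards."""
    mid = len(keys) // 2
    n1, n2 = keys[:mid], keys[mid:]
    return n1, n2, n2[0]   # smallest key in N2 also stays in N2


def split_inner(keys):
    """Split an overflowing inner node; the separator key is moved upwards."""
    mid = (len(keys) + 1) // 2
    n1, n2 = keys[:mid], keys[mid:]
    separator = n1.pop()   # largest key in N1 leaves N1
    return n1, n2, separator


print(split_leaf([10, 20, 30, 40, 50]))
print(split_inner([10, 20, 30, 40, 50]))
```

In a leaf the separator must remain available next to its value, which is why it is copied; in an inner node it only guides the search, so it can be moved.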


Further Indexes

Alternative Index Types


Clustered Index

Stores indexed data or parts of it within the index (instead of pointers to data)

Example: An index on attribute delivery_status allows counting pending deliveries without data access.

Improves the performance of certain queries

Reduces write performance and requires additional storage

Redundant values (in data and index) complicate data consistency

Multi-Column Index

a) Concatenated index: merge keys into one key

b) Multi-dimensional index: split multi-dim. key domain into multi-dim. shapes

Example: An index on two-dim. geo locations (longitude, latitude)

to answer intersection, containment, and nearest neighbor queries.

Most common implementation: R-Trees

Further Indexes

R-Tree


R-Tree

A variation of a B-Tree that uses a hierarchy of rectangles as keys

Also: balanced and block-sized nodes

Indexed points …

are clustered into leaf nodes

might occur in multiple clusters

Insertion:

into appropriate clusters

split cluster if too large

find smallest cluster extension

via heuristic if no cluster fits directly

Further Indexes

Alternative Index Types


Fuzzy Index

Index on terms/keys that allows for value misspellings, synonyms, variations, …

Idea: sparse, sorted index (e.g. SSTable or B-Tree) with similarity look up

Example: An index on attribute firstname where names might be misspelled.

Look up most similar key

Scan the (sorted!) neighborhood of that key’s value for similar values

John Sno

John Snow

Jon Snow

john snow

jonny snow

…

…


Online Analytical Processing (OLAP)

What is Analytics?


“Calculate the number of sales per year and state.”

SELECT T.year, L.state, SUM(S.sales)
FROM Sales S, Times T, Locations L
WHERE S.timeid = T.timeid AND S.locid = L.locid
GROUP BY T.year, L.state

Aggregate Queries

Outlier Detection

Clustering

Trend Prediction

And many further questions

to data sets!


Data Warehousing

Data Warehouse


Definition

A central repository of integrated, potentially pre-aggregated historical data

from one or more distinct sources (usually operational/OLTP systems)

Ecosystem

[Figure: Operational Systems (DeliveryDB, InventoryDB, SalesDB; users: Customer, Employee, Supplier) → ETL → Staging Area → Data Warehouse (raw data, meta data, summary data) → ETL → Data Marts (Purchasing, Sales, Reporting) → Analyst. ETL into the warehouse: data integration, cleaning, pre-aggregation, …; ETL into the data marts: data preparation, filtering, …]

ETL processes: extract, transform, load

Data Warehousing

Assessment


Advantages

Compute intense analytical queries do not interfere with operational business

Data is static and does not change while queries are run

Analytics-friendly data layouts, e.g., star schemata, data cubes, or materialized views, as well as indexes for analytical query patterns are possible

Data can be sorted by certain keys that are often used in range or group-by queries (e.g. timestamps, version-IDs, tags, or country codes), whereas operational data is usually not sorted for better insert performance

Data can be compressed more aggressively, which can improve read

performance

Disadvantages

Data is not up-to-date (one ETL-cycle ≈ one day old)

Data Warehousing

Solutions


Commercial

Microsoft SQL Server

SAP HANA

Teradata

Vertica

ParAccel

…

Open-Source

Apache Hive

Spark SQL

Cloudera Impala

Facebook Presto

Apache Tajo

Apache Drill

…

Most of these are Hadoop-based and use some kind of

MapReduce paradigm.


Definitions


Star Schema

An acyclic graph of relational tables with depth one, where the root is called the fact table and the leaves are called dimension tables

Fact tables: contain events (transactions, measurements, snapshots,…)

Usually very large tables

Examples: sales, page views, clicks, shippings, sensor readings, …

Dimension tables: contain entity data and descriptive information

Usually small tables with a fixed domain

Examples: products, employees, customers, dates, locations, …

Answer for each event: who, what, where, when, how, or why

Snowflake Schema

Same as star schema, but with arbitrary depth, i.e., dimension tables

might have further dimension tables


Examples


Star Schema Snowflake Schema



Benefits



Improved query performance for (most) aggregation queries:

Better join-performance than normalized data

Better scan performance than one big table

May contain pre-aggregated data

Simpler queries:

Clear join logic and limited number of necessary joins


Motivation and Idea


Observation

Data warehouse tables are often very wide, but analytical queries access only

very few columns (usually 4 to 5 and rarely “SELECT *”)

Most data models (relational, key-value, column family, document) store data

record-wise and must read, parse and filter all data for analytical queries

Solution

Store data attribute-wise instead of record-wise:

One file per attribute

Values ordered by record index in each file

For each query: scan only required attribute files


Example

fact_sales table (each column is stored in its own file):

date_key  product_sk  store_sk  promotion_sk  customer_sk  quantity  net_price  discount_price
140102    69          4         NULL          NULL         1         13.99      13.99
140102    69          5         19            NULL         3         14.99      9.99
140102    69          5         NULL          191          1         14.99      14.99
140102    74          3         23            202          5         0.99       0.89
140103    31          2         NULL          NULL         1         2.49       2.49
140103    31          3         NULL          NULL         3         14.99      9.99
140103    31          3         21            123          1         49.99      39.99
140103    31          8         NULL          233          1         0.99       0.99
file 1    file 2      file 3    file 4        file 5       file 6    file 7     file 8


Query

product_sk file

69 69 69 69 74 31 31 31 31 29 30 30 31 31 31 68 69 69

Query

SELECT product_sk, SUM(quantity) AS product_sales

FROM fact_sales

WHERE product_sk IN (30, 68, 69)

GROUP BY product_sk;

1. Scan product_sk file for values 30, 68, and 69 and

remember the row of each value

2. Read the quantities at the retrieved rows in the

quantity file and sum accordingly
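The two steps can be sketched in Python. As an assumption for illustration, the column files are modeled as lists holding the product_sk and quantity columns of the eight fact_sales example rows, not the longer product_sk file shown with the query.

```python
from collections import defaultdict

# Two of the eight column files of the fact_sales example table.
product_sk = [69, 69, 69, 74, 31, 31, 31, 31]
quantity = [1, 3, 1, 5, 1, 3, 1, 1]

wanted = {30, 68, 69}

# Step 1: scan the product_sk file and remember the matching rows.
rows = [i for i, p in enumerate(product_sk) if p in wanted]

# Step 2: read the quantity file at those rows and sum per product.
product_sales = defaultdict(int)
for i in rows:
    product_sales[product_sk[i]] += quantity[i]

print(dict(product_sales))
```

The other six column files are never touched, which is the main saving compared to record-wise storage.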

We can do even better!

Compression


Bitmap Encoding

product_sk file

69 69 69 69 74 31 31 31 31 29 30 30 31 31 31 68 69 69

Bitmap encoding


0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 29

0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 30

0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 0 0 0 31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 68

1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 69

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 74

Smaller? … Here, yes:

18 · 32 Bit = 576 Bit vs. 6 · (32 Bit + 32 Bit) = 384 Bit
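The encoding and the size comparison can be reproduced with a short Python sketch. The function name `bitmap_encode` is made up, and the size estimate assumes, as on the slide, 32-bit values and one 32-bit word per bitmap.

```python
product_sk = [69, 69, 69, 69, 74, 31, 31, 31, 31,
              29, 30, 30, 31, 31, 31, 68, 69, 69]


def bitmap_encode(column):
    """Return {value: list of 0/1 bits}, one bit per row."""
    bitmaps = {}
    for value in sorted(set(column)):
        bitmaps[value] = [1 if v == value else 0 for v in column]
    return bitmaps


bitmaps = bitmap_encode(product_sk)
for value, bits in bitmaps.items():
    print(value, "".join(map(str, bits)))

# Size comparison: plain 32-bit values vs. (value + bitmap word) per distinct value.
plain_bits = len(product_sk) * 32
bitmap_bits = len(bitmaps) * (32 + 32)
print(plain_bits, bitmap_bits)
```

The saving depends on the number of distinct values; with many distinct values the bitmaps would take more space than the plain column.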


Bitmap Encoding

Bitmap encoding


0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 29

0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 30

0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 0 0 0 31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 68

1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 69

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 74

Bit-wise OR

SELECT product_sk, SUM(quantity) AS product_sales FROM fact_sales WHERE product_sk IN (30, 68, 69) GROUP BY product_sk;
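The IN predicate of this query can be evaluated with a bit-wise OR, as in the following Python sketch. Packing each bitmap into an `int` with bit i standing for row i is a choice made for this illustration.

```python
product_sk = [69, 69, 69, 69, 74, 31, 31, 31, 31,
              29, 30, 30, 31, 31, 31, 68, 69, 69]


def bitmap(column, value):
    """Pack the bitmap of `value` into an int; bit i stands for row i."""
    bits = 0
    for row, v in enumerate(column):
        if v == value:
            bits |= 1 << row
    return bits


# WHERE product_sk IN (30, 68, 69) becomes a bit-wise OR of three bitmaps.
selected = bitmap(product_sk, 30) | bitmap(product_sk, 68) | bitmap(product_sk, 69)
rows = [row for row in range(len(product_sk)) if selected >> row & 1]
print(rows)
```

The resulting row numbers are the positions to read in the quantity file.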


Run-length Encoding

Problem: Bitmaps are very long and sparse in practice

Run-length encoding

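product_sk  run lengths  meaning
29          9 1          9 zeros, 1 one, rest zeros
30          10 2         10 zeros, 2 ones, rest zeros
31          5 4 3 3      5 zeros, 4 ones, 3 zeros, 3 ones, rest zeros
68          15 1         15 zeros, 1 one, rest zeros
69          0 4 12 2     0 zeros, 4 ones, 12 zeros, 2 ones, rest zeros
74          4 1          4 zeros, 1 one, rest zeros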
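The run lengths can be computed with a small Python sketch based on `itertools.groupby`. The convention follows the slide: the lengths alternate between zeros and ones, start with the zeros, and omit the trailing run of zeros.

```python
from itertools import groupby


def run_lengths(bits):
    """Encode a bitmap as run lengths: zeros, ones, zeros, ones, ...

    The trailing run of zeros is dropped ("rest zeros").
    """
    runs = [(bit, len(list(group))) for bit, group in groupby(bits)]
    if runs and runs[0][0] == 1:
        runs.insert(0, (0, 0))   # a bitmap starting with ones has 0 leading zeros
    if runs and runs[-1][0] == 0:
        runs.pop()               # drop the trailing zeros
    return [length for _bit, length in runs]


bitmap_31 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
bitmap_69 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(run_lengths(bitmap_31), run_lengths(bitmap_69))
```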


For more compression techniques see:

Daniel Abadi et al. “The Design and Implementation of Modern Column-Oriented Database Systems”, 2013.

See also: Roaring Bitmaps http://roaringbitmap.org/


Pre-Aggregation


Observation

Most analytical queries in data warehouses are aggregate queries

Aggregation patterns repeat frequently

Pre-calculate common aggregates!

Materialized Views

Are query results that are written back to disk

DBMS query optimizer can automatically use these views to answer

queries (or parts of queries)

Strategies to improve data analytics:

Pre-calculation: estimate common aggregates and pre-calculate them

Lazy pre-calculation: store each aggregate once it was queried


Data Cubes


Definition

Materialized views for multi-dimensional aggregates, i.e.,

a grid of aggregates grouped by different dimensions

Example

Rows: date_key; columns: product_sk; each total is the sum along its row or column.

date_key \ product_sk  32        33       34       35       …  total
140101                 149.60    31.01    84.58    28.18    …  40710.53
140102                 132.18    19.78    82.91    10.96    …  73091.28
140103                 196.75    0.00     12.52    64.67    …  54688.10
140104                 178.36    9.98     88.75    56.16    …  95121.09
…                      …         …        …        …        …  …
total                  14967.09  5910.43  7328.85  6885.39  …  5365M

SELECT product_sk, date_key, SUM(net_price) FROM fact_sales GROUP BY product_sk, date_key
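A two-dimensional cube with totals could be built as in the following Python sketch. It uses the date_key, product_sk, and net_price columns of the eight fact_sales example rows rather than the numbers in the grid, and the `"total"` marker for the margins is a made-up convention.

```python
from collections import defaultdict

# (date_key, product_sk, net_price) taken from the fact_sales example rows.
fact_sales = [
    (140102, 69, 13.99), (140102, 69, 14.99), (140102, 69, 14.99),
    (140102, 74, 0.99), (140103, 31, 2.49), (140103, 31, 14.99),
    (140103, 31, 49.99), (140103, 31, 0.99),
]

cube = defaultdict(float)
for date_key, product_sk, net_price in fact_sales:
    cube[(date_key, product_sk)] += net_price   # one grid cell
    cube[(date_key, "total")] += net_price      # row total
    cube[("total", product_sk)] += net_price    # column total
    cube[("total", "total")] += net_price       # grand total

print(round(cube[(140102, 69)], 2))
print(round(cube[(140103, "total")], 2))
```

Once the cube exists, a query that fixes the date, the product, or both becomes a single dictionary look up instead of a scan over the fact table.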


Slicing- and Dicing-Queries


SELECT SUM(net_price) FROM fact_sales WHERE product_sk = 32 AND date_key = 140101

SELECT SUM(net_price) FROM fact_sales WHERE date_key = 140101

SELECT SUM(net_price) FROM fact_sales WHERE product_sk = 32 AND net_price > 100.00


Data Cubes of higher Dimension


