distributed data analytics thorsten papenbrock storage and … · 2017-11-08 · 0 tree; frequently...

54
Distributed Data Analytics Storage and Retrieval Thorsten Papenbrock G-3.1.09, Campus III Hasso Plattner Institut

Upload: others

Post on 24-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Distributed Data Analytics

Storage and Retrieval Thorsten Papenbrock

G-3.1.09, Campus III

Hasso Plattner Institut

Page 2: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Introduction

Layering Data Models

Slide 2

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

1. Conceptual layer

Data structures, objects, modules, …

Application code

2. Logical layer

Relational tables, JSON, XML, graphs, …

Database management system (DBMS) or storage engine

3. Representation layer

Bytes in memory, on disk, on network, …

Database management system (DBMS) or storage engine

4. Physical layer

Electrical currents, pulses of light, magnetic fields, …

Operating system and hardware drivers

our focus now

Page 3: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 3

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 4: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 4

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 5: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Index Structures

A Tiny Database

Slide 5

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Basic database tasks: (a) write given data, (b) read specific data

A tiny key-value store in two Bash functions:

It works:

$ db_set 1234 '{"name":"Berlin","type":"city"}'

$ db_get 1234

'{"name":"Berlin","type":"city"}'

#!/bin/bash

db_set () {

echo "$1,$2" >> database

}

db_get () {

grep "^$1," database | sed –e "s/^$1,//" | tail –n 1

}

Concatenate the first two parameters by “,” and write/append them to the

file named “database”

Find all lines starting with first parameter, remove first parameter from lines, and select the last line

Why?

Page 6: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Index Structures

A Tiny Database

Slide 6

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Assume the following input-sequence:

$ db_set 1234 '{"name":"Berlin","type":"city"}'

$ db_set 42 '{"name":"Germany","type":"country"}'

$ db_set 42 '{"name":"Germany","type":"country","capital":"Berlin"}'

The according “database”-file:

$ cat database

{"name":"Berlin","type":"city"}

{"name":"Germany","type":"country"}

{"name":"Germany","type":"country","capital":"Berlin"}

“database” is a Log file:

Append only, no removal of old values

only last entry for each key is valid

Fast writes ( O(1) ) but slow reads ( O(n) with n records in the log )

To speed-up reads: Indexes!

Page 7: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Index Structures

Indexes

Slide 7

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index

An additional data structure that locates data by some

search criterion, i.e., key (usually one or more attributes, or

value identifier)

Basically a key-value store, where values can be actual data

or pointers to relational records, documents, graph

nodes/edges, …

Improves data retrieval operations

Costs additional writes to index structure and storage space

Use indexes carefully (not too many)!

Different index implementations (data structures) have

different strengths

Choose the right index for your queries!

Page 8: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 8

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 9: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Hash Indexes

Hash Indexes

Slide 9

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Definition

A hash index is a hash map (dictionary) that maps keys to the positions

(in memory, on disk, on the network, …) of their values/records

The hash map uses a hash function to calculate mapping of keys and

positions and is usually kept in memory

Uses

key-value stores, other indexes, data distribution (load balancing,

sharding, …)

Strength

Point queries: one index look up directly delivers a value’s position

Weaknesses

Range queries require to look up each key individually

Hash map must fit into main memory; hash maps on disk perform poorly

Page 10: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Hash Indexes

Example

Slide 10

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

1 2 3 4 , { " n a m e " : " B e r l i n " , " t y p e " : "

c i t y " } \n 4 3 , { " n a m e " : " G e r m a n y " , " t

y p e " : " c o u n t r y " } \n

key byte offset

1234 0

42 37

0

30

60

10

40

70

20

50

80

37

In-memory hash map

Log-structured file on disk (each box is one byte)

hash function

Page 11: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Hash Indexes

Hash Indexes

Slide 11

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Controlling file growth

Indexed data, i.e., log is usually insert only (for good write performance)

Frequent updates make files unnecessarily large

Example: a store that maps products to stock-counts

Each purchase increments a stock-count new record!

Each sale decrements a stock-count new record!

But: collection of products is almost constant

Solution: consolidate/compact the log frequently freeing up disk space

How to do this on a running system?

Segmentation!

Page 12: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Hash Indexes

Hash Indexes

Slide 12

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Segmentation

Break log into segments of fixed size

Whenever a segment reaches max size:

1. Close the segment and redirect writes to a new segment

2. Compact the old segment:

1. Read the old segment backwards

2. If a key is read for the first time:

Write the entry (key + value) into a compacted segment

3. Merge the compacted segment with older compacted segment(s):

1. Read older compacted segment

2. If a key is not present in the newer compacted segment:

Write the entry (key + value) into the compacted segment

4. Delete the old segment and old compacted segment

Werwrth Wrthrth Wrthwrthwrth Wrthwrht Wrthwrh Wrhwthwht wrth

Rujrtj Wethwrth Reerzj Wrtwrjerjzezj Erzje Erzjezjetzj Etzetjz etzjejzezjejzetzj

Wrjezjejz Ejzez Etzje etjzetzjejzeWRE RTHWERTHE WRTHE ETZJETZJE ETZJ

EZJEZJ EJZETZJEJZ ZWEZ WETJHEJZU Q4WZ6ZR SR SRZR

ERZJWRZJE SFDGS SRTJHSW

RZJETZJE FGJRU RZIRZIK,R ARGA Q5ZW4E57JW6 SD

ATEHSWRZJ SRT SRTJHSRTJHSRT SSRZJ RZJERJZ WRJREJZREJ RJSER REJZWRJZWR REZJEJERZJ

RWZJWRZ RZJ RZJJREZ REZJWRZJW RZJ WZJ ERZJDGH

Q5THW SRTW WTRHWR 57J472 ETZJE 245Z36J357

E5J7E7JE EJ EZJEJE57JE WRJWRJ WJZWEJE5JE ETZJ WZJWWE6ZJEJ

Werwrth Wrthrth SATSFG Wrthwrht SRSSSRZNRZR Wrhwthwht wrthSRNS

SRT Etzjejzezjejzetzj SRTHS SRSRZJSWJZ ZTJEDJZ DTZJETZJETZJ

WrETZJ SRTHejz EjzeETZJETZJz Etzje etjzeWRE RTHERWRTHE ETZJETZJE ETZJ

EZJEZJ EJZETZJEJZ ZWEZ WETJHEJZU Q4TEZJWETZJZ6ZRTZJ SR SRETZJZR

ERZJWRZJE SFDGRTHS SERZJ EJZEZTJE ETZJETZJ ETZJETJZEJZETZ ETZJET ETZJRTJHSW

RZJETZJE FGJRU RZIRZIETZJK,R ARETZGA Q5JW6 SD ETZJETJ

ATEHSWRZJ SRTETZJETZJ TJHSRTJHSRT SSRZJETZJ RZJZ WRJREJ RJSER REJZWRJZWR REZJEJERZJ

RWZJWRZ RZJ RZJJREZ REZJWRZJW RZJTEZJETZJ WZJETZJ ERZJDGHETZJ ETZJEJ ETZJETZJ

Q5THW SRTW WTRHETZJWR 57J472ETJZ ETZJEETZ 245Z36J35 ETZJET7

E5J7E7JETZJE EJETZJ EZJEJE57JE WRJWRJ WJZWE ETZJETZJE5JE ETZJ WZJWWE6ZJEJ

Compacted

Current

ATEHS SRT SR

Writes

Reads

Page 13: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Hash Indexes

Hash Indexes

Slide 13

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Segmentation

Compact:

Merge:

Page 14: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 14

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 15: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

SSTables

Slide 15

Thorsten Papenbrock

Definition

Sorted String Tables (SSTables) are segment files that contain key-value

pairs sorted by their key; each key only appears once

Sortation and distinctness of keys disallows simple appends to the log

Merge

Given two (or more) SSTables, their merge is calculated similarly to the

merge-sort algorithm:

1. Read all SSTables simultaneously:

1. Copy the smallest key with its value to the merged SSTable

and read next key-value pair

2. If keys are similar:

Copy only the key-value pair from the most recent SSTable to the merged SSTable

1 2 3 4 ?

2 4 5 7 1 3 6 7

Storage and Retrieval

Distributed Data Analytics

Page 16: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

SSTables

Slide 16

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index

Two options:

a) A dense index, i.e., each key-value pair in the SSTable is indexed

Allows direct look ups

b) A sparse index, i.e., first key-value pair in each block of an SSTable

is indexed

If a query key is not directly indexed:

find the next larger key in the index (binary search)

find the block of the next larger key (look up)

search for the query key in the block (binary search or

linear scan, if log entries have variable size)

Blocks can be compressed to reduce memory consumption and

I/O bandwidth

Page 17: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

SSTables

Slide 17

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Merge

Sparse index

Page 18: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

LSM-Trees

Slide 18

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Definition

Log-Structured Merge-Trees (LSM Trees) are multilayered search trees for

key-value log-data that use different data structures, each of which

optimized for its underlying storage medium

Example

First layer (C0 Tree): index structure that …

1) efficiently takes new key-value pairs in any order

2) outputs all contained key-value pairs in sorted order

B-trees, red-black trees, AVL trees, …

Second layer (C1 Tree): index structure that …

1) is able to merge with sorted key-value pair lists

2) effectively compacts/compresses contained key-value pairs

SSTables

Patrick O'Neil et. al. “The log-structured merge-tree (LSM-tree)”, Acta Information, volume 33, number 4, pages 351-385, 1996

memtable

Page 19: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

LSM-Trees

Slide 19

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Intuition

Sorted trees are fast in-memory indexes but they outgrow main memory

SSTables are indexable and compact but don’t support random inserts

Insert: add new key-value pairs to C0 Tree; frequently merge trees

down the hierarchy (C0 C1 C2 …) to free memory

Read: search for the key in every layer or chronological if only the

most recent value is needed

Data stores with

LSM-Tree implementations

LevelDB, RocksDB,

Cassandra, Hbase,

Lucene (in Elastic-

search and Solr), …

Page 20: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

SSTables and LSM-Trees

LSM-Trees

Slide 20

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Optimizations Querying non-existend values is expensive (check all layers)

Catch most of these queries with a Bloomfilter

Size-tiered compaction

Merge newer, smaller SSTables successively into older, larger SSTables

Page 21: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 21

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 22: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

Definition and Structure

Slide 22

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Definition

A self-balancing, tree-based data structure, that stores values sorted by key and

allows read/write access in logarithmic time

A generalization of a binary search tree as nodes can have more than two

children

Structure

Blocks:

Nodes in the tree that contain key-value pairs and pointers to other blocks

Correspond to physical, fixed sized disk blocks/pages that are addressed

and read as single units

Pointers:

Edges in the tree that connect blocks in tree structure

Correspond to physical block/page addresses

R. Bayer and E. McCreight. “Organization and maintenance of large ordered indices”, In Proceedings of SIGFIDET (now SIGMOD) Workshop, pages 107-141, 1970

Page 23: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

Constraints

Slide 23

Thorsten Papenbrock

Constraints

Balanced:

Same distance from root-node to all leaf-nodes

Depth of the tree is in O(log(n)) (= key-look-up complexity)

If tree grows unbalanced, re-balancing required

Block-Content:

A block contains n keys and n+1 pointers in alternating order

Pointers left to a key point to blocks containing smaller keys

Pointers right to a key point to blocks containing larger/equal keys

All values in the tree are sorted by their keys!

Block-Size:

Typically 4096 Byte per block; 4 Byte per key; 8 Byte per pointer

4n + 8(n+1) ≤ 4096 => n = 340

Page 24: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

Constraints

Slide 24

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Constraints

Root Node:

Points to underlying nodes (and values)

At least 2 pointers used

Inner Node:

Points to underlying nodes (and values)

At least (𝑛 + 1)/2 pointers used

Leaf Node:

Points to right neighbor leaf and values

At least 1 neighbor pointer (if present) and (𝑛 + 1)/2 value pointers used

Uses

Any data store: most widely used index structure for DBMSs

Sorted, dense, and sparse indexes

B-Tree vs. B+-Tree

B-Trees store keys and values in both internal and leaf nodes;

B+-Trees store values only in leaf nodes.

The following examples show B+-Trees

Page 25: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

Example

Slide 25

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Example

Block Key

Pointer free space Root

Inners

Leafs

Values

B-Tree vs. SSTable .

Always rewrite entire blocks vs. always append and merge frequently

Page 26: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

More Examples

Slide 26

Storage and Retrieval

Thorsten Papenbrock

Distributed Data Analytics

Neighbor pointer

Look up scenario

Grow scenario

With data

Page 27: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

B-Tree

Insert and Delete

No

Find leaf node of the key

Delete key and value-pointer

Underflow?

Redistribute

with neighbors?

Adjust the keys of

all parent nodes

Redistribute

Done No

Yes

N leaf?

Move all keys of the right neighbor

into the left neighbor node and

delete the (empty) right node

Move split-key

into parent node of

left neighbor node

Yes No Merge

Yes

No

Find corresponding leaf node for key

Insert key and value-pointer

Overflow?

N root?

Create new

root node

(with only one

Single key)

Split node N into nodes N1, N2

and distribute keys evenly

N leaf?

Insert new key-pointer

pair into parent node

Copy smallest

key in N2

upwards

Move largest

key in N1

upwards

Done

Yes

No

Yes No

Yes

Split

Split and Merge operations guarantee that the B-Tree is always balanced and the blocks are filled correctly.

Page 28: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 28

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 29: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Further Indexes

Alternative Index Types

Slide 29

Thorsten Papenbrock

Clustered Index

Stores indexed data or parts of it within the index (instead of pointers to data)

Example: An index on attribute delivery_status allows to count pending

deliveries without data access.

Improves the performance of certain queries

Reduces write performance and requires additional storage

Redundant values (in data and index) complicate data consistency

Multi-Column Index

a) Concatenated index: merge keys into one key

b) Multi-dimensional index: split multi-dim. key domain into multi-dim. shapes

Example: An index on two-dim. geo locations (longitude, latitude)

to answer intersection, containment, and nearest neighbor queries.

Most common implementation: R-Trees

Page 30: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Further Indexes

R-Tree

Slide 30

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

R-Tree

A variation of a B-Tree that uses a hierarchy of rectangles as keys

Also: balanced and block-sized nodes

Indexed points …

are clustered into leaf nodes

might occur in multiple clusters

Insertion:

into appropriate clusters

split cluster if too large

find smallest cluster extension

via heuristic if no cluster fits directly

Page 31: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Further Indexes

Alternative Index Types

Slide 31

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Fuzzy Index

Index on terms/keys that allows for value misspellings, synonyms, variations, …

Idea: sparse, sorted index (e.g. SSTable or B-Tree) with similarity look up

Example: An index on attribute firstname where names might be misspelled.

Look up most similar key

Scan the (sorted!) neighborhood of that key’s value for similar values

John Sno

John Snow

Jon Snow

john snow

jonny snow

Page 32: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 32

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 33: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Onlien Analytical Processing (OLAP)

What is Analytics?

Slide 33

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

“Calculate the number of sales per year and state.”

SELECT SUM(S.sales)

FROM Sales S, Times T, Locations L

WHERE S.timeid = T.timeid AND S.timeid = L.timeid

GROUP BY T.year, L.state

Aggregate Queries

Outlier Detection

Clustering

Trend Prediction

And many further questions

to data sets!

Page 34: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 34

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 35: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Warehousing

Data Warehouse

Slide 35

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Definition

A central repository of integrated, potentially pre-aggregated historical data

from one ore more distinct sources (usually operational/OLTP systems)

Ecosystem

Customer

Employee

Supplier DeliveryDB

InventoryDB

SalesDB

Staging Area

Operational Systems

Data Warehouse

raw data

meta data

summary data data integration, cleaning, pre-aggregation, …

ETL

ETL

ETL

ETL ETL

Data Marts

Purchasing

Sales Analyst

Reporting

data preparation, filtering, …

ETL-processes (extract, transform, load)

Page 36: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Warehousing

Assessment

Slide 36

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Advantages

Compute intense analytical queries do not interfere with operational business

Data is static and does not change while queries are run

Analytics-friendly data layouts, e.g., star-schemata, data cubes, or

materialized views as well as indexes for analytical query patters are possible

Data can be sorted by certain keys that are often queried as range or group

(e.g. timestamps, version-IDs, tags, or country codes), whereas operational

data is usually not sorted for better insert performance

Data can be compressed more aggressively, which can improve read

performance

Disadvantages

Data is not up-to-date (one ETL-cycle ≈ one day old)

Page 37: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Warehousing

Solutions

Slide 37

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Commercial

Microsoft SQL Server

SAP HANA

Teradata

Vertica

ParAccel

Open-Source

Apache Hive

Spark SQL

Cloudera Impala

Facebook Presto

Apache Tajo

Apache Drill

Most of these are Hadoop-based and use some kind of

MapReduce paradigm.

Page 38: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 38

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 39: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Star- and Snowflake Schemata

Definitions

Slide 39

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Star Schema

An acyclic graph of relational tables with depth one, where the root is

called fact table and the leafs dimension tables

Fact tables: contain events (transactions, measurements, snapshots,…)

Usually very large tables

Examples: sales, page views, clicks, shippings, sensor readings, …

Dimension tables: contain entity data and descriptive information

Usually small tables with fix domain

Examples: products, employees, customers, dates, locations, …

Answer for each event: who, what, where, when, how, or why

Snowflake Schema

Same as star schema, but with arbitrary depth, i.e., dimension tables

might have further dimension tables

Page 40: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Star- and Snowflake Schemata

Examples

Slide 40

Storage and Retrieval

Thorsten Papenbrock

Star Schema Snowflake Schema

Distributed Data Analytics

Page 41: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Star- and Snowflake Schemata

Benefits

Slide 41

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Star- and Snowflake Schemata

Improved query performance for (most) aggregation queries:

Better join-performance than normalized data

Better scan performance than one big table

May contain pre-aggregated data

Simpler queries:

Clear join logic and limited number of necessary joins

Page 42: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 42

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 43: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Motivation and Idea

Slide 43

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Observation

Data warehouse tables are often very wide, but analytical queries access only

very few columns (usually 4 to 5 and rarely “SELECT *”)

Most data models (relational, key-value, column family, document) store data

record-wise and must read, parse and filter all data for analytical queries

Solution

Store data attribute-wise instead of record-wise:

One file per attribute

Values ordered by record index in each file

For each query: scan only required attribute files

Page 44: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Example

date_key product_sk store_sk promotion_sk customer_sk quantity net_price discount_price

140102 69 4 NULL NULL 1 13.99 13.99

140102 69 5 19 NULL 3 14.99 9.99

140102 69 5 NULL 191 1 14.99 14.99

140102 74 3 23 202 5 0.99 0.89

140103 31 2 NULL NULL 1 2.49 2.49

140103 31 3 NULL NULL 3 14.99 9.99

140103 31 3 21 123 1 49.99 39.99

140103 31 8 NULL 233 1 0.99 0.99

fact_sales table

file 1 file 2 file 3 file 4 file 5 file 6 file 7 file 8

Page 45: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Query

product_sk file

69 69 69 69 74 31 31 31 31 29 30 30 31 31 31 68 69 69

Query

SELECT product_sk, SUM(quantity) AS product_sales

FROM fact_sales

WHERE product_sk IN (30, 68, 69)

GROUP BY product_sk;

1. Scan product_sk file for values 30, 68, and 69 and

remember the row of each value

2. Read the quantities at the retrieved rows in the

quantity file and sum accordingly

Slide 45

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock We can do even better!

Compression

Page 46: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Bitmap Encoding

product_sk file

69 69 69 69 74 31 31 31 31 29 30 30 31 31 31 68 69 69

Bitmap encoding

Slide 46

Thorsten Papenbrock

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 29

0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 30

0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 0 0 0 31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 68

1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 69

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 74

Smaller? … Here, yes:

18 · 32 Bit = 576 Bit vs. 6 · (32 Bit + 32 Bit) = 384 Bit

Page 47: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Bitmap Encoding

Bitmap encoding

Slide 47

Thorsten Papenbrock

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 29

0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 30

0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 0 0 0 31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 68

1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 69

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 74

Bit-wise OR

SELECT product_sk, SUM(quantity) AS product_sales FROM fact_sales WHERE product_sk IN (30, 68, 69) GROUP BY product_sk;

Page 48: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Column-Oriented Storage

Run-length Encoding

Problem: Bitmaps are very long and sparse in practice

Run-length encoding

29 9 zeros, 1 one, rest zeros

30 10 zeros, 2 one, rest zeros

31 5 zeros, 4 one, 3 zeros, 3 ones, rest zeros

68 15 zeros, 1 one, rest zeros

69 0 zeros, 4 one, 12 zeros, 2 ones, rest zeros

74 4 zeros, 1 one, rest zeros

9 1

10 2

5 4 3 3

15 1

0 4 12 2

4 1

Slide 48

Thorsten Papenbrock

For more compression techniques see:

Daniel Abadi et. al. “The Design and Implementation of Modern Column-Oriented Database Systems”, 2013.

See also: Roaring Bitmaps http://roaringbitmap.org/

Page 49: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Overview

Storage and Retrieval

Slide 49

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Index Structures

Hash Indexes

SSTables and LSM-Trees

B-Trees

Further Indexes

Online Analytical Processing (OLAP)

Data Warehousing

Star- and Snowflake Schemata

Column-Oriented Storage

Data Cubes and Materialized Views

Page 50: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Cubes and Materialized Views

Pre-Aggregation

Slide 50

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Observation

Most analytical queries in data warehouses are aggregate queries

Aggregation patterns repeat frequently

Pre-calculate common aggregates!

Materialized Views

Are query results that are written back to disk

DBMS query optimizer can automatically use these views to answer

queries (or parts of queries)

Strategies to improve data analytics:

Pre-calculation: estimate common aggregates and pre-calculate them

Lazy pre-calculation: store each aggregate once it was queried

Page 51: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Cubes and Materialized Views

Data Cubes

Slide 51

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Definition

Materialized views for multi-dimensional aggregates, i.e.,

a grid of aggregates grouped by different dimensions

Example

32 33 34 35 … total

140101 149.60 31.01 84.58 28.18 … 40710.53

140102 132.18 19.78 82.91 10.96 … 73091.28

140103 196.75 0.00 12.52 64.67 … 54688.10

140104 178.36 9.98 88.75 56.16 … 95121.09

… … … … … … …

total 14967.09 5910.43 7328.85 6885.39 … 5365M

product_sk

date

_key

+ + + +

+

+

+

+

+

+

+

+

SELECT product_sk, date_sk, SUM(net_price) FROM fact_sales GROUP BY product_sk, date_sk

Page 52: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Cubes and Materialized Views

Slicing- and Dicing-Queries

Slide 52

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

32 33 34 35 … total

140101 149.60 31.01 84.58 28.18 … 40710.53

140102 132.18 19.78 82.91 10.96 … 73091.28

140103 196.75 0.00 12.52 64.67 … 54688.10

140104 178.36 9.98 88.75 56.16 … 95121.09

… … … … … … …

total 14967.09 5910.43 7328.85 6885.39 … 5365M

product_sk

date

_key

SELECT SUM(net_price) FROM fact_sales WHERE product_sk = 32 AND date_sk = 140101

SELECT SUM(net_price) FROM fact_sales WHERE date_sk = 140101

SELECT SUM(net_price) FROM fact_sales WHERE product_sk = 32 AND net_price > 100.00

Page 53: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Data Cubes and Materialized Views

Data Cubes of higher Dimension

Slide 53

Storage and Retrieval

Distributed Data Analytics

Thorsten Papenbrock

Page 54: Distributed Data Analytics Thorsten Papenbrock Storage and … · 2017-11-08 · 0 Tree; frequently merge trees down the hierarchy (C 0 C 1 C 2 …) to free memory Read: search for

Distributed Data Analytics

Introduction Thorsten Papenbrock

G-3.1.09, Campus III

Hasso Plattner Institut