DMSN 2011 Cagri Balkesen & Nesime Tatbul Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams



Page 1: DMSN 2011 Cagri Balkesen & Nesime Tatbul Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams

DMSN 2011
Cagri Balkesen & Nesime Tatbul

Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams

Page 2: Talk Outline

• Intro & Motivation
• Stream Partitioning Techniques
  – Basic window partitioning
  – Batch partitioning
  – Pane-based partitioning
• Ring-based Query Evaluation
• Experimental Evaluation
• Conclusions & Future Work

Page 3: Intro & Motivation

DSMS

Page 4: Architectural Overview

• Classical Split-Merge pattern from parallel DBs
• Adjustable parallelism level, d
• QoS on max latency & order

[Figure: the input stream enters the split stage (split node), which feeds the query stage (query nodes, each running the query); the merge stage (merge node) produces the output stream. Example QoS: latency < 5 seconds, disorder < 3 tuples]

Page 5: Related Work: How to Partition?

• Content-sensitive
  – FluX: Fault-tolerant, load balancing Exchange [1,2]
  – Use group-by values from the query to partition
  – Need explicit load-balancing due to skewed data
• Content-insensitive
  – GDSM: Window-based parallelization (fixed-size tumbling windows) [3]
  – Win-Distribute: Partition at window boundaries
  – Win-Split: Partition each window into equi-length subwindows
• The Problem:
  – How to handle sliding windows?
  – How to handle queries without group-by or with only a few groups?

[1] Flux: An Adaptive Partitioning Operator for Continuous Query Systems, ICDE '03
[2] Highly-Available, Fault-Tolerant, Parallel Dataflows, SIGMOD '04
[3] Customizable Parallel Execution of Scientific Stream Queries, VLDB '05

Page 6: Stream Partitioning Techniques

Page 7: Approach 1: Basic Sliding Window Partitioning

• Independently processable chunking
  – Window-aware splitting of the stream
• Each window has an id & tuples are marked
  – (first-winid, last-winid, is-win-closer)
• Tuples are replicated for each of their windows

[Figure: stream t1 t2 t3 ... t10 covered by overlapping windows W1, W2, W3, W4; Split distributes the windows across Node1, Node2, Node3. w = 6 units, s = 2 units, Replication = 6/2 = 3]
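The tuple marking on this slide can be sketched as follows. This is a minimal illustration, not the Borealis implementation: it assumes count-based windows, 0-based tuple positions, and that window k (1-based) covers positions (k-1)*s to (k-1)*s + w - 1. The function names and the round-robin placement of windows on nodes are hypothetical.

```python
def mark_tuple(pos, w, s):
    """Mark the tuple at 0-based stream position `pos`.

    Assumption: window k (1-based) covers positions
    (k-1)*s .. (k-1)*s + w - 1.
    Returns (first_winid, last_winid, is_win_closer).
    """
    last_winid = pos // s + 1
    first_winid = max(1, (pos - w) // s + 2)
    # A tuple closes a window when it sits at that window's last position.
    is_win_closer = pos >= w - 1 and (pos - (w - 1)) % s == 0
    return first_winid, last_winid, is_win_closer


def route_by_window(pos, w, s, d):
    """Nodes (0..d-1) that receive a copy of the tuple.

    Hypothetical placement: window k goes to node (k-1) mod d.
    """
    first, last, _ = mark_tuple(pos, w, s)
    return sorted({(k - 1) % d for k in range(first, last + 1)})


if __name__ == "__main__":
    # The slide's example: w = 6, s = 2, three nodes.
    for pos in range(10):
        print(f"t{pos + 1}", mark_tuple(pos, 6, 2), route_by_window(pos, 6, 2, 3))
```

Under these assumptions, once the stream is past its first window every tuple falls into w/s = 3 windows and is therefore sent to all three nodes.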

Page 8: Approach 1: Basic Sliding Window Partitioning

The problem with basic sliding window partitioning:
• Tuples belong to many windows depending on slide
• Excessive replication of tuples for each window
• Increase in output data volume of split

[Figure: stream t1 t2 t3 ... t10 covered by overlapping windows W1, W2, W3, W4; Split distributes the windows across Node1, Node2, Node3. w = 6 units, s = 2 units, Replication = 6/2 = 3]

Page 9: Approach 2: Batch-based Partitioning

• Batch several windows together to reduce replication
• “Batch-window”: wb = w + (B-1)*s ; sb = B*s
  – All the tuples in a batch go to the same partition
  – Only tuples overlapping between batches are replicated
• Replication reduced to wb/sb partitions instead of w/s

Definitions:
w : window-size
s : slide-size
B : batch-size

[Figure: stream t1 t2 t3 ... t10 with windows w1 to w8 grouped into batches B1 and B2. w = 3, s = 1, B = 3, so wb = 5, sb = 3. Replication: 3 reduced to 5/3]
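The batch-window formulas can be checked numerically. The sketch below treats a batch as a larger window of size wb that slides by sb, assuming count-based windows and 0-based tuple positions; the function names are hypothetical and not taken from the talk.

```python
from fractions import Fraction


def batch_window(w, s, B):
    """Return (wb, sb) with wb = w + (B-1)*s and sb = B*s."""
    return w + (B - 1) * s, B * s


def batches_of(pos, w, s, B):
    """1-based ids of the batches that contain the tuple at 0-based `pos`.

    Assumption: batch b covers positions (b-1)*sb .. (b-1)*sb + wb - 1.
    """
    wb, sb = batch_window(w, s, B)
    first = max(1, (pos - wb) // sb + 2)
    last = pos // sb + 1
    return list(range(first, last + 1))


def average_copies(w, s, B, periods=4):
    """Average number of copies per tuple over whole batch slides."""
    wb, sb = batch_window(w, s, B)
    positions = range(wb, wb + periods * sb)  # start after the warm-up phase
    total = sum(len(batches_of(p, w, s, B)) for p in positions)
    return Fraction(total, len(positions))


if __name__ == "__main__":
    print(batch_window(3, 1, 3))    # the slide's example
    print(average_copies(3, 1, 3))  # batched replication
    print(average_copies(3, 1, 1))  # B = 1 is plain window partitioning
```

Since wb/sb = 1 + (w - s)/(B*s), the replication factor approaches 1 as the batch size B grows.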

Page 10: The Panes Technique

• Divide overlapping windows into disjoint panes
• Reduce cost by sub-aggregation and sharing
• Each window has w/gcd(w,s) panes of size gcd(w,s)
• Query is decomposed: pane-level (PLQ) & window-level (WLQ) queries

[Figure: overlapping windows w1, w2, w3, w4, w5, ... drawn over the disjoint panes p1 p2 p3 p4 p5 p6 p7 p8 ... that they share]

[1] No Pane, No Gain: Efficient Evaluation of Sliding Window Aggregates over Data Streams, SIGMOD Record
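To make the PLQ/WLQ decomposition concrete, here is a small sketch for a sliding SUM. It assumes count-based windows and a list of numeric values; the function names are hypothetical. Each pane is aggregated once (PLQ), and every window result is then combined from w/gcd(w,s) pane results (WLQ).

```python
from math import gcd


def sliding_sums_via_panes(values, w, s):
    """Sliding-window sums computed from shared pane sub-aggregates."""
    p = gcd(w, s)                 # pane size
    panes_per_window = w // p
    panes_per_slide = s // p
    n_panes = len(values) // p    # only complete panes are used
    # PLQ: one partial sum per pane, computed exactly once.
    pane_sums = [sum(values[i * p:(i + 1) * p]) for i in range(n_panes)]
    # WLQ: each window combines the results of its consecutive panes.
    results = []
    start = 0
    while start + panes_per_window <= n_panes:
        results.append(sum(pane_sums[start:start + panes_per_window]))
        start += panes_per_slide
    return results


def sliding_sums_direct(values, w, s):
    """Reference implementation: re-aggregate every window from scratch."""
    return [sum(values[i:i + w]) for i in range(0, len(values) - w + 1, s)]


if __name__ == "__main__":
    stream = list(range(1, 15))   # tuples 1..14
    print(sliding_sums_via_panes(stream, 6, 4))
    print(sliding_sums_direct(stream, 6, 4))
```

Both functions return the same window results; the pane version touches each input value once, regardless of how much the windows overlap.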

Page 11: Approach 3: Pane-based Partitioning

• Mark each tuple with pane-id + win-id
  – Treat panes as tumbling windows with wp = sp = gcd(w,s)
• Route tuples to a node based on pane-id
• Nodes compute PLQ with pane tuples
• Combine all PLQ results of a window to form WLQ
  – Need for an organized topology of nodes
  – We propose organization of nodes in a ring

[Figure: Split routes panes to Node1, Node2, Node3. w = 6 units, s = 2 units]
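A minimal sketch of the pane-based split, assuming count-based windows and 0-based tuple positions. The slide only states that routing is "based on pane-id"; the round-robin placement in `route_by_pane` is a placeholder assumption, and all names are hypothetical.

```python
from math import gcd


def mark_with_pane(pos, w, s):
    """Return (pane_id, first_winid, last_winid) for 0-based position `pos`."""
    p = gcd(w, s)            # pane size: wp = sp = gcd(w, s)
    pane_id = pos // p + 1   # panes tumble, so each tuple has exactly one pane
    first_winid = max(1, (pos - w) // s + 2)
    last_winid = pos // s + 1
    return pane_id, first_winid, last_winid


def route_by_pane(pane_id, d):
    """Placeholder placement (assumption): round-robin over d nodes."""
    return (pane_id - 1) % d


def split_output(n_tuples, w, s, d):
    """Per-node count of tuples emitted by the split for the first n_tuples."""
    counts = [0] * d
    for pos in range(n_tuples):
        pane_id, _, _ = mark_with_pane(pos, w, s)
        counts[route_by_pane(pane_id, d)] += 1
    return counts


if __name__ == "__main__":
    print(mark_with_pane(6, 6, 2))
    print(split_output(12, 6, 2, 3))
```

Because every tuple belongs to exactly one pane, the split emits each tuple once, so its output volume equals its input volume whatever the window overlap.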

Page 12: Ring-based Query Evaluation

• High amount of pipelined result sharing among nodes
• Organized communication topology

[Figure: Input Source and Split feed a ring of Node1, Node2, Node3, whose window results W1, W2, W3 go to Merge. W = 6, S = 4 tuples, P = GCD(6,4) = 2 tuples.
Window1 = Pane1 (tuples 1, 2), Pane2 (3, 4), Pane3 (5, 6)
Window2 = Pane3 (5, 6), Pane4 (7, 8), Pane5 (9, 10)
Window3 = Pane5 (9, 10), Pane6 (11, 12), Pane7 (13, 14)
Split sends the pane sequences P1 P2 P3, P8 P9, ...; P4 P5, P10 P11, ...; and P6 P7, P12 P13, ... to the three nodes. Pane results R3, R5, R7, R9, R11, R13 are passed between nodes along the ring.]
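The pane placement shown in the figure can be reproduced with a short sketch. It rests on assumptions read off the figure rather than stated in the talk: each node owns one window at a time in round-robin order, a pane is routed to the node that owns the first window containing it, and a pane result is forwarded to the successor when a later window also needs it. Function names are hypothetical; ww and sw are the window size and slide in panes (here 3 and 2).

```python
def first_window_of_pane(j, ww, sw):
    """First (1-based) window containing pane j: ceil((j - ww) / sw) + 1."""
    return max(1, -(-(j - ww) // sw) + 1)


def last_window_of_pane(j, sw):
    """Last (1-based) window that starts at or before pane j."""
    return (j - 1) // sw + 1


def ring_plan(n_panes, ww, sw, d):
    """Return (panes per node, panes whose results are forwarded).

    Assumption: window k is owned by node (k-1) mod d.
    """
    node_panes = [[] for _ in range(d)]
    forwarded = []
    for j in range(1, n_panes + 1):
        first = first_window_of_pane(j, ww, sw)
        node_panes[(first - 1) % d].append(j)
        if last_window_of_pane(j, sw) > first:
            forwarded.append(j)  # result also needed by the successor
    return node_panes, forwarded


if __name__ == "__main__":
    panes, shared = ring_plan(13, ww=3, sw=2, d=3)
    for node, assigned in enumerate(panes, start=1):
        print(f"Node{node}:", assigned)
    print("forwarded results:", shared)
```

For panes 1 to 13 this yields the same three pane sequences and the same forwarded results (R3, R5, R7, R9, R11, R13) as the figure. The rule only covers this example's case of one window per node with overlap limited to the adjacent window.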

Page 13: Assignment of Windows and Panes to Nodes

• All pane results only arrive from predecessors
• Pane results sent to the successor are only local panes
  – Each node is assigned n consecutive windows
  – Min n s.t. [condition not captured in the transcript]

Definitions:
ww : win-size in # of panes
sw : slide-size in # of panes

Page 14: Flexible Result Merging

• Fully-ordered
• FIFO
• k-ordered: k-ordering constraint [1], certain disorder allowed
  – Defn: for any tuple s, every tuple s' that arrives at least k+1 tuples after s satisfies s'.A ≥ s.A

* k = 0

[1] Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS '04
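The k-ordering definition can be turned into a simple check on a finished output sequence. This is only an illustrative sketch: `values` stands for the ordering attribute A of the merged tuples in arrival order, and the function name is hypothetical.

```python
def is_k_ordered(values, k):
    """True if every value arriving at least k+1 positions after another
    value is greater than or equal to it."""
    prefix_max = None
    for j, v in enumerate(values):
        i = j - k - 1   # newest position that is at least k+1 tuples older
        if i >= 0:
            prefix_max = values[i] if prefix_max is None else max(prefix_max, values[i])
            if v < prefix_max:
                return False
    return True


if __name__ == "__main__":
    print(is_k_ordered([1, 2, 3, 4], 0))  # fully ordered
    print(is_k_ordered([2, 1, 3, 4], 0))  # one swap breaks k = 0
    print(is_k_ordered([2, 1, 3, 4], 1))  # but is tolerated for k = 1
```

With k = 0 the check reduces to ordinary non-decreasing order; larger k tolerates tuples that are displaced by at most k positions.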

Page 15: Experimental Evaluation

• Implementation of techniques in Borealis
• Workload adapted from Linear Road Benchmark
  – Slightly modified segment statistics queries
  – Basic aggregation functions with different window/slide ratios

Page 16: Scalability of Split Operator

• Pane-partitioning: cost & throughput constant regardless of overlap ratio
• Window- & batch-partitioning: cost ↑ and throughput ↓ as overlap ↑
• Excessive replication in window-partitioning is reduced by batching

[Figure: maximum input rate (tuples/second) vs. window-size/slide ratio (window overlap)]
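The trend in these bullets follows from how many copies of each tuple the split has to emit. The sketch below only counts copies, using the formulas from the earlier slides (w/s for window partitioning, wb/sb for batching, 1 for panes); it does not model measured cost or throughput. The batch size B = 10 and the function name are hypothetical.

```python
from fractions import Fraction


def split_replication(w, s, B):
    """Copies of each tuple emitted by the split, per technique."""
    return {
        "window": Fraction(w, s),
        "batch": Fraction(w + (B - 1) * s, B * s),  # wb / sb
        "pane": Fraction(1),                        # one pane per tuple
    }


if __name__ == "__main__":
    for ratio in (1, 10, 100):   # overlap ratio w/s with s = 1
        copies = split_replication(w=ratio, s=1, B=10)
        print(ratio, {name: float(c) for name, c in copies.items()})
```

As the overlap ratio grows, the window and batch factors grow with it, while the pane factor stays at 1.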

Page 17: Scalability of Partitioning Techniques

• Pane-based scales close to linear until split is saturated
  – Per-tuple cost is constant
• Window- & batch-based: extremely high replication
  – Split is not saturated, but scales very slowly

* w/s = overlap ratio = 100

Page 18: Summary & Conclusions

1) Window-based 2) Batch-based 3) Pane-based

• Pane-partitioning is the partitioning technique of choice
  – Avoids tuple replication
  – Incurs less overhead in split and aggregate
  – Scales close to linear

Page 19: Ongoing & Future Work

• Generalization of the framework
• Support for adaptivity during runtime
• Extending complexity of query plans
• Extending performance analysis & experiments

Page 20: Thank You!