Dynamic Plan Migration for Continuous Query over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group
Worcester Polytechnic Institute
Massachusetts, USA
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."



SIGMOD 2004 2

Motivation

Continuous query over streams
  Statistics unknown before start
  Statistics changing during execution
    Stream rates, arrival pattern, distribution, etc.

Need for dynamic adaptation
  Plan re-optimization
    Change the shape of the query plan tree


Run-time Plan Re-Optimization

Step 1 - Decide when to optimize
  Statistics Monitoring

Step 2 - Generate new query plan
  Query Optimization

Step 3 - Replace current plan by new plan
  Plan Migration


Naïve Plan Migration Strategy

Migration Steps
  Pause execution of old plan
  Drain out all tuples inside old plan
  Replace old plan by new plan
  Resume execution of new plan

[Figure: old plan and new plan, each built from join operators AB and BC over input streams A, B and C]

Problem: Works for stateless operators only


Stateful Operator in CQ

Why stateful?
  Need non-blocking operators in CQ
  Operator needs to output partial results
  State data structures keep received tuples

Example: Symmetric NL join w/ window constraints

[Figure: join operator AB over inputs A and B; State A holds ax, State B holds b1 to b5; outputs shown are (ax, b2) and (ax, b3)]

Key Observation: The purge of tuples in states relies on processing of new tuples.
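A minimal sketch of a symmetric nested-loop join with a window constraint may help make the key observation concrete. The class name, the (timestamp, key) tuple layout and the equality join condition are assumptions made for illustration; this is not CAPE's operator. Note that a state is purged only inside process(), that is, only when a new tuple arrives on the opposite input.

```python
from collections import deque


class SymmetricWindowJoin:
    """Symmetric nested-loop join over two streams with a sliding time window."""

    def __init__(self, window):
        self.window = window
        self.state = {"A": deque(), "B": deque()}  # received tuples per input

    def process(self, side, ts, key):
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: drop opposite-side tuples that fell out of the window.
        # This is the only place where state shrinks.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: join the new tuple with what is left in the opposite state.
        new = (ts, key)
        results = [(new, t) if side == "A" else (t, new)
                   for t in opposite if t[1] == key]
        # Insert: keep the new tuple for future probes from the other side.
        self.state[side].append(new)
        return results


join = SymmetricWindowJoin(window=10)
join.process("B", 1, "x")
join.process("B", 5, "x")
out = join.process("A", 12, "x")
print(out)                   # [((12, 'x'), (5, 'x'))]
print(len(join.state["B"]))  # 1: the tuple from ts=1 was purged by the A arrival
```

If process() is never called again, nothing is ever purged, which is why pausing execution and waiting for the states to empty cannot work for such an operator.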


Naïve Migration Strategy Revisited

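Steps
  (1) Pause execution of old plan
  (2) Drain out all tuples inside old plan
  (3) Replace old plan by new plan
  (4) Resume execution of new plan

[Figure: plan with join operators AB and BC over inputs A, B and C, annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]

Problem: Deadlock waiting. Step (2) waits for the states to empty, but tuples are purged from states only when new tuples are processed, and processing is paused in step (1).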


Problem Definition

Dynamic Plan Migration
  Input (two migration boxes)
    One contains old plan
    One contains new plan
    Have same input and output queues
  Result
    Old box is replaced by new box

Valid Migration
  No missing tuples
  No duplicates

[Figure: old box and new box, both reading input queues QA, QB, QC, QD and writing output queue QABCD. Old box: operator AB (states SA, SB), operator BC (states SAB, SC), operator CD (states SABC, SD). New box: operator BC (states SB, SC), operator CD (states SBC, SD), operator AB (states SA, SBCD).]

Key points:
  Involved plans contain stateful operators
  Need to migrate yet still retain useful states and discard useless states
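The two validity conditions can be checked mechanically by comparing output multisets. The sketch below is only an illustration: the function name and the string encoding of result tuples are hypothetical, and it assumes a reference output (what a never-migrated plan would produce) is available for comparison.

```python
from collections import Counter


def check_migration(expected, produced):
    """Compare outputs as multisets.

    Returns (missing, extra): tuples that should have been produced but were
    not, and tuples produced more often than expected (duplicates).
    """
    exp, got = Counter(expected), Counter(produced)
    missing = sorted((exp - got).elements())
    extra = sorted((got - exp).elements())
    return missing, extra


expected = ["a1b1", "a1b2", "a2b1"]
produced = ["a1b1", "a1b1", "a2b1"]  # a1b2 lost, a1b1 emitted twice
missing, extra = check_migration(expected, produced)
print(missing)  # ['a1b2']
print(extra)    # ['a1b1']
```

A migration is valid in the sense of this slide when both lists come back empty.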


State of the Art

"Efficient mid-query re-optimization of sub-optimal query execution plans" [Kabra, DeWitt 1998]
  Only migrates unprocessed portion

Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
  Generate several candidate query plans before start
  Execute all, choose one after a while


Outline

Problem Motivation and Definition
Dynamic Migration Strategies
  Moving State Strategy
  Parallel Track Strategy
Experimental Results


Moving State Strategy

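Basic idea
  Share common states between two migration boxes

Key steps
  State Matching
    Match states based on IDs
  State Moving
    Create new pointers for matched states in new box

What's left?
  Unmatched states in new box

[Figure: Old Box and New Box over input queues QA, QB, QC, QD with output queue QABCD. Old Box: operator AB (states SA, SB), operator BC (states SAB, SC), operator CD (states SABC, SD). New Box: operator BC (states SB, SC), operator CD (states SBC, SD), operator AB (states SA, SBCD).]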


Unmatched States

State Recomputing
  Recursively recompute unmatched SBC and SBCD from bottom up

Why always possible?
  Old and new boxes have same input queues
  The states associated with input queues always match

Why necessary?

[Figure: new box over input queues QA, QB, QC, QD with output queue QABCD; operator BC (states SB, SC), operator CD (states SBC, SD), operator AB (states SA, SBCD)]
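The three steps of the moving state strategy (matching, moving, recomputing) can be sketched on toy data. Everything below is an illustrative assumption rather than the CAPE implementation: states are plain lists of strings keyed by state ID, the join is a cross product that ignores predicates and window constraints, and inputs_of is a hypothetical description of which two states feed each unmatched state in the new plan.

```python
def match_and_move(old_states, new_state_ids):
    """State matching and moving: share old state objects by ID (no copying)."""
    shared = {sid: old_states[sid] for sid in new_state_ids if sid in old_states}
    unmatched = [sid for sid in new_state_ids if sid not in old_states]
    return shared, unmatched


def recompute(states, unmatched, inputs_of, join):
    """State recomputing: build each unmatched state bottom-up from its inputs."""
    def build(sid):
        if sid not in states:
            left, right = inputs_of[sid]
            states[sid] = join(build(left), build(right))
        return states[sid]

    for sid in unmatched:
        build(sid)
    return states


def cross(left, right):
    # Toy join: every pair matches.
    return [l + r for l in left for r in right]


old_states = {"SA": ["a1"], "SB": ["b1", "b2"], "SC": ["c1"], "SD": ["d1"],
              "SAB": ["a1b1", "a1b2"], "SABC": ["a1b1c1", "a1b2c1"]}
new_ids = ["SA", "SB", "SC", "SD", "SBC", "SBCD"]
inputs_of = {"SBC": ("SB", "SC"), "SBCD": ("SBC", "SD")}  # shape of the new plan

new_states, unmatched = match_and_move(old_states, new_ids)
new_states = recompute(new_states, unmatched, inputs_of, cross)
print(unmatched)           # ['SBC', 'SBCD']
print(new_states["SBCD"])  # ['b1c1d1', 'b2c1d1']
```

The recursion always bottoms out because the states attached to the input queues (SA, SB, SC, SD here) exist in both boxes, which is the "why always possible" argument above.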


Terms on Tuples

New/Old tuples
  Old: tuples already in old box when migration starts
  New: tuples that do not exist in old box when migration starts

Sub-tuples
  Tuple ABCD is the result of joining tuples A, B, C and D
  Tuples A, B, C and D are sub-tuples of tuple ABCD
  Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples

[Figure: old box over input queues QA, QB, QC, QD with output queue QABCD; operator AB (states SA, SB), operator BC (states SAB, SC), operator CD (states SABC, SD)]


Why Recompute Unmatched States

To get the complete results of ABCD, we need all 16 old/new combinations

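[Figure: new box over input queues QA, QB, QC, QD; operator BC (states SB, SC), operator CD (states SBC, SD), operator AB (states SA, SBCD)]

If SBC is not recomputed, results with both B and C as OLD will be missed.

[Figure: three example ABCD tuples, with each sub-tuple marked as Old Tuple or New Tuple]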
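The counting behind the 16 combinations, and the subset that depends on old B and old C tuples being paired, can be enumerated directly. This is a small illustration only; the variable names are hypothetical.

```python
from itertools import product

SUB_TUPLES = ("A", "B", "C", "D")

# Every way of labelling the four sub-tuples of ABCD as old or new.
combos = [dict(zip(SUB_TUPLES, labels))
          for labels in product(("old", "new"), repeat=len(SUB_TUPLES))]

# Combinations that need an old B joined with an old C, i.e. pairs that
# can only be found in state SBC if SBC has been recomputed.
need_old_bc = [c for c in combos if c["B"] == "old" and c["C"] == "old"]

print(len(combos))       # 16
print(len(need_old_bc))  # 4
```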


Cost Estimation of MS Migration

Cost of MS consists of
  Cost of state matching
    ID comparison (negligible)
  Cost of state moving
    Create pointers (negligible)
  Cost of state recomputing
    Majority of cost

Affecting parameters
  Operator selectivities
  # of tuples in states
    Estimated as (input rate x window size)

See paper for detailed cost models

One cost model conclusion:
  Cost of MS has polynomial relation to window size
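A rough feel for the polynomial relation can be had from the state size estimate alone. The sketch below is not the paper's cost model: it assumes, purely for illustration, that recomputing one join state costs one comparison per pair of tuples in its two input states (a nested-loop recomputation), and it ignores selectivity. Input rate and window size must use the same time unit.

```python
def state_size(input_rate, window):
    """Estimated # of tuples in a state: input rate x window size."""
    return input_rate * window


def nested_loop_recompute_cost(rate_left, rate_right, window):
    """Assumed cost of recomputing one join state: one comparison per pair."""
    return state_size(rate_left, window) * state_size(rate_right, window)


# 50 tuples per second on both inputs, window in seconds.
for window in (2, 4, 8):
    print(window, nested_loop_recompute_cost(50, 50, window))
# 2 10000
# 4 40000
# 8 160000
```

Under this assumption, doubling the window quadruples the recomputation cost of a state built from two input states.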


MS Migration Pros and Cons

Pros
  Fast when # of tuples in states is small
    Low input rates, low selectivity or small window

Cons
  Output silence during entire migration stage

Can the query produce output even during migration?
  Motivation for Parallel Track Strategy


Parallel Track Strategy

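Basic idea
  Execute both plans in parallel and gradually "push" old tuples out of old box by purging

Key steps
  Connect boxes
  Execute in parallel
    Until old box "expired" (no old tuple or sub-tuple)
  Disconnect old box
  Start executing new box only

[Figure: old box and new box connected to the same input queues QA, QB, QC, QD and the same output queue QABCD. Old box: operator AB (states SA, SB), operator BC (states SAB, SC), operator CD (states SABC, SD). New box: operator BC (states SB, SC), operator CD (states SBC, SD), operator AB (states SA, SBCD).]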


Potential Duplicates

Tuple ABCD
  2^4 = 16 possible old/new sub-tuple combinations
  The same case must not be generated by both boxes
    Otherwise we may have duplicates

In new box
  All states start empty
  Only generates ABCD as (new,new,new,new)

In old box
  May generate all 16 cases
  Duplicates the case of (new,new,new,new)

[Figure: old box over input queues QA, QB, QC, QD with output queue QABCD; operator AB (states SA, SB), operator BC (states SAB, SC), operator CD (states SABC, SD)]

Duplicate Prevention
  At root op in old box:
    If both to-be-joined tuples have all-new sub-tuples, don't join.
  Other op in old box:
    Proceed as normal
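The duplicate prevention rule can be sketched as a guard evaluated before each join in the old box. The representation is an assumption for illustration: each tuple is given as the arrival timestamps of its sub-tuples, and a sub-tuple counts as new if it arrived at or after a hypothetical migration_start timestamp. The function names are likewise hypothetical.

```python
def is_all_new(sub_tuple_timestamps, migration_start):
    """True if every sub-tuple arrived at or after the start of migration."""
    return all(ts >= migration_start for ts in sub_tuple_timestamps)


def old_box_should_join(left, right, migration_start, is_root):
    """Guard for operators in the old box during parallel track migration."""
    if (is_root
            and is_all_new(left, migration_start)
            and is_all_new(right, migration_start)):
        # The new box produces the (new,new,new,new) result; skip it here.
        return False
    return True  # any other operator, or any tuple with an old sub-tuple


start = 100
abc_new, abc_mixed, d_new = (101, 105, 110), (90, 105, 110), (120,)
print(old_box_should_join(abc_new, d_new, start, is_root=True))    # False
print(old_box_should_join(abc_mixed, d_new, start, is_root=True))  # True
print(old_box_should_join((101,), (105,), start, is_root=False))   # True
```

Applying the check only at the root keeps the lower operators unchanged, so intermediate all-new results are still built and remain available for joining with old sub-tuples higher up.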


Estimation of PT Migration

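Estimation Formula: T_PT ≈ 2W

[Figure: timeline along T from T_M-start to T_M-end, spanning a 1st W and a 2nd W, with tuples marked Old and New; shown next to the Old Box (operators AB, BC, CD over input queues QA, QB, QC, QD; states SA, SB, SAB, SC, SABC, SD) and window size W]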


PT Migration Duration

Given enough system computing resources
  New tuples processed right away
  PT migration duration ≈ 2W

If not enough system resources
  New tuples accumulated in queues
  PT migration duration > 2W


Cost Estimation of PT Migration

Cost of PT
  = cost of processing 2W worth of tuples in old box
  + cost of processing 2W worth of tuples in new box

Parameters: Input rates, window size, selectivity
  Similar to MS strategy


PT Migration Pros and Cons

Pros
  Keeps producing results even during migration
    No results during MS migration

Cons
  Migration duration is at least 2W
    MS may be faster depending on # of tuples in states


Outline

Problem Definition and Motivation
Dynamic Migration Strategies
  Moving State Strategy
  Parallel Track Strategy
Experimental Results


Experimental Setup

Embed in the CAPE system
  CAPE = Continuous Adaptive Processing Engine
  A streaming query engine developed at DSRG, WPI
    VLDB'04 demo
  Layers of Adaptations
    Punctuation exploring
    Adaptive scheduling
    Query migration
    Dynamic distribution

Input Streams
  By stream generator of CAPE
  Poisson arrival pattern

Experiments on migration duration
  Vary window size

[Figure: CAPE Runtime Engine architecture, with components Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator, Stream / Query Registration GUI and Stream Provider; labels: Queries, Results]


Migration Duration vs. Window Size

[Figure, three plots of Migration Duration (ms) against window size (ms):
  Measured T_PT and Estimated T_PT vs. Global Window Size W
  Measured T_MS with polynomial fit vs. Global Window Size W
  T_MS and T_PT vs. Window Size]


Conclusions

Identify problem of migration for stateful operators
First solutions for continuous query migration
  Moving state strategy
  Parallel track strategy
Embed both strategies into stream system
Cost model and experimental evaluation
  Cost model confirmed by experiments
  Identify performance trade-off of the two strategies


Thank You

For more information, check the CAPE website @:

http://davis.wpi.edu/~dsrg/CAPE/