Dynamic Plan Migration for Continuous Query over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group
Worcester Polytechnic Institute
Massachusetts, USA
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."
SIGMOD 2004 2
Motivation
Continuous query over streams
  Statistics unknown before start
  Statistics changing during execution
    Stream rates, arrival pattern, distribution, etc.
Need for dynamic adaptation
  Plan re-optimization
    Change the shape of the query plan tree
Run-time Plan Re-Optimization
Step 1 - Decide when to optimize: Statistics Monitoring
Step 2 - Generate new query plan: Query Optimization
Step 3 - Replace current plan by new plan: Plan Migration
Naïve Plan Migration Strategy
Migration Steps
  Pause execution of old plan
  Drain out all tuples inside old plan
  Replace old plan by new plan
  Resume execution of new plan
[Figure: the old plan and the new plan, each built from join operators AB and BC over inputs A, B and C]
Problem: Works for stateless operators only
Stateful Operator in CQ
Why stateful
  Need non-blocking operators in CQ
  Operator needs to output partial results
  State data structure keeps received tuples
[Figure: join operator AB over inputs A and B; State A holds tuple ax, State B holds tuples b1 to b5; results shown include ax b2 and ax b3]
Key Observation: The purge of tuples in states relies on processing of new tuples.
Example: Symmetric NL join w/ window constraints
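The key observation can be illustrated with a minimal sketch of a symmetric nested-loop window join. The names `StreamTuple`, `SymmetricWindowJoin` and `process`, and the timestamp-based purge rule, are assumptions made for illustration and are not taken from CAPE. Note that tuples leave a state only when a new tuple arrives on the opposite input.

```python
from collections import namedtuple

# Hypothetical tuple layout: a join key and an arrival timestamp (ms).
StreamTuple = namedtuple("StreamTuple", ["key", "ts"])


class SymmetricWindowJoin:
    """Sketch of a symmetric nested-loop join with a sliding time window."""

    def __init__(self, window):
        self.window = window                 # window size W in ms
        self.state = {"A": [], "B": []}      # State A and State B

    def process(self, side, tup):
        """Handle a new tuple arriving on input `side` ("A" or "B")."""
        other = "B" if side == "A" else "A"
        # Purge: drop opposite-state tuples that fell out of the window.
        # This happens only because a new tuple has arrived.
        self.state[other] = [t for t in self.state[other]
                             if tup.ts - t.ts <= self.window]
        # Probe: join the new tuple with the surviving opposite-state tuples.
        matches = [t for t in self.state[other] if t.key == tup.key]
        if side == "A":
            results = [(tup, t) for t in matches]
        else:
            results = [(t, tup) for t in matches]
        # Insert: keep the new tuple so later arrivals can join with it.
        self.state[side].append(tup)
        return results


join = SymmetricWindowJoin(window=10)
join.process("B", StreamTuple(key=1, ts=0))
join.process("B", StreamTuple(key=1, ts=5))
# The B tuple from ts=0 is purged only now, when an A tuple arrives.
out = join.process("A", StreamTuple(key=1, ts=12))
```

If no further A tuple ever arrived, State B would never be purged, which is why a paused plan cannot drain its states.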
Naïve Migration Strategy Revisited
Steps
  (1) Pause execution of old plan
  (2) Drain out all tuples inside old plan
  (3) Replace old plan by new plan
  (4) Resume execution of new plan
[Figure: plan with join operators AB and BC over inputs A, B and C, annotated with "(2) All tuples drained", "(3) Old replaced by new" and "(4) Processing resumed"]
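Deadlock Waiting Problem: step (2) waits for the states to be purged, but purging relies on processing new tuples, and no new tuples are processed while the plan is paused.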
Problem Definition
Dynamic Plan Migration
  Input (two migration boxes)
    One contains old plan
    One contains new plan
    Have same input and output queues
  Result
    Old box is replaced by new box
Valid Migration
  No missing tuples
  No duplicates
[Figure: old and new migration boxes over input queues QA, QB, QC, QD with output queue QABCD; join operators AB, BC and CD, each with its input states (SA, SB, SC, SD, SAB, SABC, SBC, SBCD)]
Key points:
  Involved plans contain stateful operators
  Need to migrate yet still retain useful states and discard useless states
State of the Art
“Efficient mid-query re-optimization of sub-optimal query execution plans” [Kabra, DeWitt 1998]
  Only migrates unprocessed portion
Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
  Generate several candidate query plans before start
  Execute all, choose one after a while
Outline
Problem Motivation and Definition
Dynamic Migration Strategies
  Moving State Strategy
  Parallel Track Strategy
Experimental Results
Moving State Strategy
Basic idea
  Share common states between two migration boxes
Key steps
  State Matching
    Match states based on IDs
  State Moving
    Create new pointers for matched states in new box
What’s left?
  Unmatched states in new box
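[Figure: Old Box and New Box, both over input queues QA, QB, QC, QD with output queue QABCD. Old box: operator AB with states SA, SB; operator BC with states SAB, SC; operator CD with states SABC, SD. New box: operator BC with states SB, SC; operator CD with states SBC, SD; operator AB with states SA, SBCD.]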
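State matching and state moving can be sketched as follows. The sketch assumes that a state's ID is the set of input streams it covers (for example `"BC"`), which is one plausible reading of "match states based on IDs". The function name `match_and_move`, the dictionary layout and the sample tuples are hypothetical.

```python
def match_and_move(old_states, new_state_ids):
    """Share matched states by reference and report the unmatched ones.

    old_states    -- dict mapping a state ID (e.g. "AB") to its list of tuples
    new_state_ids -- state IDs required by the new box
    """
    new_states, unmatched = {}, []
    for sid in new_state_ids:
        if sid in old_states:
            # State moving: point at the same list object, nothing is copied.
            new_states[sid] = old_states[sid]
        else:
            # No counterpart in the old box: start empty, recompute later.
            new_states[sid] = []
            unmatched.append(sid)
    return new_states, unmatched


# State IDs follow the old and new boxes; the tuples are made up.
old_states = {"A": ["a1"], "B": ["b1"], "C": ["c1"], "D": ["d1"],
              "AB": ["a1b1"], "ABC": ["a1b1c1"]}
new_states, unmatched = match_and_move(
    old_states, ["A", "B", "C", "D", "BC", "BCD"])
# unmatched is ["BC", "BCD"]; the four input states are shared.
```

Sharing by reference is what keeps matching and moving cheap: only the unmatched states require any per-tuple work.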
Unmatched States
State Recomputing
  Recursively recompute unmatched SBC and SBCD from bottom up
Why always possible?
  Old and new boxes have same input queues
  The states associated with input queues always match
Why necessary?
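[Figure: new box over input queues QA, QB, QC, QD with output queue QABCD; operator BC with states SB, SC; operator CD with states SBC, SD; operator AB with states SA, SBCD]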
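The bottom-up recomputation can be sketched as a recursion over the new plan. The plan encoding (nested pairs of stream names), the tuple encoding (Python tuples of labels) and the join predicate `same_key` are assumptions chosen to keep the example small; window constraints are ignored.

```python
def state_id(node):
    """ID of the state holding the output of `node`, e.g. ("B", "C") -> "BC"."""
    if isinstance(node, str):
        return node
    return state_id(node[0]) + state_id(node[1])


def recompute(node, states, joins):
    """Return the tuples of the state for `node`, recomputing it if missing.

    node   -- a stream name (leaf) or a pair (left, right) of sub-plans
    states -- dict of already matched states, extended as a side effect
    joins  -- predicate deciding whether two (sub-)tuples join
    """
    sid = state_id(node)
    if sid not in states:
        if isinstance(node, str):
            # Input states always match, so a missing leaf is an error.
            raise KeyError(f"input state {sid} must already be matched")
        left = recompute(node[0], states, joins)
        right = recompute(node[1], states, joins)
        states[sid] = [l + r for l in left for r in right if joins(l, r)]
    return states[sid]


def same_key(left, right):
    """Hypothetical predicate: adjacent sub-tuples share the numeric suffix."""
    return left[-1][1:] == right[0][1:]


# Matched input states; SBC and SBCD are missing and get recomputed.
states = {"A": [("a1",)], "B": [("b1",), ("b2",)],
          "C": [("c1",)], "D": [("d1",), ("d2",)]}
recompute((("B", "C"), "D"), states, same_key)
```

The recursion bottoms out at the input states, which is exactly why recomputation is always possible when both boxes read the same input queues.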
Terms on Tuples
New/Old tuples
  Old: tuples already in old box when migration starts
  New: tuples that do not exist in old box when migration starts
Sub-tuples
  Tuple ABCD is result of A ⋈ B ⋈ C ⋈ D
  Tuples A, B, C and D are sub-tuples of tuple ABCD
  Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
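[Figure: plan over input queues QA, QB, QC, QD with output queue QABCD; operator AB with states SA, SB; operator BC with states SAB, SC; operator CD with states SABC, SD]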
Why Recompute Unmatched States
To get the complete results of ABCD, we need all 16 old/new combinations
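[Figure: new plan over input queues QA, QB, QC, QD; operator BC with states SB, SC; operator CD with states SBC, SD; operator AB with states SA, SBCD]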
If SBC not recomputed, will miss results with both B and C as OLD:
[Figure: legend (Old Tuple, New Tuple) and three example rows of sub-tuples A, B, C, D]
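To make the counting concrete, the short sketch below enumerates the old/new combinations for a result tuple ABCD and picks out those in which both B and C are old, which are the results lost when SBC starts empty. The variable names are illustrative only.

```python
from itertools import product

SUB_TUPLES = ("A", "B", "C", "D")

# Each sub-tuple is independently old or new, giving 2**4 combinations.
combinations = list(product(("old", "new"), repeat=len(SUB_TUPLES)))

# Results whose B and C parts are both old can only come from a B-C pair
# that was joined before migration, i.e. from the content of SBC.
b_and_c_old = [c for c in combinations
               if c[SUB_TUPLES.index("B")] == "old"
               and c[SUB_TUPLES.index("C")] == "old"]

print(len(combinations), len(b_and_c_old))
```

Four of the 16 combinations have both B and C old, one for each old/new choice of A and D.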
Cost Estimation of MS Migration
Cost of MS consists of
  Cost of state matching
    ID comparison (negligible)
  Cost of state moving
    Create pointers (negligible)
  Cost of state recomputing
    Majority of cost
Affecting parameters
  Operator selectivities
  # of tuples in states
    Estimated as (input rate x window size)
See paper for detailed cost models
One cost model conclusion:
Cost of MS has polynomial relation to window size
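As a rough illustration of the polynomial relation, the sketch below estimates input state sizes as input rate x window size and counts the comparisons needed to recompute one state from two input states. The nested-loop assumption, the function names and the sample rates are assumptions for illustration; the detailed cost models are in the paper.

```python
def state_size(rate_per_s, window_ms):
    """Estimated tuples in an input state: input rate x window size."""
    return rate_per_s * window_ms / 1000


def recompute_comparisons(rate_left, rate_right, window_ms):
    """Comparisons for a nested-loop join of two input states (assumed)."""
    return state_size(rate_left, window_ms) * state_size(rate_right, window_ms)


# Hypothetical rates of 20 tuples per second on both inputs.
small = recompute_comparisons(20, 20, window_ms=1000)
large = recompute_comparisons(20, 20, window_ms=2000)
# Doubling the window quadruples the comparisons for this single join.
```

Under these assumptions a single recomputed join is quadratic in W; states higher in the plan multiply in further factors that depend on selectivity.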
MS Migration Pros and Cons
Pros
  Fast when # of tuples in states is small
    Low input rates, low selectivity or small window
Cons
  Output silence during entire migration stage
Can query output even during migration?
  Motivation for Parallel Track Strategy
Parallel Track Strategy
Basic idea
  Execute both plans in parallel and gradually “push” old tuples out of old box by purging
Key steps
  Connect boxes
  Execute in parallel
    Until old box “expired” (no old tuple or sub-tuple)
  Disconnect old box
  Start executing new box only
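[Figure: old box and new box connected to the same input queues QA, QB, QC, QD and output queue QABCD. Old box: operator AB with states SA, SB; operator BC with states SAB, SC; operator CD with states SABC, SD. New box: operator BC with states SB, SC; operator CD with states SBC, SD; operator AB with states SA, SBCD.]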
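The expiry condition for the old box can be sketched as a scan over its states. The representation used here, in which every tuple is a sequence of `(label, flag)` sub-tuples with flag `"old"` or `"new"`, and the function name `box_expired` are assumptions for illustration.

```python
def box_expired(states):
    """True once no state holds an old tuple or a tuple with an old sub-tuple."""
    return not any(flag == "old"
                   for tuples in states.values()
                   for tup in tuples
                   for _label, flag in tup)


# During migration: an old input tuple and an intermediate tuple
# containing an old sub-tuple are still held in the old box.
during = {"A": [(("a1", "old"),)],
          "AB": [(("a2", "new"), ("b1", "old"))]}
# Later: purging has pushed every old tuple and sub-tuple out.
later = {"A": [(("a3", "new"),)], "AB": []}

print(box_expired(during), box_expired(later))
```

Once the check succeeds, the old box can be disconnected, since everything it could still produce is also produced by the new box.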
Potential Duplicates
Tuple ABCD
  2^4 = 16 possible old/new sub-tuple combinations
  Same case must not be generated by both boxes
    Otherwise we may have duplicates
In new box
  All states start empty
  Only generates ABCD as (new,new,new,new)
In old box
  May generate all 16 cases
  Duplicates the case of (new,new,new,new)
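[Figure: old box over input queues QA, QB, QC, QD with output queue QABCD; operator AB with states SA, SB; operator BC with states SAB, SC; operator CD with states SABC, SD]
Duplicate Prevention
  At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
  Other op in old box: proceed as normal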
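The duplicate-prevention rule at the root operator of the old box can be sketched as below. Tuples are again modeled as sequences of `(label, flag)` sub-tuples, the function names are hypothetical, and the join predicate itself is omitted so that only the old/new check is shown.

```python
def all_new(tup):
    """True if every sub-tuple of `tup` arrived after migration started."""
    return all(flag == "new" for _label, flag in tup)


def root_join_old_box(left, right):
    """Root operator of the old box: skip pairs the new box also produces."""
    if all_new(left) and all_new(right):
        return None           # (new, new, new, new) is left to the new box
    return left + right       # any result with an old sub-tuple is emitted


abc_mixed = (("a1", "new"), ("b1", "old"), ("c1", "new"))
abc_new = (("a2", "new"), ("b2", "new"), ("c2", "new"))
d_new = (("d1", "new"),)

kept = root_join_old_box(abc_mixed, d_new)    # contains an old sub-tuple
skipped = root_join_old_box(abc_new, d_new)   # all new: not joined here
```

Applying the check only at the root keeps the lower operators unchanged while still ensuring that the all-new case is produced exactly once.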
Estimation of PT Migration
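Estimation Formula: T_PT ≈ 2W
[Figure: timeline T from T_M-start to T_M-end covering two windows (1st W, 2nd W) with old and new tuples marked, shown next to the old box (operators AB, BC, CD over input queues QA, QB, QC, QD with states SA, SB, SAB, SC, SABC, SD) and a window of length W]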
PT Migration Duration
Given enough system computing resources
  New tuples processed right away
  PT migration duration ≈ 2W
If not enough system resources
  New tuples accumulated in queues
  PT migration duration > 2W
Cost Estimation of PT Migration
Cost of PT = cost of processing the tuples of a 2W period in old box + cost of processing the tuples of a 2W period in new box
Parameters: Input rates, window size, selectivity
Similar to MS strategy
PT Migrations Pros and Cons
Pros
  Keep on producing results even during migration
    No results during MS migration
Cons
  Migration duration is at least 2W
    MS may be faster depending on # tuples in states
Outline
Problem Definition and Motivation
Dynamic Migration Strategies
  Moving State Strategy
  Parallel Track Strategy
Experimental Results
Experimental Setup
Embed in the CAPE system
  CAPE = Continuous Adaptive Processing Engine
  A streaming query engine developed at DSRG, WPI
  VLDB’04 demo
  Layers of Adaptations
    Punctuation exploring
    Adaptive scheduling
    Query migration
    Dynamic distribution
Input Streams
  By stream generator of CAPE
  Poisson arrival pattern
Experiments on migration duration
  Vary window size
CAPE Runtime Engine
[Figure: architecture of the CAPE runtime engine with components Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator and Stream / Query Registration GUI, plus Stream Provider, Queries and Results]
Migration Duration vs. Window Size
[Figure, panel 1: Migration Duration (ms) vs. Global Window Size W (ms); series Measured T_PT and Estimated T_PT]
[Figure, panel 2: Migration Duration (ms) vs. Global Window Size W (ms); series Measured T_MS with polynomial fit "Poly. (Measured T_MS)"]
[Figure, panel 3: Migration Duration vs. Window Size (ms); series T_MS and T_PT]
Conclusions
Identify problem of migration for stateful operators
First solutions for continuous query migration
  Moving state strategy
  Parallel track strategy
Embed both strategies into stream system
Cost model and experimental evaluation
  Cost model confirmed by experiments
  Identify performance trade-off of the two strategies