Introduction AQP Families Comparison New Ideas Conclusions

Adaptive Query Processing in the Looking Glass

Shivnath Babu (Stanford Univ.)
Pedro Bizarro (Univ. of Wisconsin, Madison)

Adaptive Query Processing (AQP) Systems: Publication Timeline

[Figure: timeline with axis …1976, 1977, 1989 to 2004, placing the following systems and techniques by publication year: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]

Motivation

• Plenty of recent work on Adaptive Query Processing (AQP) in different contexts
– Conventional DBMS query processing, data integration, continuous queries in stream systems
• No exhaustive, in-depth categorization and comparison of AQP systems to date
• Difficult to answer questions like:
– Will techniques from one system work on another?
– What are the shortcomings of each system?
– Which system is best for a new application domain?

Our Contributions

• Detailed study of current AQP systems
• Classification of AQP systems into 3 families
• Comparison across families in terms of AQP tasks
• Identification of shortcomings & new approaches to address them

Roadmap

• Introduction to AQP
• The three AQP system families
• Comparison across families in terms of AQP tasks
• Summary of what we learned

Primer on Traditional Query Processing

[Figure: architecture diagram. A Query enters the Optimizer (chooses best plan); the chosen plan goes to the Executor (runs chosen plan). The Optimizer uses stats from the Catalog (table sizes, histograms) to cost plans. The Statistics Tracker (creates/updates stats) is labeled Runstats.]

Need for Adaptive Query Processing

• Correlated & skewed data distributions → errors in stats estimates, optimizer mistakes → detect plan suboptimality, re-optimize
• Continuous queries, long-running queries → stats & system conditions may change while query is running → monitor for changes, re-optimize
• AQP is integral to the current CS-wide push towards autonomic computing

Our Focus: AQP for a Single Query

• AQP System:
– A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query


AQP System Families

• Plan-based AQP systems
– AQP for traditional plan-based DBMSs
• Continuous-Query-based (CQ-based) AQP systems
– AQP for long-running continuous queries over data streams
• Routing-based AQP systems
– AQP for DBMSs and continuous queries based on adaptive tuple routing

AQP in Plan-based Systems

[Figure: the traditional architecture (Query, Optimizer, Catalog, Executor, Statistics Tracker with Runstats) extended in two steps. First, the Executor runs the chosen plan plus extra operators, which produce collected stats. Second, the Catalog holds original + observed stats, and the Optimizer uses them to re-optimize.]
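The re-optimization loop in this figure can be sketched in a few lines. This is only an illustration under stated assumptions, not the mechanism of any particular system on the timeline: the two candidate plans, the toy cost formulas, and the helper names `plan_costs`, `choose_plan`, and `run_with_reoptimization` are all hypothetical.

```python
# Hypothetical sketch of plan-based AQP: pick a plan from catalog estimates,
# observe a statistic at run time (the "extra operator"), and re-optimize if
# the observed value changes which plan is cheapest.

def plan_costs(sel_r_rows: float, s_rows: float) -> dict:
    """Toy cost model (assumed) for joining a filtered R with S."""
    return {
        "INLJ": sel_r_rows * 4.0,         # assumed 4 cost units per index probe
        "HashJoin": sel_r_rows + s_rows,  # scan both inputs once
    }

def choose_plan(catalog: dict) -> str:
    """Optimizer: return the cheapest plan under the current catalog stats."""
    costs = plan_costs(catalog["sel_r_rows"], catalog["s_rows"])
    return min(costs, key=costs.get)

def run_with_reoptimization(catalog: dict, observed_sel_r_rows: float) -> list:
    """Return the plans used: the initial choice, then any re-optimized one."""
    history = [choose_plan(catalog)]
    # Catalog now holds original + observed stats.
    updated = {**catalog, "sel_r_rows": observed_sel_r_rows}
    new_plan = choose_plan(updated)
    if new_plan != history[-1]:
        history.append(new_plan)  # switch plans for the rest of the query
    return history

catalog = {"sel_r_rows": 100.0, "s_rows": 10_000.0}  # optimizer's estimates
history = run_with_reoptimization(catalog, observed_sel_r_rows=50_000.0)
```

With these made-up numbers the estimate favors the index join, while the observed cardinality favors the hash join, so the plan is switched once.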

Example Plan-based AQP Systems

[Figure: the AQP publication timeline (…1976 to 2004), repeated under this title]

Primer on Continuous Query Processing

• Continuous Queries (CQs) are long-running queries, usually over data streams
– Example CQ: Filtering packet streams
• Stream properties or system conditions may change while query is running → best plan may change

[Figure: Packets → σ1 → σ2 → σ3 → Chosen packets]
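To see why the best plan for the packet-filtering example may change, it helps to cost a chain of filters. The sketch below assumes independent filters, so a chain applied in order σ1, σ2, σ3 has expected per-tuple cost c1 + s1·c2 + s1·s2·c3, where c is a filter's cost and s its selectivity. The costs, selectivities, and function names are illustrative assumptions, not values from the talk.

```python
from itertools import permutations

def expected_cost(order, cost, sel):
    """Expected per-tuple cost of applying filters in `order`, assuming independence."""
    total, pass_prob = 0.0, 1.0
    for f in order:
        total += pass_prob * cost[f]  # only tuples that survived so far reach f
        pass_prob *= sel[f]
    return total

def best_order(cost, sel):
    """Brute-force search over all orderings (fine for a handful of filters)."""
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, sel))

cost = {"s1": 1.0, "s2": 1.0, "s3": 1.0}
sel_before = {"s1": 0.1, "s2": 0.5, "s3": 0.9}  # s1 drops most packets
sel_after = {"s1": 0.9, "s2": 0.5, "s3": 0.1}   # stream changed: s3 now drops most

order_before = best_order(cost, sel_before)
order_after = best_order(cost, sel_after)
```

When the stream's properties shift as in `sel_after`, the cheapest order reverses, which is the situation a CQ-based AQP system has to detect.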

AQP in CQ-based Systems

[Figure: the traditional architecture adapted for continuous queries, built up in steps. The input is a Continuous Query. The Catalog holds stream rates and data distributions instead of table sizes and histograms. The Statistics Tracker monitors stream stats and system conditions. The Optimizer uses stats to cost plans and ensures that the plan is best for the current stats. The Executor runs the chosen plan. Additional labels in the final step: "Stats to track", "Re-optimize", and "Combined in-part for efficiency".]

Example CQ-based AQP Systems

[Figure: the AQP publication timeline (…1976 to 2004), repeated under this title]

Primer on Routing-based Processing

• Non-plan-based architecture where tuples are routed individually through operators
• No optimizer
• Exemplified by Eddies [AH00]

[Figure, two panels. "Using a plan": Packets → σ1 → σ2 → σ3 → Chosen packets. "Using tuple routing": Packets enter a Tuple Router connected to σ1, σ2, and σ3, which outputs Chosen packets.]

AQP in Routing-based Systems

[Figure: the traditional architecture transformed in steps. The input is a Query or Continuous Query. The Optimizer and Statistics Tracker become a Tuple Router (integrated Optimizer & Stats Tracker). The Executor becomes a pool of operators, with selective routing of tuples. The Catalog becomes an in-memory catalog (operator costs, selectivities, etc.). The router uses stats to choose efficient routes.]

Example Routing-based AQP Systems

[Figure: the AQP publication timeline (…1976 to 2004), repeated under this title]


Comparison Across AQP System Families

• Goal: To bring out AQP algorithms and features, not performance numbers

• Models, assumptions, and approach
• Techniques for tracking statistics
• Re-optimization subtasks
– When and how to re-optimize
– Switching between plans
• Pros & cons of using a conventional optimizer
• Performance issues
– Quality of re-optimization
– Run-time overhead & thrashing
– Scalability


Techniques for Tracking Statistics

• Observation
– Mostly in Plan-based systems
• Competition
– Mostly in Plan-based systems
• Profiling
– Mostly in CQ-based systems
• Exploration
– In Routing-based systems

Tracking Statistics: Observation [KD98]

• Collect statistics on operator behavior or intermediate subexpressions in a plan

[Figure: Packets → σ1 → σ2 → σ3 → Chosen packets, with the annotation "Selectivity of σ1 on input stream can be observed here"]
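Observation can be pictured as counters attached to the operators of the running plan. The sketch below is a hypothetical illustration (the `ObservedFilter` wrapper and the predicates are assumptions, not taken from [KD98]). It also shows the limit of the technique: only the first filter sees the raw input stream, so later filters are measured on the tuples that reach them.

```python
class ObservedFilter:
    """Wraps a predicate and counts tuples seen and passed (illustrative helper)."""

    def __init__(self, name, predicate):
        self.name, self.predicate = name, predicate
        self.seen = 0
        self.passed = 0

    def __call__(self, tup):
        self.seen += 1
        ok = bool(self.predicate(tup))
        self.passed += ok
        return ok

    @property
    def selectivity(self):
        """Observed pass fraction, or None if no tuple has arrived yet."""
        return self.passed / self.seen if self.seen else None

def run_plan(filters, tuples):
    """Apply filters in plan order; a tuple stops at the first filter it fails."""
    return [t for t in tuples if all(f(t) for f in filters)]

filters = [
    ObservedFilter("s1", lambda p: p % 2 == 0),
    ObservedFilter("s2", lambda p: p % 3 == 0),
]
chosen = run_plan(filters, range(12))
```

Here `filters[0]` is measured on all 12 input tuples, while `filters[1]` is measured only on the tuples that passed the first filter.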

Tracking Statistics: Competition [A93]

• Extra processing to collect statistics

[Figure: Packets → σ1, σ2, σ3 → Chosen packets, with two annotations: "Selectivity of σ2 on input stream" and a second "Selectivity of … on input stream" whose operator label is missing]

Tracking Statistics: Profiling [BMM+04]

• Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics
• Builds a “statistical profile” that can be used to estimate many individual statistics

[Figure: σ1, σ2, and σ3 shown with "Profiled tuples"]
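A small sketch may make the "statistical profile" idea concrete. It is an assumption-laden illustration rather than the method of [BMM+04]: a random fraction of tuples is evaluated against every filter, the outcomes are stored as rows, and many different selectivities (including conditional ones) are then estimated from the same rows. The function names and predicates are hypothetical.

```python
import random

def build_profile(tuples, predicates, fraction, rng):
    """Evaluate every predicate on a random sample; one outcome row per sampled tuple."""
    profile = []
    for t in tuples:
        if rng.random() < fraction:
            profile.append(tuple(bool(p(t)) for p in predicates))
    return profile

def estimate_selectivity(profile, target, given=()):
    """Estimate P(target passes | every predicate index in `given` passes)."""
    rows = [r for r in profile if all(r[g] for g in given)]
    if not rows:
        return None  # no sampled tuple satisfies the condition
    return sum(r[target] for r in rows) / len(rows)

predicates = [lambda p: p % 2 == 0, lambda p: p % 4 == 0, lambda p: p % 3 == 0]
profile = build_profile(range(10_000), predicates, fraction=0.1, rng=random.Random(0))

# One profile answers several questions, e.g. the selectivity of predicate 1
# on the input stream and on the output of predicate 0.
sel_1 = estimate_selectivity(profile, target=1)
sel_1_given_0 = estimate_selectivity(profile, target=1, given=(0,))
```

The accuracy of such estimates depends on the sampling fraction, and the extra work is confined to the sampled tuples.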

Tracking Statistics: Exploration [AH00]

• A fraction of tuples are routed along routes different from the current best route to track statistics along those routes
• No redundant processing

[Figure: Packets enter a Tuple Router connected to σ1, σ2, and σ3, which outputs Chosen packets]
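Exploration can be illustrated with a toy router that usually sends a tuple along its best-known route and occasionally tries another one. This sketch is not the Eddies routing policy from [AH00]; the random-exploration rule, the `ExploringRouter` class, and its parameters are assumptions chosen to keep the example short.

```python
import random

class ExploringRouter:
    """Toy tuple router: mostly uses the best-known filter order, sometimes explores."""

    def __init__(self, predicates, explore_prob, rng):
        self.predicates = predicates      # name -> predicate
        self.explore_prob = explore_prob  # fraction of tuples sent on other routes
        self.rng = rng
        self.seen = {n: 0 for n in predicates}
        self.passed = {n: 0 for n in predicates}

    def pass_rate(self, name):
        # Unseen operators get 0.0 so that they are tried first.
        return self.passed[name] / self.seen[name] if self.seen[name] else 0.0

    def choose_route(self):
        names = list(self.predicates)
        if self.rng.random() < self.explore_prob:
            self.rng.shuffle(names)         # exploration: a possibly non-best route
        else:
            names.sort(key=self.pass_rate)  # current best: lowest pass rate first
        return names

    def route(self, tup):
        """Send one tuple through the operators; each operator runs at most once."""
        for name in self.choose_route():
            self.seen[name] += 1
            if not self.predicates[name](tup):
                return False
            self.passed[name] += 1
        return True

router = ExploringRouter(
    {"s1": lambda p: p % 2 == 0, "s2": lambda p: p % 10 == 0},
    explore_prob=0.1,
    rng=random.Random(0),
)
chosen = [p for p in range(1000) if router.route(p)]
```

Every tuple is processed by each operator at most once, so there is no redundant work, but an operator that is usually placed late sees mostly pre-filtered tuples, and its observed pass rate reflects that input.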

Comparing Statistics-Tracking Techniques: Extra Overhead Introduced

[Arrow alongside the list: increasing overhead]

• Observation
• Exploration (inefficient routes for some tuples)
• Profiling (extra processing on some tuples)
• Competition (lots of extra work)

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics

[Arrow alongside the list: increasing coverage]

• Observation & Competition (limited by plan)
• Exploration (limited by large number of routes)
• Profiling (highest since it builds statistics profile)

Comparing Statistics-Tracking Techniques: Accuracy of Estimation

[Arrow alongside the list: increasing accuracy]

• Observation & Competition
• Exploration (but susceptible to routing bias)
• Profiling (depends on sampling fraction)


What have we learned? (1)

• Many similarities in internals of different AQP families
• Can re-use many current (and new) AQP techniques across families
• Ex: Profiling from CQ-based systems
– Enables, e.g., faster detection of plan suboptimality in Plan-based systems
– Generates more accurate statistics at lower cost in Routing-based systems

Example Query: σ_{p1 and p2}(R) ⋈ S

[Figure: plan over R and S using an index nested-loop join (INLJ) with an unclustered index]

What have we learned? (2)

• Current AQP systems are reactive
– E.g., do not consider sensitivity to errors/changes in stats

Proactive Re-optimization

Example Query: σ_{p1 and p2}(R) ⋈ S

[Figure: plot of Cost against |σ(R)| with one curve for a Hash Join plan over R and S and one for an INLJ plan over R and S using an unclustered index]
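The cost-versus-|σ(R)| picture suggests a simple sensitivity check: if the uncertainty about |σ(R)| spans the point where the two plans' cost curves cross, the plan choice is sensitive to estimation error. The sketch below is a hypothetical illustration of that idea only; the linear cost formulas, the constants, and the function names are assumptions, not the authors' cost model or algorithm.

```python
def inlj_cost(sel_rows, probe_cost=4.0):
    """Assumed cost of the INLJ plan: one index probe per row of σ(R)."""
    return sel_rows * probe_cost

def hash_join_cost(sel_rows, s_rows=10_000.0):
    """Assumed cost of the hash join plan: read σ(R) and scan S once."""
    return sel_rows + s_rows

def crossover(probe_cost=4.0, s_rows=10_000.0):
    """|σ(R)| at which both plans cost the same: n * probe_cost = n + s_rows.

    Requires probe_cost > 1 so that the curves actually cross.
    """
    return s_rows / (probe_cost - 1.0)

def is_sensitive(low, high, probe_cost=4.0, s_rows=10_000.0):
    """True if the uncertainty interval [low, high] for |σ(R)| contains the crossover."""
    return low <= crossover(probe_cost, s_rows) <= high

# An estimate known only to lie between 100 and 5,000 rows straddles the
# crossover, so neither plan is safe; between 100 and 1,000 rows, INLJ is.
wide = is_sensitive(100, 5_000)
narrow = is_sensitive(100, 1_000)
```

A reactive system would discover a bad choice only after running it, whereas a check of this kind flags the risk before execution starts.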

What have we learned? (3)

• Challenging meta problems in AQP for continuous queries need to be addressed
1. Larger and more complex plan spaces → higher costs for statistics tracking and re-optimization
2. Tracking “Return-of-Investment” on AQP
3. Avoiding thrashing, e.g., on bursty changes in statistics

Proposal: Plan Logging for Continuous Queries

Plan Logging for Continuous Queries

• Log the statistics and re-optimization history
– Query is long-running
– Example view over log for R ⋈ S ⋈ T

Rate(R)   …   σ(R,S)   Plan   Cost
1024      …   0.75     P1     12762
5642      …   0.72     P2     72332
934       …   0.76     P1     12003

[Figure: plans P1 and P2 plotted in a space with axes Rate(R) and σ(R,S), with a "time" label; caption: Plans lying in a high-dimensional space of statistics]
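One way to picture a plan log is as a list of records, each holding the statistics observed at a re-optimization point together with the plan chosen and its cost. The sketch below uses the three rows shown in the slide. How the log is used (reusing the plan logged at the nearest statistics point, and counting plan switches as a rough thrashing signal) is a hypothetical illustration, and the field and function names are assumptions.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class LogEntry:
    rate_r: float  # Rate(R) column
    sel_rs: float  # the R,S statistic (assumed here to be a join selectivity)
    plan: str
    cost: float

LOG = [
    LogEntry(1024, 0.75, "P1", 12762),
    LogEntry(5642, 0.72, "P2", 72332),
    LogEntry(934, 0.76, "P1", 12003),
]

def nearest_logged_plan(log, rate_r, sel_rs):
    """Return the plan logged at the closest statistics point (hypothetical use)."""
    def distance(e):
        # Relative differences keep the two statistics on comparable scales.
        return math.hypot((e.rate_r - rate_r) / rate_r, (e.sel_rs - sel_rs) / sel_rs)
    return min(log, key=distance).plan

def plan_switches(log):
    """Count how often consecutive log entries use different plans."""
    return sum(a.plan != b.plan for a, b in zip(log, log[1:]))

plan_now = nearest_logged_plan(LOG, rate_r=1000, sel_rs=0.75)
switches = plan_switches(LOG)
```

Only the two statistics visible in the slide are modeled; the columns elided by "…" are left out.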

Summary

• AQP is becoming important:
– New data and application trends
– CS-wide push towards Autonomic Computing
– Significant amount of work on AQP in recent years
• Our contributions:
– In-depth categorization and comparison of AQP systems and techniques
– Identified current shortcomings and new approaches to AQP