optimizing refresh of a set of materialized views

41
Optimizing Refresh of a Set of Materialized Views N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bellamkonda, S. Shankar, T. Bozkaya, L. Sheng Oracle Corporation

Upload: temira

Post on 11-Feb-2016

42 views

Category:

Documents


0 download

DESCRIPTION

Optimizing Refresh of a Set of Materialized Views. N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bellamkonda, S. Shankar, T. Bozkaya, L. Sheng. Oracle Corporation. Overview. Shapes of MVs for analysis: Rollup & Federated Cubes Improvements in refresh of a single MV - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Optimizing Refresh of a Set of Materialized Views

Optimizing Refresh of a Set of Materialized Views

N. Folkert, A. Gupta, A. Witkowski, S. Subramanian, S. Bellamkonda, S. Shankar, T. Bozkaya, L. Sheng

Oracle Corporation

Page 2: Optimizing Refresh of a Set of Materialized Views

Overview Shapes of MVs for analysis: Rollup &

Federated Cubes Improvements in refresh of a single MV Scheduling refresh of a set of MVs

Page 3: Optimizing Refresh of a Set of Materialized Views

Shapes of MV for Analysis Queries

– Ask for aggregations on different hierarchy levels

– Rank, using window functions, within partitions– Rewrite a major performance tool– Hence, we need materialized views

representing cubes DW Maintenance of schema

– Mostly partition based (add, drop, exchange, truncate)

– Corrections done using DML touching few partitions

– Hence, we need partition based refresh

Page 4: Optimizing Refresh of a Set of Materialized Views

Example – Hierachical Star Schema– Sales(city, day, amt) – fact table. Assume partition by time

– Times(day, month, quart, year) – hierarchical dimension– Geog(city, state, region)- hierarchical dimension

Page 5: Optimizing Refresh of a Set of Materialized Views

ROLLUP CUBE – A grouping for each level, in each hierarchy– All groupings in a single MV– Easy to Manage– Variation where as many cubes as time attributes

CREATE MATERIALIZED VIEW rollup_cube_MV ASSELECT t.year, t.quart, t.month, t.day, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY ROLLUP (t.year, t.quart, t.month, t.day) ROLLUP (g.region, g.state, g.city)

Page 6: Optimizing Refresh of a Set of Materialized Views

ROLLUP CUBE – A grouping for each level, in each hierarchy– All groupings in a single MV– Easy to Manage– Variation where as many cubes as time attributes

CREATE MATERIALIZED VIEW year_rollup_cube_MV ASSELECT t.year, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY year, ROLLUP (g.region, g.state, g.city)

CREATE MATERIALIZED VIEW quart_rollup_cube_MV ASSELECT t.quart, g.region, g.state, g.city, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY quart, ROLLUP (g.region, g.state, g.city)

Page 7: Optimizing Refresh of a Set of Materialized Views

Federated Cube– Each groupings in a separate MV– MVs with a time attribute candidates for partitioning– Flexibility of partitioning– Indexes smaller than ROLLUP CUBE (no NULL

indexing)– Flexible “MV_size – query response” trade-off

CREATE MATERIALIZED VIEW quart_state_MV ASPARTITION BY RANGE (quart) ( PARTITION VALUES LESS THEN ‘Q1 03’ PARTITION VALUES LESS THEN ‘Q2 03’……)SELECT t.quart, g.state, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY t.quart, g.state

Page 8: Optimizing Refresh of a Set of Materialized Views

Intuition for Partition refresh

Jan

03Fe

b 03

Mar

03

Apr

03

Oct

04

Nov

04

Dec

04

Q1

03Q

2 03

Q3

04Q

4 04

2003

2004

Q3

03Q

4 03

Q1

04Q

2 02

… Sales PBY month on day

quart_state_MV PBY quart

year_state_MV PBY quart

fact

MV

MV

Page 9: Optimizing Refresh of a Set of Materialized Views

Intuition for Partition refresh - Drop

Jan

03Fe

b 03

Mar

03

Apr

03

Oct

04

Nov

04

Dec

04

Q1

03Q

2 03

Q3

04Q

4 04

2003

2004

Q3

03Q

4 03

Q1

04Q

2 02

… Sales PBY month on day

quart_state_MV PBY quart

year_state_MV PBY quart

fact

MV

MV

Page 10: Optimizing Refresh of a Set of Materialized Views

Intuition for Partition refresh – Add

Feb

03M

ar 0

3A

pr 0

3

Oct

04

Nov

04

Dec

04

Q1

03Q

2 03

Q3

04Q

4 04

2003

2004

Q3

03Q

4 03

Q1

04Q

2 02

… Sales PBY month on day

quart_state_MV PBY quart

year_state_MV PBY quartJa

n 05

fac t

MV

MV

Page 11: Optimizing Refresh of a Set of Materialized Views

Intuition for MV refresh – Maintain

Feb

03M

ar 0

3A

pr 0

3

Oct

04

Nov

04

Dec

04

Q1

03Q

2 03

Q3

04Q

4 04

2003

2004

Q3

03Q

4 03

Q1

04Q

2 02

… Sales PBY month on day

quart_state_MV PBY quart

year_state_MV PBY quartJa

n 05

Q1

05

2005

fac t

MV

MV

Page 12: Optimizing Refresh of a Set of Materialized Views

Refresh Expressions Conventional Refresh Expressions Partition Based Refresh Expressions

Page 13: Optimizing Refresh of a Set of Materialized Views

Conventional Refresh – Use MV log

Jan

03Fe

b 03

Mar

03

Q1

03

delta_sales s

Maintenance Steps• Delete delta logged in MV log

delta

sal

e

…fac t

MV

mv

log

Page 14: Optimizing Refresh of a Set of Materialized Views

Conventional Refresh – Use MV log

Feb

03M

ar 0

3Q

1 03

(SELECT t.quart, g.state, sum(amt) amt FROM delta_sales s, geog g, times t WHERE s.day = t.day & s.city = g.city GROUP BY quart, state) delta

Maintenance Steps• Delete delta logged in MV log• Calculate aggregate on the delta

delta

sal

e

…fac t

MV

mv

log

Page 15: Optimizing Refresh of a Set of Materialized Views

Conventional Refresh – Use MV log

Feb

03M

ar 0

3Q

1 03

UPDATE (SELECT m.quart, m.state, m.amt, delta.amt delta_amt FROM quart_state_MV m, (SELECT t.quart, g.state, sum(amt) amt FROM delta_sales s, geog g, times t WHERE s.day = t.day & s.city = g.city GBY..) delta WHERE m.quart = delta.quart & m.state=delta.state) SET amt = amt – delta_amt

Maintenance Steps• Delete delta logged in MV log• Calculate aggregate on the delta• Join MV with the delta MV• Update MV using delta MV

delta

sal

e

…fac t

MV

mv

log

Page 16: Optimizing Refresh of a Set of Materialized Views

Partition Change Tracking The base fact table is partitioned RDBMS keeps track of affected partitions

– After partition operations (truncate, add, exchange, etc.)– After a DML (Insert, Delete, Update) to a partition

Can use instead of MV logs, but with high granularity

Jan

03Fe

b 03

Mar

03

Apr

03

Oct

04

Nov

04

Dec

04

Single row deleteTruncate partition

Page 17: Optimizing Refresh of a Set of Materialized Views

Partition Based RefreshJa

n 03

Feb

03M

ar 0

3Q

1 03

CREATE MATERIALIZED VIEW quart_state_MV ASPARTITION BY RANGE (quart) ( PARTITION VALUES LESS THEN ‘Q1 03’ PARTITION VALUES LESS THEN ‘Q2 03’……)SELECT t.quart, g.state, sum(s.amt) amt FROM sales s, geog g, times t WHERE t.day = s.day & g.city = t.city GROUP BY quart, state

• Fact table partitioned• MV partitioned • MV partitions determined by Fact partitions • Partition – join dependent columns in MV

Dec

04

Q1

04…fact partition columnmv partition column

fact

MV

Page 18: Optimizing Refresh of a Set of Materialized Views

Partition Based RefreshJa

n 03

Feb

03M

ar 0

3Q

1 03

Maintenance Steps• Determine Affected MV partitions

Dec

04

Q1

04

01-0

1-20

03

01-0

2-20

03

(SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)

fac t

MV

Page 19: Optimizing Refresh of a Set of Materialized Views

Partition Based RefreshFe

b 03

Mar

03

Maintenance Steps• Determine Affected MV partitions• Delete / Truncate the partitions

Dec

04

Q1

04

DELETE quart_state_MVWHERE quart IN (SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)

fac t

MV

Page 20: Optimizing Refresh of a Set of Materialized Views

Partition Based RefreshFe

b 03

Mar

03

INSERT INTO quart_state_MVSELECT t.quart, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.quart IN (SELECT t.quart FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY quart, state

Maintenance Steps• Determine Affected MV partitions• Delete / Truncate the partitions• Insert recalculated partitions

Dec

04

Q1

04

Q1

03

fac t

MV

Page 21: Optimizing Refresh of a Set of Materialized Views

Rewrite During RefreshINSERT INTO year_state_MVSELECT t.year, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.year IN (SELECT t.year FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state

2003

2004

Feb

03Ja

n 03

Dec

03

MV

fac t

Page 22: Optimizing Refresh of a Set of Materialized Views

Rewrite During RefreshINSERT INTO year_state_MVSELECT t.year, g.state, sum(s.amt) amtFROM sales s, geog g, times tWHERE t.day = s.day & g.city = t.city t.year IN (SELECT t.year FROM times t WHERE t.day >= ’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state

Q2

03

INSERT INTO year_state_MVSELECT t.year, mv.state, sum(mv.amt) amtFROM quart_state_mv, (SELECT DISTINCT quart, year FROM t) tWHERE mv.quart = t.quart mv.quart IN (SELECT t.quart FROM times t WHERE t.day >=’01-01-2003’ & t.day < ’02-01-2003’)GROUP BY year, state

2003

2004

Q1

03

Q3

03Fe

b 03

Jan

03

Dec

03

MV

MV

fac t

Page 23: Optimizing Refresh of a Set of Materialized Views

Efficiency of Partition Refresh Truncate is very fast. No undo logs, just dictionary op

INSERT into an MV partition uses fast path with no logging, faster then conventional INSERT or UPDATE

Scan of a base partition may be more efficient than scan of MV logs (as they may have to keep more data)

If more than one partition of MV affected, we use MULTI-TABLE INSERT instead of multiple INSERTs. Scans the base data once

We use new optimization - Dynamic Partition Pruning

Generated refresh expressions automatically rewritten against already fresh MVs. Refresh expressions more suitable for rewrite.

Page 24: Optimizing Refresh of a Set of Materialized Views

Choosing Optimal Refresh Method Partition Based Refresh may remove and recalculate more data in MV than conventional refresh

Cost based generation of refresh expressions

– Conventional refresh expressions using MV logs

– Partition Refresh with DELETE from MV– Partition Refresh with TRUNCATE of MV

partition Use Optimizer to estimate their cost and

chose the cheapest method

Page 25: Optimizing Refresh of a Set of Materialized Views

Building Refresh Schedule Usage of rewrite for schedule building Resource allocation for concurrent execution

of the schedule

Page 26: Optimizing Refresh of a Set of Materialized Views

Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph

• Nodes = Set of MVs to refresh

day_city_mv

month_city_mv

year_region_mv

day_state_mv

day_region_mvquart_city_mv

Page 27: Optimizing Refresh of a Set of Materialized Views

Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph

• Nodes = Set of MVs to refresh• Edges = Optimizer determined best rewrite for MV

day_city_mv

month_city_mv day_state_mv

day_region_mvquart_city_mv

…cost=10

year_region_mv

Page 28: Optimizing Refresh of a Set of Materialized Views

Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph

• Nodes = Set of MVs to refresh• Edges = Optimizer determined best rewrite for MV

day_city_mv

month_city_mv day_state_mv

day_region_mvquart_city_mv

cost=100cost=10

year_region_mv

Page 29: Optimizing Refresh of a Set of Materialized Views

Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph

• Nodes = Set of MVs to refresh• Edges = Optimizer determines best rewrite for MV

day_city_mv

month_city_mv day_state_mv

day_region_mvquart_city_mv

…cost=10

year_region_mv

Page 30: Optimizing Refresh of a Set of Materialized Views

Rewrite Schedule – Rewrite GraphBuild MV Dependency Graph

• Nodes = Set of MVs to refresh• Edges = Optimizer determines best rewrite for MV• Result – acyclic graph with cost. (Cycles removed using SSC)

day_city_mv

month_city_mv day_state_mv

day_region_mvquart_city_mv

cost=100

cost=10

cost=20cost=40

year_region_mv

Page 31: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - ResourcesAssign resources (processes for now) to the graph

• If cost > threshold, assign all processes to the refresh

cost=100. (3)

cost=10

cost=20cost=40

cost=20

cost=10

11 11

4

2 3

6

5

processors

Page 32: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - ResourcesAssign resources (processes for now) to the graph

• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost

cost=100. (3)

cost=10

cost=20. (1)cost=40. (2)

cost=20

cost=10

11 11

4

2 3

6

5

processors

322

Page 33: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - ResourcesAssign resources (processes for now) to the graph

• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost

cost=100. (3)

cost=10. (1)

cost=20. (1)cost=40. (2)

cost=20. (2)

cost=10

11 11

4

2 3

6

5

processors

322

544

Page 34: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - ResourcesAssign resources (processes for now) to the graph

• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost

cost=100. (3)

cost=10. (1)

cost=20. (1)cost=40. (2)

cost=20. (1)

cost=10

11 11

4

2 3

6

5

processors

322

354

Page 35: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - ResourcesAssign resources (processes for now) to the graph

• If cost > threshold, assign all processes to the refresh•Otherwise, assign processors proportionally based on cost

cost=100. (3)

cost=10. (1)

cost=20. (1)cost=40. (2)

cost=20. (2)

cost=10

11 11

4

2 3

6

5

processors

322

544

666

Page 36: Optimizing Refresh of a Set of Materialized Views

Execution of Schedule - OthersAssign resources (processes for now) to the graph

• How to get cost and thus schedule for refresh of empty MVs, • Use query defining the MV to get an estimate of its cost

cost=100

cost=10

cost=20cost=40

cost=20

cost=10

Page 37: Optimizing Refresh of a Set of Materialized Views

Performance APB schema with 5.0 density

– Fact with four hierarchical dimensions – 62 M rows– Dimensions (channel (2 levels), time (3), cust (3), prod

(7))– Federated full cube with 350 M rows

System 24 CPUs with total of 24 GB shared memory

Complete refresh– 6 times performance improvement over Oracle 9i

Incremental refresh – 1 month (3.5 M rows) added– 5 times performance improvement over Oracle 9i

Page 38: Optimizing Refresh of a Set of Materialized Views

Partition Based vs Conventional Refresh

50

100

150

200

250

300

350

0 1000 3000

400

Rows inserted

Incremental refresh time

2000

Conventional

Partition Based with Delete

Partition Based with Truncate

Page 39: Optimizing Refresh of a Set of Materialized Views

Summary Improvements in refresh of a single MV

– Algorithms use partitioning of based tables and/or MVs

– Costing of Multiple refresh methods using Optimizer

– Dynamic Partition Pruning– MV rewrite during refresh

Scheduling refresh of a set of MVs– Usage of rewrite for schedule building– Resource allocation for concurrent execution of

the schedule

Page 40: Optimizing Refresh of a Set of Materialized Views

Summary MVs can be partitioned for more efficient

refresh Partition based refresh very suitable for

cubes partitioned by time granules (good rewrites)

Federated cube offers good compromise size vs refresh & rewrite time if not too many groupings (less then 100)

Need better rewrite strategy for Federated cubes with many groupings (optimizer needs to do intelligent pruning of mvs)

Page 41: Optimizing Refresh of a Set of Materialized Views

AQ&Q U E S T I O N SQ U E S T I O N SA N S W E R SA N S W E R S