optimizing queries materialized views

21
Optimizing Queries Using Materialized Views Qiang Wang CS848

Upload: others

Post on 16-Apr-2022

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimizing Queries Materialized Views

Optimizing Queries Using Materialized Views

Qiang WangCS848

Page 2: Optimizing Queries Materialized Views

Answering queries using views

Cost-based rewriting(query optimization and physical data independence)

Logical rewriting(data integration and ontology integration)

System-R style Transformational

approaches

Rewriting algorithms

Query answering algorithms

Page 3: Optimizing Queries Materialized Views

Transformational approach

Transformation-based optimizer Related work

Oracle 8i DBMSIBM DB2Microsoft SQL Server

Page 4: Optimizing Queries Materialized Views

A practical and scalable solutionMicrosoft SQL Server 2000Limitations on the materialized view

A single-level SQL statement containing selections, inner joins and an optional group-byIt must reference base tablesAggregation view must output all grouping columns and a count columnAggregation functions are limited to SUM and COUNT

A solution by Goldstein and LarsonAn efficient view-matching algorithmA novel index structure on view definitions

Page 5: Optimizing Queries Materialized Views

Problem definitionView matching with single-view substitution: Given a relational expression in SPJG form, find all materialized views from which the expression can be computed and, for each view found, construct a substitute expression equivalent to the given expression.

Page 6: Optimizing Queries Materialized Views

Create view v1 asselect p_partkey,p_name,p_retailprice,

count_big(*) as cnt

sum(l_extendedprice*l_quantity) as

gross_revenue

from dbo.lineitem, dbo.part

where p_partkey < 1000

and p_partkey = l_partkey

and p_name like ‘%steel%’

group by p_partkey,p_name,p_retailprice

Create unique clustered index v1_cidx on v1(p_partkey)

select p_partkey,p_name,

sum(l_extendedprice*l_quantity),

from dbo.lineitem, dbo.part

where p_partkey < 500

and p_partkey = l_partkey

and p_name like ‘%steel%’

group by p_partkey,p_retailprice

Query

Materialized View

Page 7: Optimizing Queries Materialized Views

substitutability requirements

1. The view contains all rows needed by the query expression

2. All required rows can be selected from the view

3. All output expressions can be computed from the output of the view

4. All output rows occur with the correct duplication factor

Page 8: Optimizing Queries Materialized Views

Do all required rows exist in the view(1)Check whether the source tables in the view are superset of the source tables in the queryCheck whether the query selection predicates can imply the view selection predicates

)()()( vqvqv PUPUPEPRPRPEPEPE qqq ⇒∧∧⇒∧∧⇒

)()()( vqqvqqvqq PUPUPRPEPRPUPRPEPEPUPRPE qqq ⇒∧∧∧⇒∧∧∧⇒∧∧

vvvqq PUPRPEPUPRPEq ∧∧⇒∧∧

Equijoin subsumption test

Range subsumption test

Residual subsumption test

Page 9: Optimizing Queries Materialized Views

Equijoin subsumption test

TestStep1: equivalence class construction

{p_partkey,l_partkey} as Vc1 and Qc1Step2: check whether every nontrivial view equivalence class is a subset of some query equivalence class

Compensating equality constraintsMissing cases

Page 10: Optimizing Queries Materialized Views

Range subsumption test

TestAssociate (view/query) equivalence classes with lower and upper bound according to the range predicates; uninitialized bounds are assigned infinitum

Upperbound(Vc1) = 1000 Upperbound(Qc1) = 500

Check whether the view is more tightly constrained than the query on ranges

Compensating range constraintsMissing cases

Page 11: Optimizing Queries Materialized Views

Residual subsumption test

Test (a shallow matching algorithm)An expression is represented by a text string and a list of equivalence class referenced

Text string: like ‘%steel%’Attached equivalence class: Vc1

Check whether every conjunct in the view residual predicate matches a conjunct in the query residual predicate

Compensating residual constraintsMissing cases

Page 12: Optimizing Queries Materialized Views

Can the required rows be selected (2)

Check whether the view covers output columns for compensating equality, range, and residual predicates

Page 13: Optimizing Queries Materialized Views

Can output expressions be computed (3)

Check whether all output expressions of the query can be computed from the view

“l_expendedprice * l_price”

Page 14: Optimizing Queries Materialized Views

Do rows occur with correct duplication factor(4)

Cardinality-preserving joins (table extension joins)A stronger condition is employed: an equijoin between all columns in a non-null foreign key in T and a unique key in S

Page 15: Optimizing Queries Materialized Views

TestGiven the query references tables T1,…Tn and the view reference tables T1,…Tn,Tn+1,…Tn+m

Construct foreign key join graphBuild cardinality-preserving relationships among tablesCheck whether the extra tables can be eliminated

Compensation (not needed)

Page 16: Optimizing Queries Materialized Views

Substitution with aggregation views

Special requirementsThe view contains no aggregation or is less aggregated than the query

Exact subsumptionFunctionally determination

All columns required to perform further grouping are available in the view output

COUNT,SUM,AVGCompensation

Add extra grouping and aggregation expression computation

Page 17: Optimizing Queries Materialized Views

A clever index ---- filter treeA1 An…..

….. ….. …..

G1’ Gm’…..G1 Gm…..

….. View definition listView definition list

Page 18: Optimizing Queries Materialized Views

Internal structure: lattice index

Tops

Roots

Q:{A,B}

superset

subset

{A,B,C} {A,B,F} {B,C,D,E}

{A,B}

{A} {B}

{B,E}

{D}

Page 19: Optimizing Queries Materialized Views

Partitioning conditions

Hub conditionSource table conditionOutput expressionOutput columnResidual constraintRange constraintGrouping expressionGrouping column

Page 20: Optimizing Queries Materialized Views

Experiment & extension work

ExperimentConcerning 1000 views, candidate set is reduced to less than 0.4% of the views on average; Optimization time is 0.15 second per query where around 10 different substitutions are generated

ExtensionConstraint test refinementRelaxation on single-table substitutes

Page 21: Optimizing Queries Materialized Views

Consideration

How about a consolidated lattice structure over views?

Construction && maintenanceIntroducing DL for reasoning, but how about the aggregation support?