1

A Scalable Algorithm for Answering Queries Using Views

Rachel Pottinger
Qualifying Exam
October 29, 1999
Advisor: Alon Levy

2

Answering Queries Using Views

Problem: access views instead of original relations

Useful in data integration and query optimization

NP-Complete
Many papers on the subject
No empirical testing of algorithms

3

Data Integration: Query Reformulation

Data sources are pre-calculated views
Views are not complete
Get the most answers possible given the views
Many data sources

Ford cars: dealer prices, sticker prices, inventory
Cheap cars: prices, manufacturer
Used cars: prices, dealer, year

Car sale information

4

Data Integration Example

Query: find the prices of cars that we can buy at cost
Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)

Database relations: dealercost, stickerprice, maker, cheap

Views:
V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)

In V1, price1 and price2 are distinguished (they appear in the head); car is existential.

Conjunctive rewritings:
Q’1(cost) :- V1(cost, cost)
Q’2(cost) :- V2(cost)

Maximally contained rewriting: the union of the conjunctive rewritings Q’1 and Q’2

5

Outline

Previous algorithms
  Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
  Inverse rules [Duschka, Genesereth, 1997]
Minimum Necessary Connections (MiniCon) Algorithm
Experimental evaluation
Extension to arithmetic comparisons
Conclusions and future work

6

The Bucket Algorithm

Introduced as part of Information Manifold

Treats subgoals individually

7

Bucket Algorithm: Populating buckets

For each subgoal in the query, place relevant views in the subgoal’s bucket

Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)

Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
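The bucket-population step above can be sketched in Python. This is a minimal illustration, not the Information Manifold implementation: the `Atom` and `Rule` types and the `populate_buckets` function are hypothetical names, and the relevance test is an assumption (a view is relevant if it has a subgoal over the same relation and every distinguished query variable in the query subgoal lines up with a distinguished view variable). Buckets store view names only, not the mapped heads shown on the slide.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    pred: str    # relation name, e.g. "r1"
    args: tuple  # variable names, e.g. ("x", "y")


class Rule(NamedTuple):
    name: str    # "Q", "V1", ...
    head: tuple  # distinguished (head) variables
    body: tuple  # tuple of Atom


def populate_buckets(query, views):
    """Return one bucket (list of view names) per query subgoal, in order."""
    buckets = []
    for goal in query.body:
        bucket = []
        for view in views:
            for vgoal in view.body:
                if vgoal.pred != goal.pred or len(vgoal.args) != len(goal.args):
                    continue
                # Assumed relevance test: distinguished query variables must
                # land on distinguished view variables.
                ok = all(varg in view.head
                         for qarg, varg in zip(goal.args, vgoal.args)
                         if qarg in query.head)
                if ok and view.name not in bucket:
                    bucket.append(view.name)
        buckets.append(bucket)
    return buckets


# Inputs from the slide.
Q = Rule("Q", ("x",), (Atom("r1", ("x", "y")), Atom("r2", ("y", "x"))))
V1 = Rule("V1", ("a",), (Atom("r1", ("a", "b")),))
V2 = Rule("V2", ("d",), (Atom("r2", ("c", "d")),))
V3 = Rule("V3", ("f",), (Atom("r1", ("f", "g")), Atom("r2", ("g", "f"))))

buckets = populate_buckets(Q, [V1, V2, V3])
```

On the slide's inputs this puts V1 and V3 in the bucket for r1(x,y), and V2 and V3 in the bucket for r2(y,x).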

8

Combining Buckets

For every combination in the Cartesian product of the buckets, check containment in the query

Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)

Candidate rewritings:
Q’1(x) :- V1(x) & V2(x)
Q’2(x) :- V1(x) & V3(x)
Q’3(x) :- V3(x) & V2(x)
Q’4(x) :- V3(x) & V3(x)

Bucket Algorithm will check all possible combinations
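A small Python sketch of the enumeration step only, assuming the buckets are already populated as on the slide. The `candidate_rewritings` helper is a hypothetical name, and the containment check that the Bucket Algorithm runs on every candidate is deliberately left out.

```python
from itertools import product

# Buckets as shown on the slide, keyed by query subgoal.
buckets = {
    "r1(x,y)": ["V1(x)", "V3(x)"],
    "r2(y,x)": ["V2(x)", "V3(x)"],
}


def candidate_rewritings(buckets, head="Q'(x)"):
    """Build one candidate per element of the Cartesian product of the buckets."""
    return [f"{head} :- " + " & ".join(choice)
            for choice in product(*buckets.values())]


candidates = candidate_rewritings(buckets)
for c in candidates:
    print(c)
```

The number of candidates is the product of the bucket sizes, so it grows quickly with the number of subgoals and views, and each candidate still needs its own containment check.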

9

Inverse Rules

Part of the Infomaster system
Inverse rules show how to get database tuples from the views
Cannot be extended to interpreted predicates
Stops earlier than the Bucket Algorithm

10

Creating Inverse Rules

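For each view V(X) :- r1(X1) & … & rn(Xn), and for each j = 1, …, n, form an inverse rule: rj(Xj) :- V(X), where each existential variable of the view is replaced by a Skolem function of the head variables

Inputs:
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)

Inverse Rules:
IR1: r1(a, sfV1(a)) :- V1(a)
IR2: r2(sfV2(d), d) :- V2(d)
IR3: r1(f, sfV3(f)) :- V3(f)
IR4: r2(sfV3(f), f) :- V3(f)

sfV1, sfV2, and sfV3 are Skolem functions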
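The construction above can be sketched as a few lines of Python. This is an illustrative sketch only: `inverse_rules` is a hypothetical helper, rules are produced as strings, and Skolem functions are named per view and per existential variable (`sfV1_b`), which differs slightly from the slide's `sfV1` because a view may have more than one existential variable.

```python
def inverse_rules(name, head, body):
    """Return one inverse rule (as a string) per subgoal of the view.

    name: view name, e.g. "V1"
    head: tuple of distinguished variables, e.g. ("a",)
    body: list of (predicate, args) pairs, e.g. [("r1", ("a", "b"))]
    """
    head_text = ", ".join(head)
    rules = []
    for pred, args in body:
        # Head variables are kept; existential variables become Skolem terms
        # over the head variables.
        new_args = [a if a in head else f"sf{name}_{a}({head_text})"
                    for a in args]
        rules.append(f"{pred}({', '.join(new_args)}) :- {name}({head_text})")
    return rules


v1_rules = inverse_rules("V1", ("a",), [("r1", ("a", "b"))])
v3_rules = inverse_rules("V3", ("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))])
```

Both rules generated for V3 share the same Skolem term for g, which is what lets a later join on that variable succeed.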

11

Combining Inverse Rules

Inverse Rules + Tuples = Expansion

Inverse Rules:
IR1: r1(a, sfV1(a)) :- V1(a)
IR2: r2(sfV2(d), d) :- V2(d)
IR3: r1(f, sfV3(f)) :- V3(f)
IR4: r2(sfV3(f), f) :- V3(f)

Tuples:
V1(g)
V2(h)
V3(j)
V3(m)

Expansion:
r1(g, sfV1(g)), r2(sfV2(h), h), r1(j, sfV3(j)), r2(sfV3(j), j), r1(m, sfV3(m)), r2(sfV3(m), m)

At query time, query over rules:
Q(x) :- r1(x,y) & r2(y,x)
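As a rough illustration of evaluating the query over the expansion, the sketch below hard-codes the expansion tuples from this slide and joins them. The representation is an assumption: Skolem terms are modeled as `(function, argument)` tuples and constants as plain strings, and `skolem` and `answers` are hypothetical helper names.

```python
def skolem(fn, arg):
    """Model a Skolem term such as sfV3(j) as a tuple."""
    return (fn, arg)


# Expansion from the slide, split by relation.
r1 = {("g", skolem("sfV1", "g")),
      ("j", skolem("sfV3", "j")),
      ("m", skolem("sfV3", "m"))}
r2 = {(skolem("sfV2", "h"), "h"),
      (skolem("sfV3", "j"), "j"),
      (skolem("sfV3", "m"), "m")}


def answers(r1, r2):
    """Evaluate Q(x) :- r1(x,y) & r2(y,x), keeping only Skolem-free answers."""
    return sorted(x for (x, y) in r1
                  if (y, x) in r2 and isinstance(x, str))


result = answers(r1, r2)
```

Only the tuples derived from V3 join with each other, because they share a Skolem term; the tuples from V1 and V2 carry different Skolem terms and never match.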

12

Unfolding rules before tuples

Q(x) :- r1(x,y) & r2(y,x)

r1(x,y): IR1, IR3
r2(y,x): IR2, IR4

Use unification to see if rewriting is contained in the query

No containment check necessary

13

The MiniCon Algorithm

Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)

Combine MCDs that only overlap on distinguished view variables

No containment check!

14

MiniCon Description Formation

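Form all MiniCon Descriptions (MCDs) that map all query variables that have to be mapped together

Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)

MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2

V1 and V2 produce no MCD: y would map to an existential variable (b in V1, c in V2), so the same view would have to cover both subgoals that contain y, and neither view does.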

15

MiniCon Combination

Take all combinations of MCDs that:
  map disjoint sets of subgoals
  map all subgoals of the query

MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2

Rewriting: Q’(x) :- V3(x)
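The combination step can be sketched as an exact-cover search over the subgoal sets of the MCDs. This is a minimal sketch, not the author's implementation: `combine_mcds` is a hypothetical name, an MCD is reduced to a `(view atom, set of covered subgoal numbers)` pair, and the views A, B, and C in the second call are invented purely to show a case with more than one combination.

```python
def combine_mcds(mcds, num_subgoals):
    """Return every list of MCDs whose subgoal sets are pairwise disjoint
    and together cover subgoals 1..num_subgoals."""
    results = []

    def extend(chosen, uncovered):
        if not uncovered:
            results.append(chosen)
            return
        # Always cover the smallest uncovered subgoal next, so each
        # combination is generated exactly once.
        target = min(uncovered)
        for mcd in mcds:
            _, covered = mcd
            if target in covered and covered <= uncovered:
                extend(chosen + [mcd], uncovered - covered)

    extend([], frozenset(range(1, num_subgoals + 1)))
    return results


# The single MCD from the slide: V3 covers subgoals 1 and 2.
slide_mcds = [("V3(x)", frozenset({1, 2}))]
slide_result = combine_mcds(slide_mcds, 2)

# Hypothetical MCDs, only to show several combinations.
toy_mcds = [("A(x)", frozenset({1})),
            ("B(x)", frozenset({2})),
            ("C(x)", frozenset({1, 2}))]
toy_result = combine_mcds(toy_mcds, 2)
```

Because only disjoint covers are combined, no candidate is built and then rejected, which is where the slide's "No containment check!" comes from.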

16

Experimental Evaluation

Tested performance and scale up of:
  Bucket Algorithm
  Inverse Rules extended with unification
  MiniCon Algorithm

MiniCon at least as good in all cases, much better in some

Show results for chain queries:
Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)

17

Many Rewritings

Chain queries with 5 subgoals and all variables distinguished

[Figure: Time (sec) vs. Number of Views, for MiniCon, Inverse, and Bucket]

18

Few rewritings, very structured query and views

Chain queries with 10 subgoals and 2 distinguished variables

[Figure: Time (sec) vs. Number of Views, for MiniCon, Inverse, and Bucket]

19

Few rewritings, less structured views

Chain queries; 2 variables distinguished, query of length 12, views of lengths 2, 3, and 4

[Figure: Time (sec) vs. Number of Views, for MiniCon and Inverse]

20

Extension: Interpreted Predicates

Problem is in general undecidable
We looked at subgoals of the form: var < constant or var > constant
If a query variable maps to an existential view variable, require that the query’s interpreted predicates be implied by the view’s

Ex:
Q(x) :- r1(x,y), y > 17
V1(a) :- r1(a,b), b > 18

Guaranteed to be sound
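The implication test in the example (b > 18 implies y > 17 once y is mapped to b) can be sketched for the two comparison forms considered here. This is a sketch under stated assumptions: `implies` is a hypothetical helper, a constraint is an `(operator, constant)` pair on a single variable, and the test is a sufficient condition rather than a full decision procedure.

```python
def implies(view_constraint, query_constraint):
    """Return True if the view's constraint on a variable guarantees the
    query's constraint on the variable it is mapped from.

    Each constraint is (op, constant) with op in {"<", ">"}.
    """
    view_op, view_c = view_constraint
    query_op, query_c = query_constraint
    if view_op != query_op:
        # A lower bound never implies an upper bound, and vice versa.
        return False
    if view_op == ">":
        return view_c >= query_c   # var > 18 implies var > 17
    return view_c <= query_c       # var < 5 implies var < 10


# Example from the slide: V1 has b > 18, the query needs y > 17.
slide_ok = implies((">", 18), (">", 17))
```

With the constraint reversed (a view with b > 16 against a query needing y > 17), the test fails and the view cannot be used for that subgoal when b is existential.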

21

Interpreted Predicate Results

Chain queries with all variables distinguished, 5 subgoals, and 5 variables constrained

[Figure: Time (sec) vs. Number of Views, for MiniCon IP and MiniCon]

Chain queries with two distinguished variables, 10 subgoals, and 5 variables constrained

[Figure: Time (sec) vs. Number of Views, for MiniCon IP and MiniCon]

22

Future Work

Query Optimization
  Look for the fastest answer to query
  Assume that all views are complete
  Require equivalent rewritings
  Need to allow overlap on subgoals mapped
A fuller comparison of interpreted predicates

23

Conclusions

Scalability of previous algorithms understood
MiniCon Algorithm invented
First experimental comparison of algorithms for answering queries using views
Extensions to binding patterns, interpreted predicates
New maximally contained rewriting form

24

Maximally Contained Rewritings

Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if:

For any database D and extensions v1, …, vn of the views such that vi ⊆ Vi(D), 1 ≤ i ≤ n, Q’(v1, …, vn) ⊆ Q(D)

There is no other query Q1 such that
(1) Q’(v1, …, vn) ⊆ Q1(v1, …, vn),
(2) Q1(v1, …, vn) ⊆ Q(D), and there exists at least one database for which (1) is a strict subset

25

Containment Checks

Q1 ⊆ Q2 if, for every database, the answer to Q1 is a subset of the answer to Q2

m is a containment mapping from Vars(Q2) to Vars(Q1) if:
  m maps every subgoal in the body of Q2 to a subgoal in the body of Q1
  m maps the head of Q2 to the head of Q1
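The definition above translates into a brute-force search, sketched here in Python. The `containment_mapping` function and the `(head, body)` query representation are assumptions made for illustration, and the search is exponential in the number of variables, so it is only suitable for small examples like the ones in these slides.

```python
from itertools import product


def containment_mapping(q2, q1):
    """Search for a containment mapping from Vars(q2) to Vars(q1).

    A query is (head, body): head is a tuple of variables and body is a
    list of (predicate, args) pairs. Returns the mapping as a dict if one
    exists (so q1 is contained in q2), otherwise None.
    """
    head2, body2 = q2
    head1, body1 = q1
    vars2 = sorted({a for _, args in body2 for a in args} | set(head2))
    vars1 = sorted({a for _, args in body1 for a in args} | set(head1))
    atoms1 = set(body1)
    for image in product(vars1, repeat=len(vars2)):
        m = dict(zip(vars2, image))
        # The head of q2 must map onto the head of q1.
        if tuple(m[v] for v in head2) != tuple(head1):
            continue
        # Every subgoal of q2 must map onto some subgoal of q1.
        if all((p, tuple(m[a] for a in args)) in atoms1 for p, args in body2):
            return m
    return None


# Query from the bucket and MiniCon slides.
Q = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
# Expansion of Q'(x) :- V3(x).
exp_v3 = (("x",), [("r1", ("x", "g")), ("r2", ("g", "x"))])
# Expansion of Q'(x) :- V1(x) & V2(x).
exp_v1_v2 = (("x",), [("r1", ("x", "b")), ("r2", ("c", "x"))])

m_v3 = containment_mapping(Q, exp_v3)
m_v1_v2 = containment_mapping(Q, exp_v1_v2)
```

On these inputs a mapping exists for the V3 rewriting and none exists for the V1 & V2 candidate, which is the kind of candidate the Bucket Algorithm has to build and then discard.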

26

Inverse Rules With Unification

Find all Inverse Rules that match each query subgoal; place in bucket for that subgoal

For each rule in the first bucket:
  For each other subgoal i, attempt to unify the rules so far with all elements in the bucket for i
  If we cannot unify with anything in that bucket, break out of the loop; otherwise, recurse

27

Correctness requirements

We need both soundness and completeness

A sound rewriting has a valid containment mapping from the variables of the query to the variables of the view

For completeness we need only to check rewritings of length less than or equal to that of the query

28

Extensions to XML

Need to choose a query language
Containment checks should still hold
Need to check to make sure that restructured elements are distinguished
May even be more scalable vs Inverse Rules, Bucket Algorithm