
Page 1: View Usability and Safety for the Answering of Top- k  Queries via Materialized Views

View Usability and Safety for the Answering of Top-k Queries via Materialized Views

Eftychia Baikousi, Panos Vassiliadis

University of Ioannina, Dept. of Computer Science


DOLAP 2009, Hong Kong, 6 Nov 2009 2

Forecast

Problem of answering a top-k query through materialized top-n views

Theoretical guarantees when a top-n materialized view can answer a top-k query

Algorithmic techniques for answering a top-k query from a materialized view

Properties of the safe areas of views


Contents

Motivation & Problem Definition

Overview of the Method

Theoretical guarantees

Strictness of theorem

Safe area properties

Experiments

Conclusions

Future extensions



Top-k query

Given a relation R(id, x1, x2, x3) and a query Q with scoring function sum(x1, x2, x3)

Find the k tuples with the highest scores according to Q

R

id   x1    x2    x3    sum
a    0.3   0.6   0.7   1.6
b    0.2   0.3   0.4   0.9
c    0.4   0.5   0.9   1.8
d    0.7   0.6   0.1   1.4

Top-2 tuples: c (1.8) and a (1.6)
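
The top-k definition above can be sketched in a few lines of Python. This is only an illustration on the relation R from the slide; the function name `top_k` and the dictionary layout are assumptions, not part of the presentation.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3)
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def top_k(relation, k, score=sum):
    """Return the ids of the k highest-scoring tuples, best first.

    `score` maps a tuple of attribute values to a number; the slide's
    query Q uses the plain sum of the three attributes.
    """
    return heapq.nlargest(k, relation, key=lambda tid: score(relation[tid]))

print(top_k(R, 2))
```

A heap-based selection avoids sorting the whole relation, which matters once R is much larger than k.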


Motivating Example

Given a relation Region(id, name, today_traffic, yesterday_traffic, budget, ...) and a materialized view V of the top-2 regions according to the query

Q: 0.6*difftraffic + 0.4*budget

Region

id   name      t_traffic   y_traffic   budget   V
1    LA        18          20          21       7.2
2    NY        42          54          15       -1.2
3    Dallas    26          22          8        4.4
4    Chicago   30          28          11       5.6

V

name     V
LA       7.2
Dallas   4.4

Telecommunication company executives see sales reports on their PDAs

Can a new top-k query (e.g. 0.5*difftraffic + 0.3*budget) be answered from V?


Problem definition

Given

a base relation R(ID, X, Y)

a materialized view V(ID, X, Y, s) that contains the top-n tuples of the form (id, s), where s is defined as s = w(a·x + y) and w, a are positive parameters

a query Q(ID, X, Y, sQ) that requests the top k ≤ n tuples of the form (id, sQ), where sQ is defined as sQ = wQ(aQ·x + y) and wQ, aQ are positive parameters

Introduce an algorithm that decides whether V by itself is suitable to answer Q and computes Q’s answer


Related Work

Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis: “Answering Top-k Queries Using Views”, VLDB ’06

Answers a top-k query Q by making use of ranking views V

LPTA in two steps:

1. SelectViews(V, Q): selects an efficient subset of views U for answering Q; U contains the sorted lists over each attribute of the relation

2. Answer Q from U: a linear programming adaptation of the TA algorithm, with stopping condition: solution of the linear program ≤ min(top-k)


Related Work – Geometric Representation (0)

Assume a relation R(ID, X, Y)

Two views Vu(id, Score1) and Vd(id, Score2)

Query Q(id, Score)

Scoring functions of the form Score = w(a·x + y)

Each depicted as the line y = a⁻¹·x
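
The slope of this depiction can be checked with a short derivation. It is a sketch added for clarity; the constant c, standing for one fixed score level, is an assumed symbol that does not appear on the slides.

```latex
\begin{align*}
\text{Score}(x, y) &= w\,(a x + y), \qquad w, a > 0 \\
\nabla\,\text{Score} &= w\,(a,\ 1)
  && \text{direction of increasing score} \\
y &= a^{-1} x
  && \text{line through the origin along } (a,\ 1) \\
a x + y &= c \iff y = -a x + c
  && \text{iso-score (sweeping) line, slope } -a \\
a^{-1} \cdot (-a) &= -1
  && \text{sweeping lines are perpendicular to } y = a^{-1} x
\end{align*}
```

So a scoring function is drawn as the line along which its score grows fastest, and every line of equal score is perpendicular to it.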


Related Work – Geometric Representation (1)

M: the k-th tuple in Q

Stopping condition: the sweeping line crosses position A1B

Any point below line AB has a smaller score than M with regard to Q


Related Work – Geometric Representation (2)

Stopping condition: the intersection point S of the two sweeping lines lies on line AB

Any point below line AB has a smaller score than M with regard to Q


Related Work

SelectViews(V, Q) is data dependent: it is based on an estimation of the last tuple of Q according to the data distribution

There are no theoretically established guarantees that the set of views will answer Q



Overview of the method

1. Theoretical guarantees for answering a query Q via a view VU

2. Theoretical guarantees are too strict

3. Parallelism of safe areas


Example

R

id   x   y   V    Q
a    7   4   15   18
b    2   7   16   11
c    4   2   8    10
d    1   1   3    3

V: top-3 with score x + 2y

Q: top-1 with score 2x + y
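
The example can be replayed in Python to see that, for this data, the top-1 answer computed from the view alone matches the answer computed from R. The helper names (`score_v`, `score_q`, `top`) are illustrative assumptions; only the data and the two scoring functions come from the slide.

```python
# Relation R from the example: id -> (x, y)
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}

def score_v(x, y):
    """Scoring function of the view V."""
    return x + 2 * y

def score_q(x, y):
    """Scoring function of the query Q."""
    return 2 * x + y

def top(rows, k, score):
    """Ids of the k highest-scoring rows, best first."""
    return sorted(rows, key=lambda tid: score(*rows[tid]), reverse=True)[:k]

# Materialize V as the top-3 tuples of R under x + 2y.
V = {tid: R[tid] for tid in top(R, 3, score_v)}

print(top(V, 1, score_q))  # answer computed from the view only
print(top(R, 1, score_q))  # answer computed from the base relation
```

Agreement on one data set is not a guarantee in general, which is what the safe area construction addresses.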


Construction of safe area

VU(ID, X, Y, sU): contains the top-n tuples with score sU = wU(aU·x + y)

tN: the n-th tuple in VU

LU: the line xNU yNU, perpendicular to VU, passing through tN and meeting the X and Y axes at xNU and yNU

LQ: the line xNU yQ, perpendicular to Q, passing through xNU
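
As a worked instance, the construction can be applied to the earlier example (V: top-3 with score x + 2y, Q with score 2x + y). Writing x + 2y = 2(0.5·x + y) gives wU = 2 and aU = 1/2, while Q has wQ = 1 and aQ = 2. The n-th tuple of V is c = (4, 2). The coordinate names x_N, y_N for tN are assumed notation, not taken from the slides.

```latex
\begin{align*}
L_U &:\ a_U x + y = a_U x_N + y_N = \tfrac{1}{2}\cdot 4 + 2 = 4 \\
x_{NU} &= \frac{a_U x_N + y_N}{a_U} = \frac{4}{1/2} = 8,
  \qquad y_{NU} = a_U x_N + y_N = 4 \\
L_Q &:\ a_Q x + y = a_Q\, x_{NU} = 2 \cdot 8 = 16
\end{align*}
```

Under these assumptions LQ meets the Y axis at yQ = 16. Among the tuples of V, only a (with 2·7 + 4 = 18) lies above LQ.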


Safe area

Safe area: defined as the area “above” line LQ (the shaded area)

Observations

Any tuple in the safe area has a higher score (with regard to Q) than any tuple outside the safe area

Tuples in the safe area belong to both VU and Q


Answering Q from VU

THEOREM 1

VU can answer Q if the safe area contains at least k tuples

The converse does not always hold



Answering Q from VU cont.

THEOREM 2

It is possible that VU can answer Q even if the safe area contains fewer than k tuples

This holds when the area (yellow triangle) defined by line LU, the X-axis and line L1, which produces the lowest possible score for Q from the tuples of VU, is void of tuples


Algorithm TestViewSuitability

Three main steps

Step 1: Compute the safe area (Q, V)

Step 2: Count the tuples in V that belong to the safe area

Step 3: If there are at least k, then return (true); else return (false)
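
The three steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation, and it rests on labeled assumptions: attribute values are non-negative, the view holds exactly its top-n tuples, a tuple lying exactly on LQ counts as inside the safe area, and the test follows Theorem 1 (at least k tuples). The names `safe_threshold` and `view_is_suitable` are hypothetical. The weights w are omitted because a positive factor does not change any ranking.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

def safe_threshold(view: Dict[str, Point], a_u: float, a_q: float) -> float:
    """Largest value of a_q*x + y reachable on or below line LU (x, y >= 0)."""
    # tN: the lowest-ranked tuple of the view under a_u*x + y.
    x_n, y_n = min(view.values(), key=lambda p: a_u * p[0] + p[1])
    c_u = a_u * x_n + y_n          # LU is the line a_u*x + y = c_u
    x_nu, y_nu = c_u / a_u, c_u    # intercepts of LU with the X and Y axes
    # Assumption: on the triangle under LU a linear score peaks at a corner.
    # The slides draw the case where that corner is xNU; taking the max also
    # covers the symmetric case where it is yNU.
    return max(a_q * x_nu, y_nu)

def view_is_suitable(view: Dict[str, Point], a_u: float, a_q: float, k: int) -> bool:
    # Step 1: compute the safe area, i.e. the half-plane a_q*x + y >= threshold.
    threshold = safe_threshold(view, a_u, a_q)
    # Step 2: count the view tuples that fall in the safe area.
    in_safe_area = sum(1 for x, y in view.values() if a_q * x + y >= threshold)
    # Step 3: the view is declared suitable if it holds at least k such tuples.
    return in_safe_area >= k

# View from the earlier example: top-3 of R under x + 2y (a_u = 0.5); Q is 2x + y.
V = {"b": (2, 7), "a": (7, 4), "c": (4, 2)}
print(view_is_suitable(V, a_u=0.5, a_q=2, k=1))
print(view_is_suitable(V, a_u=0.5, a_q=2, k=2))
```

On this data the threshold is 16, so only a (Q-score 18) is in the safe area: the check succeeds for k = 1 and fails for k = 2. In the example relation b is nevertheless the second-best tuple for Q and is stored in V, which is consistent with the remark that the converse of Theorem 1 does not always hold.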



Combining two views

Lines LQU and LQD, both perpendicular to Q, characterize the safe areas for VU and VD

LQU ║ LQD

The safe area of one view (VU) is encompassed in the safe area of the other view (VD)
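
Because both bounding lines have slope -aQ, comparing two safe areas reduces to comparing two numbers. The sketch below illustrates this under stated assumptions: VU is the view from the earlier example, while VD (top-3 of the same relation under x + y) is a hypothetical second view invented for illustration, and `safe_threshold` is an assumed helper, not a function from the presentation.

```python
def safe_threshold(view, a_v, a_q):
    """Level of a_q*x + y on the line that bounds the view's safe area.

    Assumes non-negative attribute values; the max covers both corners
    of the triangle under the view's own line through its last tuple.
    """
    c = min(a_v * x + y for x, y in view.values())  # level through tN
    return max(a_q * c / a_v, c)

a_q = 2  # Q scores tuples by 2x + y

VU = {"b": (2, 7), "a": (7, 4), "c": (4, 2)}  # top-3 under x + 2y, aU = 0.5
VD = {"a": (7, 4), "b": (2, 7), "c": (4, 2)}  # hypothetical: top-3 under x + y, aD = 1

t_u = safe_threshold(VU, 0.5, a_q)
t_d = safe_threshold(VD, 1, a_q)

# Parallel bounding lines: the lower threshold gives the enclosing safe area.
widest = "VD" if t_d <= t_u else "VU"
print(t_u, t_d, widest)
```

Here the two lines are 2x + y = 16 and 2x + y = 12, so the safe area of VU lies inside that of VD.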



Experimental methodology

Test the following methods:

Our algorithm

TA algorithm (it can guarantee view usability correctness)

For the following goals:

Effectiveness: number of queries answered by views

Efficiency: time savings from usage of queries


Experimental methodology

Synthetic data sets: random data sets of different sizes for a relation of the form R(ID, X, Y)

Sequence of queries with random coefficients and result size k

Experimental parameters:

Parameter                          Symbol   Values
Size of source table R (tuples)    |R|      1×10⁴, 5×10⁴, 1×10⁵
Max size of mat. view (tuples)     k        10, 50, 100, 500, 1000
Number of queries asked            |Q|      100, 1000


Effectiveness

Percentage of views used for 100 queries


Effectiveness

Percentage of views used for different time spans


Efficiency

Time savings from the usage of queries for different database sizes and requested results

Conflicting case: the number of stored results rises, while the savings drop

Due to the size of used memory: memory allocation becomes slow

Probably one view is able to answer a lot of queries

Savings increase for reasonable k’s of size 0.1%



Conclusions

We have provided theoretical and algorithmic results for the problem of answering top-k queries via materialized views

Theoretical and algorithmic results:

Theorem 1: theoretical guarantees for a view to answer a top-k query

Theorem 2: strictness of Theorem 1

Parallelism of safe areas



Future Work

Optimization in case of time and storage constraints

View Caching

Hierarchical structures for the set of views

Sorting techniques


Thank you for your attention!

… many thanks to our hosts!


Auxiliary Time Savings