Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris Tsirogiannis (Univ. of Toronto)

Page 1: Answering Top-k Queries Using Views

Answering Top-k Queries Using Views

Gautam Das (Univ. of Texas)
Dimitrios Gunopulos (Univ. of California Riverside)
Nick Koudas (Univ. of Toronto)
Dimitris Tsirogiannis (Univ. of Toronto)

Page 2: Answering Top-k Queries Using Views

VLDB '06

Introduction

Preferences are expressed as scoring functions on the attributes of a relation, e.g. for the relation R:

R
tid   X1   X2   X3
1     82    1   59
2     53   19   83
3     29   99   15
4     80   45    8
5     28   32   39

fQ = 3X1 + 2X2 + 5X3

Scores under fQ, in descending order:

tid   Score
2     612
1     543
4     370
3     360
5     343

Top-k: the k tuples with the highest score.
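
The scoring step above can be sketched in a few lines of Python. This is only an illustration of the definition, not code from the talk: the names `R`, `score` and `top_k` are made up here, and the relation is held in an in-memory dictionary.

```python
import heapq

# Relation R from the slide: tid -> (X1, X2, X3).
R = {1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 99, 15), 4: (80, 45, 8), 5: (28, 32, 39)}

def score(weights, values):
    """Linear additive score: the weighted sum of the attribute values."""
    return sum(w * x for w, x in zip(weights, values))

def top_k(relation, weights, k):
    """Return the k (tid, score) pairs with the highest score, best first."""
    scored = ((score(weights, xs), tid) for tid, xs in relation.items())
    return [(tid, s) for s, tid in heapq.nlargest(k, scored)]

F_Q = (3, 2, 5)                 # fQ = 3X1 + 2X2 + 5X3
ranking = top_k(R, F_Q, len(R)) # full ranking of R under fQ
top_2 = top_k(R, F_Q, 2)        # the two best tuples
```

This brute-force version scores every tuple of R, which is exactly the work that the algorithms in the rest of the talk try to avoid.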

Page 3: Answering Top-k Queries Using Views

Related Work

- TA [Fagin et al. '96]
  - Deterministic stopping condition
  - Always returns the correct top-k set
- PREFER [Hristidis et al. '01]
  - Stores multiple copies of the base relation R
  - Utilizes only one of them
- We complement existing approaches

Page 4: Answering Top-k Queries Using Views

Motivation

- Query answering using views
  - Space-performance tradeoff
  - Improved efficiency
- Can we exploit the same tradeoffs for top-k query answering?

Page 5: Answering Top-k Queries Using Views

Problem Statement

R
tid   X1   X2   X3
1     82    1   59
2     53   19   83
3     29   99   15
4     80   45    8
5     28   32   39

Query: fQ = 3X1 + 2X2 + 5X3

V1 (fV1 = 2X1 + 5X2)
tid   Score
3     553
4     385
5     216
2     201
1     169

V2 (fV2 = X2 + 4X3)
tid   Score
2     351
1     237
5     188
3     159
4      77

Ranking views: materialized results of previously asked top-k queries.

Problem: Can we answer new ad-hoc top-k queries efficiently using ranking views?
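
As a rough illustration of what a ranking view holds, the following Python sketch materializes V1 and V2 from R as lists of (tid, score) pairs sorted by descending score. The helper names `materialize_view` and `sorted_access` are hypothetical, and a missing attribute is modelled here simply as a zero weight.

```python
# Relation R from the slide: tid -> (X1, X2, X3).
R = {1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 99, 15), 4: (80, 45, 8), 5: (28, 32, 39)}

def materialize_view(relation, weights):
    """Return the ranking view as (tid, score) pairs in descending score order."""
    pairs = [(tid, sum(w * x for w, x in zip(weights, xs)))
             for tid, xs in relation.items()]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

V1 = materialize_view(R, (2, 5, 0))   # fV1 = 2X1 + 5X2
V2 = materialize_view(R, (0, 1, 4))   # fV2 = X2 + 4X3

def sorted_access(view):
    """Yield (tid, score) pairs one at a time, best score first."""
    yield from view

first_from_v1 = next(sorted_access(V1))
```

A view stores only tuple ids and view scores, so the attribute values needed to compute fQ still have to be fetched from R.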

Page 6: Answering Top-k Queries Using Views

Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions

Page 7: Answering Top-k Queries Using Views

LPTA - Setting

- Linear additive scoring functions, e.g. fQ = 3X1 + 2X2 + 5X3
- Set of views:
  - Each view is the materialized result of a previously executed top-k query
  - Arbitrary subset of attributes
  - Sorted access on pairs (tid, scoreQ(tid))
- Random access on the base table R

Page 8: Answering Top-k Queries Using Views

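LPTA - Example

[Figure: views V1 and V2 drawn as sorted lists of (tid, score) pairs, (tid_1^1, s_1^1) to (tid_5^1, s_5^1) for V1 and (tid_1^2, s_1^2) to (tid_5^2, s_5^2) for V2. The first entries, tid_1^1 and tid_1^2, have been read and a top-1 result is maintained. Beside the lists, R(X1, X2) is plotted on the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) and axes X1, X2, with lines labelled V1, V2 and Q and the stopping condition.]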

Page 9: Answering Top-k Queries Using Views

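LPTA

Linear Programming adaptation of TA

Views: fV1 = 2X1 + 5X2, fV2 = X1 + 2X2
Query Q: fQ = 3X1 + 10X2

At the d-th iteration, sorted access returns the pair (tid_d^1, s_d^1) from V1 and the pair (tid_d^2, s_d^2) from V2. The maximum score that any unseen tuple of R(X1, X2) can have, unseenmax, is the solution of the linear program

  max fQ
  subject to  0 ≤ X1, X2 ≤ 1
              2X1 + 5X2 ≤ s_d^1
              X1 + 2X2 ≤ s_d^2

Stopping condition: unseenmax ≤ topkmin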
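
The bound on unseen tuples can be computed with any LP solver. The sketch below avoids external libraries and instead enumerates the vertices of the feasible polygon, which is enough in two dimensions. That substitution is an assumption made for this example, not the method used in the talk, and the function names `max_linear_2d` and `unseen_max` are made up here.

```python
from itertools import combinations

def max_linear_2d(objective, constraints, eps=1e-9):
    """Maximise c1*x + c2*y subject to a*x + b*y <= r for each (a, b, r).

    Assumes the feasible region is a non-empty bounded polygon, so the optimum
    is attained at a vertex, i.e. at the intersection of two constraint lines.
    """
    c1, c2 = objective
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:              # parallel lines: no unique intersection
            continue
        x = (r1 * b2 - r2 * b1) / det   # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            value = c1 * x + c2 * y
            if best is None or value > best:
                best = value
    return best

def unseen_max(f_q, view_funcs, last_scores):
    """Upper bound on fQ over tuples not yet returned by any view."""
    box = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]   # 0 <= X1, X2 <= 1
    rows = [(a, b, s) for (a, b), s in zip(view_funcs, last_scores)]
    return max_linear_2d(f_q, box + rows)

# Slide functions: fQ = 3X1 + 10X2, fV1 = 2X1 + 5X2, fV2 = X1 + 2X2.
# The last-seen scores 4 and 2 are illustrative values, not from the talk.
bound = unseen_max((3, 10), [(2, 5), (1, 2)], (4, 2))
```

LPTA stops as soon as this bound drops to or below the score of the current k-th best tuple.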

Page 10: Answering Top-k Queries Using Views

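LPTA - Example (cont'd)

[Figure: views V1 and V2 drawn as sorted lists of (tid, score) pairs, now after the second sorted access on each view: tid_1^1, tid_1^2, tid_2^1 and tid_2^2 have been read and a top-1 result is maintained. Beside the lists, R(X1, X2) is plotted on the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) and axes X1, X2, with lines labelled V1, V2 and Q and the stopping condition.]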
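
To make the iteration concrete, here is a minimal end-to-end sketch of the loop in Python. It makes several simplifying assumptions that are not from the talk: the data is two-dimensional with attributes in [0,1], every view is defined over both attributes and contains all tuples, and the linear program is solved by enumerating polygon vertices rather than by an LP solver. The relation `R2` and all function names are made up for the example.

```python
from itertools import combinations

BOX = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]   # 0 <= X1, X2 <= 1

def lp_max(objective, rows, eps=1e-9):
    """Bounded 2-D linear program, max c.x s.t. a.x <= r, by vertex enumeration."""
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue
        x, y = (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + eps for a, b, r in rows):
            value = objective[0] * x + objective[1] * y
            best = value if best is None else max(best, value)
    return best

def score(f, t):
    return f[0] * t[0] + f[1] * t[1]

def lpta(relation, view_funcs, f_q, k):
    """Return (top-k list of (tid, score), number of sorted accesses)."""
    # Materialise each ranking view as (tid, score) pairs, best score first.
    views = [sorted(((tid, score(f, t)) for tid, t in relation.items()),
                    key=lambda p: -p[1]) for f in view_funcs]
    seen, accesses, top = {}, 0, []
    for d in range(len(relation)):
        last = []
        for view in views:
            tid, s = view[d]            # sorted access on the view
            accesses += 1
            last.append(s)
            if tid not in seen:         # random access on R to compute fQ
                seen[tid] = score(f_q, relation[tid])
        top = sorted(seen.items(), key=lambda p: -p[1])[:k]
        rows = BOX + [(f[0], f[1], s) for f, s in zip(view_funcs, last)]
        unseen_max = lp_max(f_q, rows)
        if len(top) == k and unseen_max <= top[-1][1] + 1e-9:
            break                       # unseenmax <= topkmin
    return top, accesses

# Illustrative relation; the scoring functions are the ones on the LPTA slide.
R2 = {1: (0.9, 0.2), 2: (0.3, 0.95), 3: (0.6, 0.6),
      4: (0.1, 0.3), 5: (0.5, 0.1), 6: (0.2, 0.15)}
top, accesses = lpta(R2, [(2, 5), (1, 2)], (3, 10), k=1)
```

On this input the loop stops after two rounds: after the first round the bound on unseen tuples still exceeds the best score found, and after the second round it no longer does.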

Page 11: Answering Top-k Queries Using Views

Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions

Page 12: Answering Top-k Queries Using Views

View Selection Problem

- Given a collection of views V = {V1, ..., Vr} and a query Q, determine the most efficient subset U ⊆ V to execute Q on.
- Conceptual discussion
  - Two dimensions
  - Higher dimensions

Page 13: Answering Top-k Queries Using Views

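View Selection - 2d

[Figure: unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) and axes X, Y, showing lines for Q, V1 and V2, the labelled points A, B, A1, B1 and M, and the annotation "Min top-k tuple".]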

Page 14: Answering Top-k Queries Using Views

View Selection - Higher d

Theorem: If V = {V1, ..., Vr} is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views U ⊆ V such that |U| ≤ m.

Question: How do we select the optimal subset of views?

Page 15: Answering Top-k Queries Using Views

Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions

Page 16: Answering Top-k Queries Using Views

Cost Estimation Framework

- What is the cost of running LPTA when a specific set of views is used to answer a query?
- Cost = number of sequential accesses
- Can we find that cost without actually running LPTA?

[Figure: two-dimensional plot with lines for Q, V1 and V2, points A and B, and the min top-k tuple; in the example shown, Cost = 6 sequential accesses.]

Page 17: Answering Top-k Queries Using Views

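Simulation of LPTA on Histograms

HQ approximates the score distribution of the query Q; HV1 and HV2 are the corresponding histograms for the views. Each histogram has b buckets with n/b tuples per bucket.

1. Use HQ to estimate the score of the k-th highest tuple (topkmin).
2. Simulate LPTA in a bucket-by-bucket lock step to estimate the cost.

[Figure: histograms HQ, HV1 and HV2, with topkmin marked on HQ and the estimated cost marked on the view histograms.]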
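
Step 1 can be sketched as follows, assuming an equi-depth histogram stored as a list of bucket boundaries in descending score order and assuming scores are spread evenly inside a bucket. Both the representation and the linear interpolation are assumptions made for this example; the talk does not say how HQ is stored or built. The names `build_equi_depth` and `estimate_topkmin` are hypothetical.

```python
def build_equi_depth(scores, b):
    """Boundaries of b equi-depth buckets, from the highest score downwards.

    Returns b + 1 values: the maximum score, then the lowest score in each
    bucket. Assumes len(scores) is a multiple of b.
    """
    ordered = sorted(scores, reverse=True)
    step = len(ordered) // b
    return [ordered[0]] + [ordered[(j + 1) * step - 1] for j in range(b)]

def estimate_topkmin(boundaries, n, k):
    """Estimate the score of the k-th highest of n tuples from the histogram."""
    b = len(boundaries) - 1
    per_bucket = n / b
    i = min(int((k - 1) // per_bucket), b - 1)     # bucket holding the k-th tuple
    hi, lo = boundaries[i], boundaries[i + 1]
    fraction = (k - i * per_bucket) / per_bucket   # how deep into that bucket
    return hi - fraction * (hi - lo)               # linear interpolation

# Illustrative data: 100 tuples with scores 1..100, summarised in 4 buckets.
H_Q = build_equi_depth(range(1, 101), 4)
estimate = estimate_topkmin(H_Q, n=100, k=10)
```

Here the true 10th highest score is 91 and the estimate is about 90.4. Step 2 would then replay the LPTA stopping test with bucket boundaries of the view histograms standing in for the last-seen view scores, charging n/b sequential accesses per view for every bucket consumed.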

Page 18: Answering Top-k Queries Using Views

Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions

Page 19: Answering Top-k Queries Using Views

View Selection Algorithms

- Exhaustive (E): Check all (r choose p) possible subsets of size p, p ≤ m.
- Greedy (SV): Keep expanding the set of views to use until the estimated cost stops decreasing.
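
The two strategies can be sketched in Python as below. The cost estimator is passed in as a function, since it stands for the histogram-based estimate of the previous section; the toy cost table, the cap of m views in the greedy loop and all names are assumptions made for this example.

```python
from itertools import combinations

def select_exhaustive(views, m, estimate_cost):
    """Try every subset of 1..m views and keep the cheapest one."""
    candidates = (subset for p in range(1, m + 1)
                  for subset in combinations(views, p))
    return min(candidates, key=estimate_cost)

def select_greedy(views, m, estimate_cost):
    """Add the view that lowers the estimated cost most, until nothing helps."""
    chosen, best_cost = (), float("inf")
    remaining = list(views)
    while remaining and len(chosen) < m:
        cost, view = min(((estimate_cost(chosen + (v,)), v) for v in remaining),
                         key=lambda p: p[0])
        if cost >= best_cost:       # estimated cost stopped decreasing
            break
        chosen, best_cost = chosen + (view,), cost
        remaining.remove(view)
    return chosen

# Toy cost table (made-up numbers) standing in for the cost estimator.
COSTS = {frozenset(["V1"]): 10, frozenset(["V2"]): 8, frozenset(["V3"]): 12,
         frozenset(["V1", "V2"]): 9, frozenset(["V1", "V3"]): 4,
         frozenset(["V2", "V3"]): 7}
cost_of = lambda subset: COSTS[frozenset(subset)]

best = select_exhaustive(["V1", "V2", "V3"], 2, cost_of)
greedy = select_greedy(["V1", "V2", "V3"], 2, cost_of)
```

With this table the exhaustive search returns V1 and V3 (cost 4), while the greedy search commits to V2 first and ends with V2 and V3 (cost 7), which shows the usual trade between search effort and quality of the chosen subset.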

Page 20: Answering Top-k Queries Using Views

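Select Views Spherical (SVS)

Requires the solution of a single linear program.

[Figure: two-dimensional plot with corners (0,0), (1,0) and (0,1) showing the query Q, several view constraints of the form fVj ≤ s, each labelled s, the objective max(fQ), and the label "Selected Views".]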

Page 21: Answering Top-k Queries Using Views

Select Views By Angle (SVA)

Sort the views by increasing angle with respect to Q.

[Figure: two-dimensional plot with corners (0,0), (1,0) and (0,1) showing Q and the views V1, V2, V3, V4 at angles ϕ1 < ϕ2 < ϕ3 < ϕ4 from Q, with the label "Selected Views".]
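
A small Python sketch of the sorting step, treating each scoring function as its weight vector and measuring the angle between vectors. Keeping the first m views after sorting is an assumption based on the bound |U| ≤ m from the view selection theorem; the view weights and function names are illustrative.

```python
import math

def angle(u, v):
    """Angle in radians between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))   # clamp rounding error

def select_views_by_angle(views, f_q, m):
    """Sort view names by increasing angle to the query and keep the first m."""
    ranked = sorted(views, key=lambda name: angle(views[name], f_q))
    return ranked[:m]

# Illustrative views (name -> weight vector) and query fQ = 3X1 + 10X2.
VIEWS = {"V1": (2, 5), "V2": (1, 2), "V3": (5, 1), "V4": (0, 1)}
order = select_views_by_angle(VIEWS, (3, 10), len(VIEWS))
selected = select_views_by_angle(VIEWS, (3, 10), 2)
```

Unlike the exhaustive and greedy strategies, this needs no cost estimation at all, only the weight vectors of the query and the views.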

Page 22: Answering Top-k Queries Using Views

General Queries and Views

- Views that materialize their top-k tuples
  - Truncate the view histograms.
- Accommodating range conditions
  - Select the views that cover the range conditions.
  - Truncate each attribute's histogram.

Page 23: Answering Top-k Queries Using Views

Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions

Page 24: Answering Top-k Queries Using Views

Experiments

- Datasets: Uniform, Zipf, Real
- Experiments:
  - Performance comparison of LPTA, PREFER and TA
  - Accuracy of the cost estimation framework
  - Performance of LPTA using each of the view selection algorithms
  - Scalability of the LPTA algorithm

Page 25: Answering Top-k Queries Using Views

Performance comparison of LPTA, PREFER and TA

[Figures: two plots, one for the uniform dataset (3d) and one for the real dataset (2d).]

Page 26: Answering Top-k Queries Using Views

Cost Estimation Accuracy

[Figures: two plots for 2d data, one with buckets = 0.5% of n and one with buckets = 1% of n.]

Page 27: Answering Top-k Queries Using Views

Performance of LPTA using View Selection Algorithms

[Figures: two plots, (2d) and (3d); 500K tuples, top-100.]

Page 28: Answering Top-k Queries Using Views

Scalability Experiments on LPTA

[Figures: two plots, captioned (2d, uniform dataset) and (500K tuples, top-100).]

Page 29: Answering Top-k Queries Using Views

Conclusions

- Using views for top-k query answering
- LPTA: linear programming adaptation of TA
- View selection problem, cost estimation framework, view selection algorithms
- Experimental evaluation

Page 30: Answering Top-k Queries Using Views

(Thank You!)

Questions?