Answering Top-k Queries Using Views

Gautam Das (Univ. of Texas),

Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto),

Dimitris Tsirogiannis (Univ. of Toronto)

VLDB '06

Introduction

Preferences are expressed as scoring functions on the attributes of a relation, e.g.

fQ = 3X1 + 2X2 + 5X3

R

tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

Scores under fQ

tid  Score
2    612
1    543
4    370
3    360
5    343

Top-k: the k tuples with the highest score
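The scoring step above can be sketched in a few lines of Python. This is only an illustration with a full scan of R; the names R, f_q, and top_k are not from the slides.

```python
# Base relation R from the slide: tid -> (X1, X2, X3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def f_q(row):
    """Scoring function fQ = 3*X1 + 2*X2 + 5*X3."""
    x1, x2, x3 = row
    return 3 * x1 + 2 * x2 + 5 * x3


def top_k(relation, score, k):
    """Return the k (tid, score) pairs with the highest score (full scan)."""
    scored = [(tid, score(row)) for tid, row in relation.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(top_k(R, f_q, 2))  # [(2, 612), (1, 543)]
```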


Related Work

TA [Fagin et al. '96]
  Deterministic stopping condition
  Always the correct top-k set

PREFER [Hristidis et al. '01]
  Stores multiple copies of base relation R
  Utilizes only one

We complement existing approaches


Motivation

Query answering using views
  Space-Performance tradeoff
  Improved efficiency

Can we exploit the same tradeoffs for top-k query answering?


Problem Statement

fQ = 3X1 + 2X2 + 5X3

R

tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

V1: fV1 = 2X1 + 5X2

tid  Score
3    553
4    385
5    216
2    201
1    169

V2: fV2 = X2 + 4X3

tid  Score
2    351
1    237
5    188
3    159
4     77

Ranking Views: materialized results of previously asked top-k queries

Problem: Can we answer new ad-hoc top-k queries efficiently using ranking views?
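As a rough illustration of what a ranking view holds, the sketch below materializes V1 and V2 from R as lists of (tid, score) pairs sorted by descending score. The helper materialize_view and the weights-by-index encoding are assumptions made for this example, not part of the slides.

```python
# Base relation R from the slide: tid -> (X1, X2, X3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def materialize_view(relation, weights):
    """Score every tuple with a linear function and sort by descending score.

    weights maps an attribute index (0 for X1, 1 for X2, 2 for X3)
    to its coefficient; attributes that are absent have coefficient 0.
    """
    scored = [
        (tid, sum(w * row[i] for i, w in weights.items()))
        for tid, row in relation.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


V1 = materialize_view(R, {0: 2, 1: 5})  # fV1 = 2*X1 + 5*X2
V2 = materialize_view(R, {1: 1, 2: 4})  # fV2 = X2 + 4*X3

print([tid for tid, _ in V1])  # [3, 4, 5, 2, 1]
print([tid for tid, _ in V2])  # [2, 1, 5, 3, 4]
```

Neither view orders the tuples the way fQ does, which is why a new query cannot simply read the first k entries of one view.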


Outline

LPTA Algorithm
View Selection Problem
Cost Estimation Framework
View Selection Algorithms
Experimental Evaluation
Conclusions


LPTA - Setting

Linear additive scoring functions, e.g. fQ = 3X1 + 2X2 + 5X3

Set of Views:
  Materialized result of a previously executed top-k query
  Arbitrary subset of attributes
  Sorted access on (tid, scoreQ(tid)) pairs

Random access on the base table R


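LPTA - Example

[Figure: views V1 and V2 as sorted lists of (tid, score) pairs, (tid_1^1, s_1^1) ... (tid_5^1, s_5^1) and (tid_1^2, s_1^2) ... (tid_5^2, s_5^2). After the first sorted access on each view, tid_1^1 and tid_1^2 have been read and the current top-1 is checked against the stopping condition. The plot shows V1, V2, and Q over R(X1, X2) in the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1).]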


LPTA

Linear Programming adaptation of TA

Q: fQ = 3X1 + 10X2

V1: fV1 = 2X1 + 5X2
V2: fV2 = X1 + 2X2

At iteration d, sorted access reads (tid_d^1, s_d^1) from V1 and (tid_d^2, s_d^2) from V2.

unseenmax = max(fQ) over R(X1, X2), subject to

  0 ≤ X1, X2 ≤ 1
  2X1 + 5X2 ≤ s_d^1
  X1 + 2X2 ≤ s_d^2

Stopping condition: unseenmax ≤ topkmin
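The loop described on this slide might look like the following sketch for the two-dimensional case. It is not the authors' implementation: the linear program is solved by brute-force vertex enumeration instead of a real LP solver, the six tuples in DATA are invented for illustration, and all function names are hypothetical. It assumes attributes normalized to [0, 1] and non-negative coefficients.

```python
from itertools import combinations

# Box constraints 0 <= X1, X2 <= 1 written as a*x + b*y <= s.
BOX = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]


def max_linear_2d(c, constraints, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= s for each (a, b, s).

    Stand-in for an LP solver: the optimum of a bounded 2-d linear program
    lies at a vertex, i.e. at the intersection of two constraint lines.
    """
    best = None
    for (a1, b1, s1), (a2, b2, s2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines do not meet in a vertex
        x = (s1 * b2 - s2 * b1) / det
        y = (a1 * s2 - a2 * s1) / det
        if all(a * x + b * y <= s + eps for a, b, s in constraints):
            value = c[0] * x + c[1] * y
            best = value if best is None else max(best, value)
    return best


def score(coeffs, row):
    return coeffs[0] * row[0] + coeffs[1] * row[1]


def lpta(relation, views, q, k):
    """views: list of (coeffs, ranked), ranked sorted by descending score.

    Returns (top-k list of (tid, score), rounds of sorted access used).
    """
    seen, topk = {}, []
    for d in range(len(relation)):
        for _, ranked in views:
            tid = ranked[d][0]                       # sorted access on a view
            if tid not in seen:
                seen[tid] = score(q, relation[tid])  # random access on R
        topk = sorted(seen.items(), key=lambda p: p[1], reverse=True)[:k]
        # Any unseen tuple scores at most ranked[d] under each view function.
        bounds = [(a, b, ranked[d][1]) for (a, b), ranked in views]
        unseen_max = max_linear_2d(q, BOX + bounds)
        if len(topk) == k and unseen_max <= topk[-1][1]:
            return topk, d + 1                       # stopping condition holds
    return topk, len(relation)


# Hypothetical data, tid -> (X1, X2); not taken from the slides.
DATA = {1: (0.9, 0.2), 2: (0.3, 0.8), 3: (0.5, 0.5),
        4: (0.1, 0.95), 5: (0.7, 0.1), 6: (0.2, 0.3)}
Q = (3, 10)
VIEWS = [
    (f, sorted(((t, score(f, r)) for t, r in DATA.items()),
               key=lambda p: p[1], reverse=True))
    for f in [(2, 5), (1, 2)]  # fV1 = 2*X1 + 5*X2, fV2 = X1 + 2*X2
]

result, rounds = lpta(DATA, VIEWS, Q, k=2)
print([tid for tid, _ in result], rounds)
```

The stopping test is safe because every tuple not yet read from a view scores no higher in that view than the last tuple read, so the linear program bounds the query score of anything still unseen.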


LPTA - Example (cont'd)

[Figure: the same views V1 and V2 after the second sorted access on each; tid_1^1, tid_1^2, tid_2^1, and tid_2^2 have been read and the current top-1 is checked against the stopping condition. The plot shows V1, V2, and Q over R(X1, X2) in the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1).]


View Selection Problem

Given a collection of views V = {V1, ..., Vr} and a query Q, determine the most efficient subset U ⊆ V to execute Q on.

Conceptual discussion
  Two dimensions
  Higher dimensions


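View Selection - 2d

[Figure: unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) and axes X, Y, showing the query Q and the views V1 and V2. Labels: Min top-k tuple, M, A, B, A1, B1.]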


View Selection - Higher d

Theorem: If V = {V1, ..., Vr} is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views U ⊆ V such that |U| ≤ m.

Question: How do we select the optimal subset of views?


Cost Estimation Framework

What is the cost of running LPTA when a specific set of views is used to answer a query?

Cost = number of sequential accesses

[Figure: query Q, views V1 and V2, points A and B, and the minimum top-k tuple; in the example shown, Cost = 6 sequential accesses.]

Can we find that cost without actually running LPTA?


Simulation of LPTA on Histograms

1. Use HQ to estimate the score of the k-th highest tuple (topkmin).

2. Simulate LPTA on the histograms, bucket by bucket in lock step, to estimate the cost.

HQ: approximates the score distribution of the query Q

b buckets, n/b tuples per bucket

[Figure: histograms HQ, HV1, and HV2, with topkmin and Cost marked.]
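Step 1 could be sketched as below. The slides do not say how HQ is stored or how a score is read off a bucket, so this example assumes HQ is given as b + 1 bucket boundaries in descending score order, each bucket holding n/b tuples, with scores spread uniformly inside a bucket. The function name and the sample boundaries are hypothetical.

```python
def estimate_topk_min(boundaries, n, k):
    """Estimate the score of the k-th highest tuple from an equi-depth histogram.

    boundaries: b + 1 score boundaries in descending order (assumed layout).
    n: number of tuples; each of the b buckets holds n / b of them.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    b = len(boundaries) - 1
    per_bucket = n / b
    # Index of the bucket (counted from the highest scores) holding rank k.
    idx = min(int((k - 1) // per_bucket), b - 1)
    hi, lo = boundaries[idx], boundaries[idx + 1]
    # Assumption: scores are uniform inside the bucket, so interpolate linearly.
    frac = (k - idx * per_bucket) / per_bucket
    return hi - frac * (hi - lo)


# Hypothetical HQ: 5 buckets over 50 tuples, scores between 0 and 100.
HQ = [100, 80, 60, 40, 20, 0]
print(estimate_topk_min(HQ, n=50, k=15))  # 70.0
```

With 10 tuples per bucket, rank 15 falls halfway into the second bucket, which spans scores 80 down to 60, hence the estimate of 70.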


View Selection Algorithms

Exhaustive (E): Check all possible subsets of size p, p ≤ m; there are (r choose p) subsets of each size p.

Greedy (SV): Keep expanding the set of views to use until the estimated cost stops decreasing.
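Both strategies can be sketched on top of a cost estimator passed in as a function. In this example estimate_cost is a hypothetical callable standing in for the cost estimation framework, TOY_COSTS is invented data, and picking the single best view at each greedy step is an assumption about how the set is expanded.

```python
from itertools import combinations


def select_views_exhaustive(views, estimate_cost, m):
    """Exhaustive (E): try every subset of size p for 1 <= p <= m."""
    best_subset, best_cost = None, float("inf")
    for p in range(1, min(m, len(views)) + 1):
        for subset in combinations(views, p):
            cost = estimate_cost(subset)
            if cost < best_cost:
                best_subset, best_cost = list(subset), cost
    return best_subset, best_cost


def select_views_greedy(views, estimate_cost, m):
    """Greedy (SV): add the view that lowers the estimated cost the most and
    stop once no addition lowers it (or m views have been selected)."""
    selected, best_cost = [], float("inf")
    remaining = list(views)
    while remaining and len(selected) < m:
        candidate = min(remaining, key=lambda v: estimate_cost(selected + [v]))
        cost = estimate_cost(selected + [candidate])
        if cost >= best_cost:
            break  # estimated cost stopped decreasing
        selected.append(candidate)
        remaining.remove(candidate)
        best_cost = cost
    return selected, best_cost


# Hypothetical estimated costs (sequential accesses) for each view subset.
TOY_COSTS = {
    frozenset({"V1"}): 10, frozenset({"V2"}): 8, frozenset({"V3"}): 12,
    frozenset({"V1", "V2"}): 5, frozenset({"V1", "V3"}): 11,
    frozenset({"V2", "V3"}): 9, frozenset({"V1", "V2", "V3"}): 6,
}


def toy_cost(subset):
    return TOY_COSTS[frozenset(subset)]


print(select_views_exhaustive(["V1", "V2", "V3"], toy_cost, 2))  # (['V1', 'V2'], 5)
print(select_views_greedy(["V1", "V2", "V3"], toy_cost, 2))      # (['V2', 'V1'], 5)
```

On this toy table the two agree, but the greedy variant calls the estimator far less often as r grows, at the risk of missing a subset whose benefit only shows when its views are combined.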


Select Views Spherical (SVS)

Requires the solution of a single linear program.

[Figure: Q and the selected views in the unit square with corners (0,0), (1,0), and (0,1). Labels: constraints fVj ≤ s, objective max(fQ), point T.]


Select Views By Angle (SVA)

Sort the views by increasing angle with respect to Q.

[Figure: unit square with corners (0,0), (1,0), and (0,1); query Q and views V1, V2, V3, V4 at angles ϕ1 < ϕ2 < ϕ3 < ϕ4 from Q, with the selected views marked.]
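The ordering step can be sketched by treating each scoring function as its coefficient vector and measuring the angle to the query vector. The view coefficients for V3 and V4 below are made up, and keeping only the first m views after sorting is an assumption suggested by the |U| ≤ m theorem rather than something the slide states.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero coefficient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def select_views_by_angle(views, q, m):
    """Sort view names by increasing angle to q and keep the first m."""
    ranked = sorted(views, key=lambda name: angle(views[name], q))
    return ranked[:m]


# Hypothetical views, name -> coefficients over (X1, X2).
views = {"V3": (5, 1), "V1": (2, 5), "V4": (1, 0), "V2": (1, 2)}
q = (3, 10)  # fQ = 3*X1 + 10*X2

print(select_views_by_angle(views, q, 2))  # ['V1', 'V2']
```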


General Queries and Views

Views that materialize their top-k tuples
  Truncate the view histograms.

Accommodating range conditions
  Select the views that cover the range conditions.
  Truncate each attribute's histogram.


Experiments

Datasets (Uniform, Zipf, Real)

Experiments:
  Performance comparison of LPTA, PREFER and TA
  Accuracy of the cost estimation framework
  Performance of LPTA using each of the view selection algorithms
  Scalability of the LPTA algorithm


Performance comparison of LPTA, PREFER and TA

[Plots: uniform dataset, 3d; real dataset, 2d]


Cost Estimation Accuracy

[Plots, 2d: buckets = 0.5% of n; buckets = 1% of n]


Performance of LPTA using View Selection Algorithms

[Plots: 2d and 3d; 500K tuples, top-100]


Scalability Experiments on LPTA

[Plots: 2d, uniform dataset; 500K tuples, top-100]


Conclusions

Using views for top-k query answering
LPTA: linear programming adaptation of TA
View selection problem, cost estimation framework, view selection algorithms
Experimental evaluation


(Thank You!)

Questions?
