prefer: a system for the efficient execution of multi-parametric ranked queries vagelis hristidis...

Post on 14-Dec-2015


PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries

• Vagelis Hristidis

University of California, San Diego

• Nick Koudas

AT&T Research

• Yannis Papakonstantinou

University of California, San Diego

Example

Car ID  Mileage  Year  Price  Doors
1       10000    1997  20000  2
2       20000    2000  11000  4
3       17000    1998  12000  6
4       15000    1990   8000  2
5        5000    1990  12000  2
6       15000    1990   5000  4
7       12000    1985   5000  4


ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

Car ID  Mileage  Year  Price  Doors
2       20000    2000  11000  4
1       10000    1997  20000  2
3       17000    1998  12000  6
5        5000    1990  12000  2
4       15000    1990   8000  2
6       15000    1990   5000  4
7       12000    1985   5000  4


Problem: Retrieve WHOLE relation


PREFER retrieves only part of the relation

Applications

Such preference queries are used in Web sites like:

• www.Zagat.com (restaurants)

• www.personallogic.com (online retailer)

Definitions - Problem statement

• A preference query orders the tuples of a relation according to a function of the attribute values, e.g.: 0.01·Mileage + 0.6·Year + 0.03·Price

• Goal is to produce the top-K answers of a preference query while retrieving the minimum # of tuples
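As a baseline for the problem statement, here is a minimal sketch of the naive evaluation, which scores every tuple of the relation before returning the top K. The names `CARS`, `normalize`, `preference_score`, and `naive_top_k` are hypothetical. The scaling (Mileage/100, Year − 1980, Price/100) is an assumption inferred from the fv values shown with the ranked view later in the slides; it is not stated in the deck.

```python
import heapq

# (car_id, mileage, year, price) rows of the example relation.
CARS = [
    (1, 10000, 1997, 20000),
    (2, 20000, 2000, 11000),
    (3, 17000, 1998, 12000),
    (4, 15000, 1990, 8000),
    (5, 5000, 1990, 12000),
    (6, 15000, 1990, 5000),
    (7, 12000, 1985, 5000),
]


def normalize(row):
    """Assumed scaling of the attributes: Mileage/100, Year-1980, Price/100."""
    _, mileage, year, price = row
    return (mileage / 100, year - 1980, price / 100)


def preference_score(weights, row):
    """Linear preference function: sum of weight * normalized attribute."""
    return sum(w * a for w, a in zip(weights, normalize(row)))


def naive_top_k(rows, weights, k):
    """Score the WHOLE relation, then keep the k highest-scoring rows."""
    return heapq.nlargest(k, rows, key=lambda r: preference_score(weights, r))


query = (0.01, 0.6, 0.03)  # weights for Mileage, Year, Price
top3 = [row[0] for row in naive_top_k(CARS, query, 3)]
full_order = [row[0] for row in naive_top_k(CARS, query, len(CARS))]
print(top3)  # [2, 1, 3]
```

Every call touches all tuples, which is the cost the ranked views are meant to avoid.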

Our Approach

PREFER materializes a number of ranked views of the relation and uses them to answer preference queries efficiently.

[Figure: weight vectors in the (Price, Year) plane. Ranked views: 0.08*Price + 0.2*Year and 0.075*Price + 0.8*Year. Preference query: 0.07*Price + 0.35*Year]

PREFER Architecture

Preprocessing stage

• Input: Relation, Space constraints, Discretization of ranked views’ vectors

• Views Creation: Which ranked views should we materialize?

• Output: Mat. Views and index of mat. views

Runtime Process

• View Selection (input: Query): Which ranked view should we use to answer a specific preference query?

• Pipelining Algorithm (input: Query, Ranked View id): How to use a ranked view to answer a preference query

• Output results


How to use a ranked view to answer a preference query

Ranked View, ordered by fv = 0.02*Mileage + 0.4*Year + 0.04*Price

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2       9.8
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4

[Figure: the ranked view with t1, the watermark, and the last tuple marked. Watermark = 14.26]

Result, ordered by fq = 0.01*Mileage + 0.6*Year + 0.03*Price (columns: Car ID ... Doors fq)

Watermark

$\forall t \in R:\ f_v(t) < T_{v,q}^{1} \Rightarrow f_q(t) < f_q(t_v^{1})$

Calculating the Watermark

$f_q(t) = \sum_{i=1}^{k} q_i A_i(t) = f_v(t) + \sum_{i=1}^{k} (q_i - v_i) A_i(t)$

$f_v(t') + \sum_{i=1}^{k} (q_i - v_i) A_i(t') < f_q(t_v^{1})$
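To make the role of the identity $f_q(t) = f_v(t) + \sum_i (q_i - v_i) A_i(t)$ concrete, the following derivation sketches one simple way to obtain a valid watermark. It assumes every attribute $A_i$ lies in a known interval $[m_i, M_i]$; the bounds $m_i$, $M_i$ and the symbol $\Delta$ are introduced here for illustration only. The result is a conservative bound and is not necessarily the watermark that PREFER computes.

```latex
\begin{aligned}
f_q(t) &= f_v(t) + \sum_{i=1}^{k} (q_i - v_i)\, A_i(t)
        \;\le\; f_v(t) + \Delta, \\
\Delta &= \sum_{i=1}^{k} \max\bigl((q_i - v_i)\, m_i,\ (q_i - v_i)\, M_i\bigr), \\
f_v(t) &< f_q(t_v^{1}) - \Delta
        \;\Longrightarrow\; f_q(t) < f_q(t_v^{1}).
\end{aligned}
```

Under these assumptions, $T = f_q(t_v^{1}) - \Delta$ satisfies the watermark condition, although a tighter (larger) watermark may exist.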


How to use a ranked view to answer a preference query (cont’d)

PipelineResults Algorithm

Ranked View, ordered by fv = 0.02*Mileage + 0.4*Year + 0.04*Price

Result, ordered by fq = 0.01*Mileage + 0.6*Year + 0.03*Price

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2       9.8
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4

1. Calculate the watermark for t1, the first unprocessed tuple of the ranked view

2. Find the prefix of the view with fv greater than the watermark value and sort its tuples by fq

3. Output tuples up to t1

4. Repeat using the first unprocessed tuple as t1

Iterations shown on the slides:

• t1 = Car 1, watermark = 14.26: the prefix (Cars 1, 2, 3) sorted by fq is 2, 1, 3; output 2, 1

• t1 = Car 3, watermark = 13.1: output 3

• t1 = Car 4, watermark = 8.3: the newly retrieved tuples (Cars 4, 5, 6) sorted by fq are 5, 4, 6; output 5, 4

Result so far (Car ID): 2, 1, 3, 5, 4
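The four steps of PipelineResults can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: attributes are pre-scaled as Mileage/100, Year − 1980, Price/100 (inferred from the fv column), and the watermark is a simple conservative bound based on the minimum and maximum attribute values, so the watermark values it computes need not equal those shown on the slides. The names `score`, `conservative_watermark`, and `pipeline_results` are hypothetical.

```python
def score(weights, attrs):
    """Linear preference function f(t) = sum_i w_i * A_i(t)."""
    return sum(w * a for w, a in zip(weights, attrs))


def conservative_watermark(q, v, t1_attrs, lo, hi):
    """Assumed bound: f_q(t1) minus the largest possible f_q(t) - f_v(t)."""
    slack = sum(max((qi - vi) * l, (qi - vi) * h)
                for qi, vi, l, h in zip(q, v, lo, hi))
    return score(q, t1_attrs) - slack


def pipeline_results(view, q, v, lo, hi, k):
    """view: list of (tuple_id, attrs) sorted by f_v descending.

    Returns the top-k tuples by f_q and the number of view tuples retrieved.
    """
    results, buffer, pos = [], [], 0
    while len(results) < k and (buffer or pos < len(view)):
        if not buffer:                      # fetch the next unprocessed tuple
            buffer.append(view[pos])
            pos += 1
        t1 = buffer[0]                      # first unprocessed, in view order
        fq_t1 = score(q, t1[1])
        watermark = conservative_watermark(q, v, t1[1], lo, hi)   # step 1
        # Step 2: extend the retrieved prefix down to the watermark.
        while pos < len(view) and score(v, view[pos][1]) >= watermark:
            buffer.append(view[pos])
            pos += 1
        # Step 3: tuples scoring at least f_q(t1) can be output safely.
        ready = [t for t in buffer if score(q, t[1]) >= fq_t1]
        ready.sort(key=lambda t: score(q, t[1]), reverse=True)
        results.extend(ready)
        # Step 4: the rest stay buffered for the next round.
        buffer = [t for t in buffer if score(q, t[1]) < fq_t1]
    return results[:k], pos


Q = (0.01, 0.6, 0.03)   # preference query weights (Mileage, Year, Price)
V = (0.02, 0.4, 0.04)   # ranked view weights
CARS = {1: (100, 17, 200), 2: (200, 20, 110), 3: (170, 18, 120),
        4: (150, 10, 80), 5: (50, 10, 120), 6: (150, 10, 50),
        7: (120, 5, 50)}
VIEW = sorted(CARS.items(), key=lambda t: score(V, t[1]), reverse=True)
LO = tuple(min(col) for col in zip(*CARS.values()))
HI = tuple(max(col) for col in zip(*CARS.values()))

top2, retrieved = pipeline_results(VIEW, Q, V, LO, HI, k=2)
top2_ids = [car_id for car_id, _ in top2]
all_ids = [car_id for car_id, _ in pipeline_results(VIEW, Q, V, LO, HI, k=7)[0]]
print(top2_ids, retrieved)  # [2, 1] 3
```

With k = 2 the sketch reads only three of the seven view tuples, which is the saving the algorithm is after; a tighter watermark would shorten the prefix further on larger relations.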


Define coverage

[Figure: weight vectors in the (Price, Year) plane. Ranked view V1: 0.8*Price + 0.2*Year. Preference query q1: 0.7*Price + 0.35*Year]

V1 covers q1: At most k tuples are retrieved from V1 in order to output the first result of q1.
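A coverage test along these lines could be sketched as follows, reusing the car example rather than the Price/Year vectors of the figure. The helper names `tuples_for_first_result` and `covers` are hypothetical, the attribute scaling is the one inferred from the fv column, and the watermark is a conservative bound based on attribute minima and maxima, so the count is an upper bound on what a tighter watermark would retrieve.

```python
def dot(weights, attrs):
    """Linear preference function f(t) = sum_i w_i * A_i(t)."""
    return sum(w * a for w, a in zip(weights, attrs))


def tuples_for_first_result(view_attrs, q, v, lo, hi):
    """Count view tuples read before the first result of q can be output.

    view_attrs: attribute vectors in ranked-view order (f_v descending).
    """
    slack = sum(max((qi - vi) * l, (qi - vi) * h)
                for qi, vi, l, h in zip(q, v, lo, hi))
    watermark = dot(q, view_attrs[0]) - slack   # assumed conservative bound
    # The view is sorted by f_v, so these tuples form a prefix of the view.
    return max(1, sum(1 for a in view_attrs if dot(v, a) >= watermark))


def covers(view_attrs, q, v, lo, hi, k):
    """A view covers a query if at most k tuples are read for the first result."""
    return tuples_for_first_result(view_attrs, q, v, lo, hi) <= k


Q = (0.01, 0.6, 0.03)
V = (0.02, 0.4, 0.04)
# Cars 1..7 in ranked-view order, scaled as Mileage/100, Year-1980, Price/100.
VIEW_ATTRS = [(100, 17, 200), (200, 20, 110), (170, 18, 120), (150, 10, 80),
              (50, 10, 120), (150, 10, 50), (120, 5, 50)]
LO, HI = (50, 5, 50), (200, 20, 200)   # per-attribute minima and maxima

needed = tuples_for_first_result(VIEW_ATTRS, Q, V, LO, HI)
print(needed, covers(VIEW_ATTRS, Q, V, LO, HI, k=3))  # 3 True
```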

Which ranked view should we use to answer a specific preference query?

[Figure: weight vectors in the (Price, Year) plane. Ranked views: V1 = 0.8*Price + 0.2*Year and 0.75*Price + 0.8*Year. Preference query q1: 0.7*Price + 0.35*Year]

V1 covers q1


Which ranked views should we materialize?

ViewSelection Algorithm

while (not all preference vectors in [0,1]^n covered)
    Randomly pick v ∈ [0,1]^n and add it to the list of views L
VIEWS ← ∅
for i = 1 to C do
    select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
    VIEWS ← VIEWS ∪ {v}

C = 3

Constraint on # of views

Maximum coverage problem using the minimum # of materialized views is NP-Hard.

Greedy Heuristic is a (1 − 1/e)-approximation for maximum coverage.
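The greedy loop of the ViewSelection algorithm can be sketched as below. It assumes the coverage of each candidate view has already been computed and is passed in as a set of covered preference-vector ids; the function name `greedy_view_selection` and the toy `coverage` data are hypothetical and only illustrate the selection step.

```python
def greedy_view_selection(coverage, max_views):
    """Pick up to max_views candidates, each time the one adding most coverage.

    coverage: dict mapping a candidate view id to the set of preference
    vectors (query ids) that the view covers.
    """
    chosen, covered = [], set()
    for _ in range(max_views):
        remaining = [view for view in coverage if view not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda view: len(coverage[view] - covered))
        if not coverage[best] - covered:   # nothing new can be covered
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Toy input: four candidate views covering six discretized query vectors.
coverage = {
    "v1": {1, 2, 3, 4},
    "v2": {3, 4, 5},
    "v3": {5, 6},
    "v4": {1, 6},
}
views, covered = greedy_view_selection(coverage, max_views=2)
print(views, sorted(covered))  # ['v1', 'v3'] [1, 2, 3, 4, 5, 6]
```

The first pick is the view with the largest coverage; each later pick is judged only by the vectors it adds, which is what yields the (1 − 1/e) guarantee for maximum coverage.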

Related Work

• Preference Query Framework [AW00]

• Top-k queries

  – Joins

    • Fagin [F99,F96,F01], equijoins of ordered data

  – Selections [reduce top-k selection to range query]

    • Histograms to estimate cutoff [Chaudhuri&Gravano 99]

    • Probabilistic model [Donjerkovic&Ramakrishnan 99]

    • Partitioning [Carey & Kossmann 97,98]

Related Work

The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space.

Drawbacks of Onion:

• Does not scale

• Computing the convex hull is very computationally intensive

• Not efficient if the domain of an attribute has a small cardinality

• Not efficient for more than the top-1 result

Experiments

Measured parameters

• # attributes

• size of relation

• # views

• constraint on max # tuples retrieved

Parameters of Experiments

• synthetic datasets

• 3 to 5 attributes

• 10,000 to 500,000 tuples

• random & correlated data

• discretization of 0.1 or 0.05

Experiments (cont’d)

Execution Times (correlated dataset)

[Figure: Time (secs) vs. # results requested (0 to 600). Series: Using our algorithm, Using Oracle]

Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

Experiments (cont’d)

Varying the dataset size

[Figure: % queries covered vs. # ranked views to use (1 to 26). Series: 10,000 tuples, 50,000 tuples, 500,000 tuples]

4 attr, constraint = 500 tuples, discretization = 0.1

Experiments (cont’d)

Varying the number of attributes

[Figure: % queries covered vs. # ranked views to use (1 to 51). Series: 3 attr, 4 attr, 5 attr]

500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 50,000 tuples

[Figure: % queries covered vs. constraint (# tuples, 0 to 8000). Series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 500,000 tuples

[Figure: % queries covered vs. constraint (# tuples, 0 to 40000). Series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Comparison with the Onion Technique

[Figure: # tuples retrieved from database (0 to 60000) vs. # of results (1, 10, 100). Series: onion, 1 index, 2 indices, 4 indices, 6 indices]

50,000 tuples, 3 attr, discretization = 0.05

More Resources

www.db.ucsd.edu/PREFER

• PREFER demo

• PREFER Application

  – Construct Materialized Views

  – Issue preference queries

MS Windows, on top of Oracle DBMS

Conclusions

• Methodology to efficiently answer top-K linearly weighted queries

• Algorithm that uses a ranked view to answer a preference query

• Ranked materialized views were used

• Experimental evaluation
