
PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries

• Vagelis Hristidis

University of California, San Diego

• Nick Koudas

AT&T Research

• Yannis Papakonstantinou

University of California, San Diego

Example

Car ID  Mileage  Year  Price  Doors
1       10000    1997  20000  2
2       20000    2000  11000  4
3       17000    1998  12000  6
4       15000    1990   8000  2
5        5000    1990  12000  2
6       15000    1990   5000  4
7       12000    1985   5000  4

ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

Car ID  Mileage  Year  Price  Doors
2       20000    2000  11000  4
1       10000    1997  20000  2
3       17000    1998  12000  6
5        5000    1990  12000  2
4       15000    1990   8000  2
6       15000    1990   5000  4
7       12000    1985   5000  4

Problem: Retrieve WHOLE relation

PREFER retrieves only part of the relation

Applications

Such preference queries are used in Web sites like:

• www.Zagat.com (restaurants)

• www.personallogic.com (online retailer)

Definitions - Problem statement

• A preference query orders the tuples of a relation according to a function of the attribute values, e.g. 0.01·Mileage + 0.6·Year + 0.03·Price

• Goal is to produce the top-K answers of a preference query while retrieving the minimum # of tuples
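For contrast, the baseline that the problem statement argues against can be sketched in a few lines of Python: score every tuple with the linear preference function and keep the best K. The relation below is hypothetical (three made-up rows with attributes assumed to be scaled to [0, 1]), and the name `top_k_full_scan` is illustrative, not part of PREFER; only the weights come from the example query.

```python
import heapq
from typing import Dict, List

Row = Dict[str, float]


def preference_score(weights: Dict[str, float], row: Row) -> float:
    """Linear preference function: sum of weight * attribute value."""
    return sum(w * row[attr] for attr, w in weights.items())


def top_k_full_scan(relation: List[Row], weights: Dict[str, float], k: int) -> List[Row]:
    """Baseline: touches every tuple of the relation to find the top k."""
    return heapq.nlargest(k, relation, key=lambda r: preference_score(weights, r))


# Hypothetical rows; attribute values are assumed to be normalized to [0, 1].
cars = [
    {"id": "A", "mileage": 0.5, "year": 0.9, "price": 0.2},
    {"id": "B", "mileage": 0.1, "year": 1.0, "price": 0.5},
    {"id": "C", "mileage": 0.9, "year": 0.2, "price": 0.9},
]
# Weights taken from the example query on the slide.
query = {"mileage": 0.01, "year": 0.6, "price": 0.03}
top_two = [row["id"] for row in top_k_full_scan(cars, query, 2)]
```

The cost of this baseline grows with the size of the relation regardless of K, which is the behavior the goal above is meant to avoid.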

Our Approach

PREFER materializes a number of ranked views of the relation and uses them to answer preference queries efficiently.

[Figure: Year axis; ranked view 0.08*Price + 0.2*Year; ranked view 0.075*Price + 0.8*Year; preference query 0.07*Price + 0.35*Year]

PREFER Architecture

Preprocessing stage

• Inputs: Relation; Space constraints; Discretization of ranked views' vectors

• Views Creation

• Mat. Views; index of mat.

• Which ranked views should we materialize?

Runtime Process

• Input: Query

• View Selection, which returns a Ranked View id

• Pipelining Algorithm, which produces the Output results

• Which ranked view should we use to answer a specific preference query?

How to use a ranked view to answer a preference query

Ranked View, ordered by 0.02*Mileage + 0.4*Year + 0.04*Price

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2       9.8
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4

Watermark = 14.26

Result, ordered by 0.01*Mileage + 0.6*Year + 0.03*Price (columns: Car ID ... Doors, fq)

[Diagram label: last tuple]

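Watermark

With $t^{1}_{v}$ the first tuple of the ranked view, the watermark $T^{1}_{v,q}$ satisfies:

$$\forall t \in R:\quad f_v(t) < T^{1}_{v,q} \;\Rightarrow\; f_q(t) < f_q(t^{1}_{v})$$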

Calculating the Watermark

$$f_q(t) = \sum_i q_i A_i(t) = f_v(t) + \sum_i (q_i - v_i)\, A_i(t)$$

[The remaining lines of this derivation are not recoverable from the slide text.]
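One way to make the watermark concrete is to restate it as an optimization problem. The sketch below rests on assumptions that are not on the slide: each attribute value $x_i$ ranges over a known interval $[m_i, M_i]$, all weights are nonnegative, and $t$ is allowed to range over the whole attribute domain rather than only over stored tuples. Under those assumptions, the largest threshold for which "$f_v(t)$ below the threshold implies $f_q(t)$ below $f_q(t^{1}_{v})$" holds is the smallest view score among points that rank at least as high as $t^{1}_{v}$ under the query:

$$T^{1}_{v,q} \;=\; \min_{x \,\in\, \prod_i [m_i, M_i]} \; \sum_i v_i x_i \quad \text{subject to} \quad \sum_i q_i x_i \;\ge\; f_q(t^{1}_{v})$$

This is a linear program with a single constraint besides the bounds, so it can be solved greedily: start at the lower bounds $m_i$ and raise attributes in increasing order of $v_i / q_i$ until the constraint is met.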

How to use a ranked view to answer a preference query (cont'd)

PipelineResults Algorithm

Ranked view, ordered by fv:

Car ID  Mileage  Year  Price  Doors  fv
1       10000    1997  20000  2      16.8
2       20000    2000  11000  4      16.4
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2       9.8
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4

1. Calculate the watermark for t1, which is 14.26

2. Find the prefix of the view with fv greater than the watermark value and sort its tuples by fq

Car ID  Mileage  Year  Price  Doors  fv
2       20000    2000  11000  4      16.4
1       10000    1997  20000  2      16.8
3       17000    1998  12000  6      15.4
4       15000    1990   8000  2      10.2
5        5000    1990  12000  2       9.8
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4

3. Output tuples up to t1

4. Repeat, using the first unprocessed tuple as t1

Result, ordered by 0.01*Mileage + 0.6*Year + 0.03*Price

Car ID  Mileage  Year  Price  Doors  fv
2       20000    2000  11000  4      16.4
1       10000    1997  20000  2      16.8
3       17000    1998  12000  6      15.4
5        5000    1990  12000  2       9.8
4       15000    1990   8000  2      10.2
6       15000    1990   5000  4       9
7       12000    1985   5000  4       6.4
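The four PipelineResults steps can be sketched in Python as a generator. This is a minimal sketch under assumptions the slides do not state: attributes are scaled to a known box `[lo[i], hi[i]]`, all weights are nonnegative, and the watermark is computed as the smallest view score among points of that box that score at least as high as t1 under the query. The function names and the `eps` tolerance are illustrative, not PREFER's actual code.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[float, ...]  # attribute values, assumed to lie in [lo[i], hi[i]]


def score(weights: Sequence[float], row: Sequence[float]) -> float:
    """Linear preference function: sum of weight * attribute value."""
    return sum(w * a for w, a in zip(weights, row))


def watermark(v, q, lo, hi, target: float) -> float:
    """Smallest f_v over the attribute box among points with f_q >= target.

    Assumes nonnegative weights. Greedy: start at the lower bounds and raise
    attributes in order of cheapest f_v cost per unit of f_q gained.
    """
    cost = score(v, lo)
    need = target - score(q, lo)
    order = sorted((i for i in range(len(q)) if q[i] > 0),
                   key=lambda i: v[i] / q[i])
    for i in order:
        if need <= 0:
            break
        step = min(hi[i] - lo[i], need / q[i])
        cost += v[i] * step
        need -= q[i] * step
    return cost


def pipeline_results(view: List[Row], v, q, lo, hi,
                     eps: float = 1e-9) -> Iterator[Row]:
    """Yield rows in descending f_q order, reading `view` (sorted by f_v,
    descending) only as far as each watermark requires."""
    buffer: List[Row] = []  # rows read from the view but not yet output
    pos = 0                 # next unread position in the view
    while buffer or pos < len(view):
        if not buffer:      # guarantee progress: read the next row
            buffer.append(view[pos])
            pos += 1
        # t1: the best-ranked row (by f_q) among those not yet output.
        t1 = max(buffer, key=lambda r: score(q, r))
        threshold = score(q, t1)
        # Step 1: watermark for t1.
        mark = watermark(v, q, lo, hi, threshold)
        # Step 2: extend the prefix while f_v is at or above the watermark.
        while pos < len(view) and score(v, view[pos]) >= mark - eps:
            buffer.append(view[pos])
            pos += 1
        buffer.sort(key=lambda r: score(q, r), reverse=True)
        # Step 3: output every row that ranks at or above t1.
        while buffer and score(q, buffer[0]) >= threshold:
            yield buffer.pop(0)
        # Step 4: loop again with the remaining rows.
```

Because the function is a generator, a caller that needs only the top K results can stop iterating after K rows, and the rest of the view is never read. The sketch compares with "at or above" the watermark rather than strictly greater, so that t1 itself is always inside the prefix.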

Which ranked view should we use to answer a specific preference query?

Define coverage

V1 covers q1: At most k tuples are retrieved from V1 in order to output the first result of q1.

[Figure: Year axis; ranked view 0.8*Price + 0.2*Year; ranked view 0.75*Price + 0.8*Year; a preference query; annotation "V1 covers q1"]


Which ranked views should we materialize?

ViewSelection Algorithm

    while (not all preference vectors in [0,1]^n covered)
        Randomly pick v ∈ [0,1]^n and add it to the list of views L

    VIEWS ← ∅
    for i = 1 to C do
        select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
        VIEWS ← VIEWS ∪ {v}
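The second loop of the ViewSelection algorithm is a greedy maximum-coverage step, which can be sketched in Python as follows. The sketch assumes the coverage relation has already been computed and is passed in as a dictionary from a candidate view to the set of discretized preference vectors it covers; that input format, the view names, and the function name `greedy_view_selection` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_view_selection(coverage: Dict[str, Set[int]], max_views: int) -> List[str]:
    """Pick up to `max_views` candidate views, each time taking the view that
    covers the most preference vectors not covered so far."""
    uncovered: Set[int] = set().union(*coverage.values())
    chosen: List[str] = []
    for _ in range(max_views):
        best = max((v for v in coverage if v not in chosen),
                   key=lambda v: len(coverage[v] & uncovered),
                   default=None)
        # Stop early when no remaining view adds any coverage.
        if best is None or not (coverage[best] & uncovered):
            break
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical candidates: each view covers a set of query-vector ids.
candidates = {
    "v1": {1, 2, 3},
    "v2": {3, 4},
    "v3": {4, 5, 6, 7},
    "v4": {1},
}
selected = greedy_view_selection(candidates, 2)
```

Ties are broken by dictionary insertion order here; any other tie-breaking rule would fit the pseudocode equally well.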

Which ranked views should we materialize? (cont’d)

Constraint on # of views

Maximum coverage problem using the minimum # of materialized views is NP-Hard.

The greedy heuristic is a (1 − 1/e)-approximation for maximum coverage.

Related Work

• Preference Query Framework [AW00]

• Top-k queries

  – Joins
    • Fagin [F99, F96, F01], equijoins of ordered data

  – Selections [reduce top-k selection to range query]
    • Histograms to estimate cutoff [Chaudhuri & Gravano 99]
    • Probabilistic model [Donjerkovic & Ramakrishnan 99]
    • Partitioning [Carey & Kossmann 97, 98]

Related Work

The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space.

Drawbacks of Onion:

• Does not scale

• Computing the convex hull is very computationally intensive

• Not efficient if the domain of an attribute has a small cardinality

• Not efficient for more than the top-1 result

Experiments

Measured parameters

• # attributes

• size of relation

• # views

• constraint on max # tuples retrieved

Parameters of Experiments

• synthetic datasets

• 3 to 5 attributes

• 10,000 to 500,000 tuples

• random & correlated data

• discretization of 0.1 or 0.05

Experiments (cont’d)

Execution Times (correlated dataset)

[Figure: execution time vs. # results requested (x-axis 0 to 600); series: Using our algorithm, Using Oracle]

Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

Experiments (cont’d)

Varying the dataset size

[Figure: x-axis # ranked views to use (1 to 26); series: 10,000 tuples, 50,000 tuples, 500,000 tuples]

4 attr, constraint = 500 tuples, discretization = 0.1

Experiments (cont’d)

Varying the number of attributes

[Figure: x-axis # ranked views to use (1 to 51); series: 3 attr, 4 attr, 5 attr]

500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 50,000 tuples

[Figure: x-axis constraint (# tuples, 0 to 8000); series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Varying guarantee (maximum number of tuples retrieved to output first result), 500,000 tuples

[Figure: x-axis constraint (# tuples, 0 to 40000); series: 10 views, 20 views]

4 attr, discretization = 0.1

Experiments (cont’d)

Comparison with the Onion Technique

[Figure: x-axis # of results (1, 10, 100); series: onion, 1 index, 2 indices, 4 indices, 6 indices]

50,000 tuples, 3 attr, discretization = 0.05

More Resources

www.db.ucsd.edu/PREFER

• PREFER demo

• PREFER Application
  – Construct Materialized Views
  – Issue preference queries

MS Windows, on top of Oracle DBMS

Conclusions

• Methodology to efficiently answer top-K linearly weighted queries

• Algorithm that uses a ranked view to answer a preference query

• Ranked materialized views were used

• Experimental evaluation
