Post on 19-Jan-2016


Information Technology

Influence Computation in Spatial Databases

Muhammad Aamir Cheema
Faculty of Information Technology
Monash University, Australia

aamir.cheema@monash.edu
www.aamircheema.com

Faculty of Information Technology

Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Other work

Introduction: Influence Set

Influence
• In a data set consisting of facilities and users, a facility f influences a user u if u considers f to be one of its most "important" facilities

Influence Set
• The set of users influenced by f is called the influence set of f

[Figure: users U1, U2 and facilities f1, f2; caption "Influence Set of Coles"]

Introduction: Influence Set

Important facility?
• A facility f is important for a user u if it is one of the top-k facilities for u considering her preferences, e.g., distance, rating, and price

Who are my potential customers?

Introduction: Influence Set

Significance
• Important for identifying potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems

Types
• Reverse k Nearest Neighbors
• Reverse Top-k
• Reverse Skyline


Outline: Reverse k Nearest Neighbors

• Introduction
• Pre-computation based approach
• On-the-fly algorithms
  – Six-regions [2000]
  – TPL [2004]
  – FINCH [2008]
  – Influence Zone [2011]
  – SLICE [2014]
  – TPL++ [2015]
• Comparison of RkNN algorithms

Reverse k Nearest Neighbors (RkNN)

• Definition of importance
  – A facility f is important to a user u if f is one of the k closest facilities of u
• Reverse k Nearest Neighbors
  – Find every user u for which the query facility q is one of its k closest facilities.

[Figure: users u1, u2, u3 and facilities f1, f2 with k = 1. The influence set of f1 is {u1, u2}; the influence set of f2 is {u3}.]
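The RkNN definition can be checked directly with a brute-force scan. The Python sketch below is only an illustration: the function name `rknn`, the tuple representation of points, and the sample coordinates are assumptions made for this example, and no index is used.

```python
from math import dist

def rknn(q, facilities, users, k):
    """Return the users for which q is among their k closest facilities.

    `facilities` holds the competing facilities (q itself excluded).
    """
    result = []
    for u in users:
        d_q = dist(u, q)
        # Count the facilities that are strictly closer to u than q is.
        closer = sum(1 for f in facilities if dist(u, f) < d_q)
        if closer < k:
            result.append(u)
    return result

# Hypothetical layout: two facilities and three users.
f1, f2 = (0, 0), (10, 0)
users = [(1, 1), (2, -1), (9, 1)]
influence_f1 = rknn(f1, [f2], users, k=1)
influence_f2 = rknn(f2, [f1], users, k=1)
```

Each call costs one distance computation per (user, facility) pair, which is the cost the pruning techniques in the rest of the talk try to avoid.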


Pre-computation based approach
[F. Korn et al., SIGMOD 2000]

• Pre-computation
  – For each user u, draw a circle centered at u that contains its k closest facilities
  – Index these circles using an R-tree
• Query processing
  – Find the circles that contain q
• Problems
  – Arbitrary k?
  – Data updates?

[Figure: users u1 to u4, facilities f1 to f3 and two positions of the query q, with k = 1]
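A minimal sketch of the pre-computation idea, assuming a plain list scan in place of the R-tree over circles. The names `precompute_radii` and `rknn_precomputed` are hypothetical, and the circle radius is taken to be the distance from the user to its k-th closest facility.

```python
from math import dist

def precompute_radii(users, facilities, k):
    """For each user, the distance to its k-th closest existing facility."""
    return [sorted(dist(u, f) for f in facilities)[k - 1] for u in users]

def rknn_precomputed(q, users, radii):
    """Users whose pre-computed circle contains q (linear scan, no R-tree)."""
    return [u for u, r in zip(users, radii) if dist(u, q) <= r]

facilities = [(0, 0), (10, 0)]
users = [(1, 0), (9, 0)]
radii = precompute_radii(users, facilities, k=1)
answer = rknn_precomputed((1.5, 0), users, radii)
```

The radii are tied to one value of k and to the current facility set, which is why the slide lists arbitrary k and data updates as problems.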


On-the-fly RkNN Algorithms

Data indexed by R-trees

Pruning
• Prune the search space using nearby facilities of q

Verification
• Find the users that lie in the unpruned space
• For each such user, check whether it is an RkNN of q or not

On-the-fly RkNN Algorithms

Pruning
• Half-space: TPL (VLDB 2004), TPL++ (PVLDB 2015), FINCH (PVLDB 2008), InfZone (ICDE 2011)
• Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)

Verification


Six-regions: Pruning
[I. Stanoi et al., SIGMOD Workshop 2000]

1. Divide the whole space centred at the query q into six equal regions of 60° each
2. Let f be a facility in a partition P
3. Let u be a user in P for which dist(u,q) > dist(q,f)
4. Then q cannot be the closest facility of u

Proof sketch:
• ∠fqu ≤ 60° and ∠ufq > 60°
• ∠ufq > ∠fqu ⇒ dist(u,q) > dist(u,f)

[Figure: triangle formed by q, f and u]

Six-regions: Pruning
[I. Stanoi et al., SIGMOD Workshop 2000]

1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest neighbor of q in each partition
3. The k-th nearest facility of q in each region defines the area that can be pruned

[Figure: query q, facilities a, b, c, d and users u1, u2, with k = 2]
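The three steps can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the sectors start at angle 0, the names `region` and `six_region_candidates` are hypothetical, and lists are scanned instead of traversing R-trees.

```python
from math import atan2, dist, pi

def region(q, p):
    """Index (0..5) of the 60-degree sector around q that contains p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return min(int(angle // (pi / 3)), 5)

def six_region_candidates(q, facilities, users, k):
    """Users that survive six-regions pruning (they still need verification)."""
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region(q, f)].append(dist(q, f))
    # Distance from q to its k-th nearest facility in each sector;
    # a sector with fewer than k facilities prunes nothing.
    kth = [sorted(d)[k - 1] if len(d) >= k else float("inf") for d in per_region]
    return [u for u in users if dist(q, u) <= kth[region(q, u)]]

q = (0, 0)
facilities = [(1, 0.1), (2, 0.2), (0, -5)]
users = [(3, 1), (0.5, 0.2), (-3, 1)]
candidates = six_region_candidates(q, facilities, users, k=1)
```

Here the user (3, 1) is pruned because it lies in the same sector as the facility (1, 0.1) but farther from q; the other two users remain candidates.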

Six-regions: Verification
[I. Stanoi et al., SIGMOD Workshop 2000]

• Access the users R-tree and prune the entries that lie in the pruned area
• For each unpruned user u
  – Issue a boolean range query to check whether u is an RkNN or not

Disadvantage: requires a boolean range query for each candidate user

[Figure: query q, facilities a, b, c, d and user u1, with k = 2]


TPL: Pruning
[Y. Tao et al., VLDB 2004]

• Half-space pruning
  – q cannot be the closest facility of u if u lies in the half-space of a facility f (the side of the bisector between q and f that contains f)
  – q cannot be among the k closest facilities of u if u lies in k half-spaces
• Pruning algorithm
  1. Find the nearest unseen facility f in the unpruned area
  2. Draw the bisector between q and f to prune the search space
  3. Go to step 1 unless all facilities in the unpruned area have been accessed

[Figure: query q, facilities a, b, c, d and user u, with k = 2]
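The half-space test for a single point can be sketched as below. The function names are hypothetical, and the sketch only covers points; pruning an R-tree rectangle needs the subset reasoning discussed on the following slides.

```python
from math import dist

def in_half_space(u, q, f):
    """True if u lies on f's side of the perpendicular bisector of q and f."""
    return dist(u, f) < dist(u, q)

def is_pruned(u, q, facilities, k):
    """u can be pruned once it lies in the half-spaces of k facilities."""
    count = 0
    for f in facilities:
        if in_half_space(u, q, f):
            count += 1
            if count >= k:
                return True
    return False

q = (0, 0)
facilities = [(2, 0), (0, 2)]
```

With these sample points, the user (3, 3) lies in both half-spaces and is pruned for k = 2, while (3, 0) lies in only one of them.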

TPL: Pruning
[Y. Tao et al., VLDB 2004]

Advantage: prunes more space than six-regions
Disadvantage: pruning is more expensive, especially when k is not small
• Find the k half-spaces that contain the user
• Requires using subsets of k facilities; with m facilities there are $\frac{m!}{k!\,(m-k)!}$ such subsets

[Figure: query q, facilities a, b, c, d and user u, with k = 2; example subsets {a,b}, {b,c}, {c,d}, {a,c}]

TPL: Pruning
[Y. Tao et al., VLDB 2004]

Solution: TPL does not use all possible subsets
1. Sort the facilities by their Hilbert values
2. Consider only the subsets consisting of k consecutive facilities

• Considers m subsets
• Drawback: some pruning power is lost

[Figure: query q, facilities a, b, c, d and user u, with k = 2; the order {a,b,c,d} gives the subsets {a,b}, {b,c}, {c,d}, {d,a}]

TPL: Verification
[Y. Tao et al., VLDB 2004]

• Prune the user R-tree entries using the k half-spaces approach
• Determine the candidate users
• Issue a bulk boolean range query to verify all candidate users

[Figure: query q, facilities a, b, c, d and user u, with k = 2; subsets {a,b}, {b,c}, {c,d}, {d,a}]


FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Key idea: approximate the unpruned area by a convex polygon

Advantage: pruning is more efficient (e.g., point containment in logarithmic time)

[Figure: query q, facilities a, b, c and user u, with k = 2]

FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Computing the polygon
• Get the intersection points of the half-spaces and the boundary of the space
• For each intersection point
  – Compute a counter that denotes the number of half-spaces that contain it
  – Remove the intersections with counter ≥ k
• Compute the convex hull of the remaining intersection points

[Figure: query q, facilities a, b, c and user u, with k = 2; each intersection point is labelled with its counter]

FINCH: Pruning
[W. Wu et al., PVLDB 2008]

Pruning algorithm
1. Initialize the whole space as the convex polygon
2. Find the nearest facility that lies inside the convex polygon
3. Draw its half-space, compute the new intersections and their counters, and update the convex polygon
4. Go to step 2 while there is an un-accessed facility inside the polygon

[Figure: query q, facilities a, b, c and user u, with k = 2]

FINCH: Verification
[W. Wu et al., PVLDB 2008]

• Prune the user R-tree entries that lie outside the convex polygon
• For each user that lies inside the polygon
  – Issue a boolean range query to check whether it is an RkNN or not

[Figure: query q, facilities a, b, c and user u, with k = 2]


Influence Zone (InfZone): Motivation
[M. Cheema et al., ICDE 2011]

Pruning
• Prune the search space using nearby facilities of q

Verification
• Find the users that lie in the unpruned space
• For each such user, issue a boolean range query to verify it

The influence zone is an area such that a user u is an RkNN of q if and only if u is inside this area
• Compute the influence zone using nearby facilities
• Find the users that lie in the influence zone

Influence Zone (InfZone): Challenges
[M. Cheema et al., ICDE 2011]

The influence zone corresponds to the unpruned polygon when the bisectors of all the facilities have been considered for pruning.

Challenges:
• How to compute the unpruned polygon?
• Using all facilities for pruning will be very expensive

[Figure: query q and facilities a, b, c, d, with k = 2]

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Challenge 1: Constructing the polygon
• Like FINCH, compute the counters of all intersections
• Remove the intersections with counter ≥ k
• Keep only the intersections that either lie on the boundary of the data space OR have counter equal to k-1 or k-2
• Keep only the extreme intersections on each boundary
• Sort the intersections according to their angles with q
• Connect the intersections in the sorted order

[Figure: query q and facilities a, b, c, with k = 2; each intersection point is labelled with its counter]

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Challenge 2: Avoid accessing all facilities
• Let Cv denote the circle centered at a vertex v with radius dist(v,q)
• A facility f can be ignored if it lies outside Cv for every vertex v of the current influence zone
• An entry e of the facility R-tree can be ignored if it lies outside Cv for every vertex v of the current influence zone

[Figure: query q and facilities a, b, c, with k = 2; the vertices of the current influence zone are labelled with their counters]
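The two "can be ignored" tests can be sketched as below. This is an illustrative reading of the rule, with assumptions stated up front: an R-tree entry is modelled as an axis-aligned rectangle `((x1, y1), (x2, y2))`, "outside" is treated as strictly farther than the radius, and all function names are hypothetical.

```python
from math import dist, hypot

def can_ignore_facility(f, q, zone_vertices):
    """f is irrelevant if it lies outside C_v for every vertex v."""
    return all(dist(f, v) > dist(v, q) for v in zone_vertices)

def mindist_rect(rect, p):
    """Smallest distance from point p to an axis-aligned rectangle."""
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - p[0], 0, p[0] - x2)
    dy = max(y1 - p[1], 0, p[1] - y2)
    return hypot(dx, dy)

def can_ignore_entry(rect, q, zone_vertices):
    """An entry is irrelevant if the whole rectangle is outside every C_v."""
    return all(mindist_rect(rect, v) > dist(v, q) for v in zone_vertices)

q = (0, 0)
zone = [(-1, -1), (1, -1), (1, 1), (-1, 1)]  # hypothetical current influence zone
```

For this sample zone every circle has radius √2, so the facility (5, 5) can be ignored while (2, 1) cannot.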

Influence Zone (InfZone): Construction
[M. Cheema et al., ICDE 2011]

Influence zone construction algorithm
• Initialize InfZone as the whole data space
• Insert the root of the R-tree into a heap
• While the heap is not empty
  – De-heap an entry e
  – If e lies outside every Cv
    • Ignore e
  – Else
    • If e is an intermediate node
      – Insert the children of e into the heap
    • Else
      – Draw the bisector of e and update the current influence zone

[Figure: query q and facilities a, b, c, with k = 2]

Influence Zone (InfZone): Verification
[M. Cheema et al., ICDE 2011]

• Prune the user R-tree entries that lie outside the influence zone
• Return the users that lie inside the influence zone

Point containment can be done in logarithmic time, O(log m)

Rectangle containment takes linear time, O(m)

[Figure: query q and facilities a, b, c, with k = 2]


SLICE: Motivation
[S. Yang et al., ICDE 2014]

                     Regions-based    Half-space    SLICE
                     (Six-regions)    (InfZone)
Pruning cost         O(m log k)       O(km^2)       O(m log m)
Pruning power        Low              High          High
Verification cost    Range query      O(log m)      O(k)

m is the number of facilities considered for pruning

SLICE: Key Idea
[S. Yang et al., ICDE 2014]

1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. The k-th arc in each partition defines the pruning region

Pruning requires checking only one distance

[Figure: query q, facilities f1 and f2 and their arcs, with k = 2]

SLICE: Comparison with six-regions
[S. Yang et al., ICDE 2014]

                      Six-regions    SLICE
Partitions pruned     One            Any partition with θmax < 90°
No. of partitions     6              Any
Area pruned           dist(f,q)      $\frac{dist(f,q)}{2\cos(\theta_{max})}$

[Figure: query q, facility f and the angle θmax]
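The arc radius in the table can be computed as in the sketch below. The assumptions are mine: a partition is given by its two boundary angles at q (in radians) and is narrower than 180°, θmax is taken as the larger of the angles between f and the two boundaries, and the names `upper_arc_radius` and `bounding_arc` are hypothetical.

```python
from math import atan2, cos, dist, pi

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * pi)
    return min(d, 2 * pi - d)

def upper_arc_radius(q, f, part_start, part_end):
    """Radius beyond which f prunes every point of the partition.

    Returns infinity when theta_max >= 90 degrees (f prunes no such arc).
    """
    theta_f = atan2(f[1] - q[1], f[0] - q[0])
    theta_max = max(angle_diff(theta_f, part_start), angle_diff(theta_f, part_end))
    if theta_max >= pi / 2:
        return float("inf")
    return dist(f, q) / (2 * cos(theta_max))

def bounding_arc(q, facilities, part_start, part_end, k):
    """k-th smallest arc radius: users of the partition beyond it are pruned."""
    radii = sorted(upper_arc_radius(q, f, part_start, part_end) for f in facilities)
    return radii[k - 1] if len(radii) >= k else float("inf")

q = (0, 0)
r = upper_arc_radius(q, (2, 0), 0.0, pi / 3)   # theta_max = 60 degrees
```

Once the bounding arc of a partition is known, pruning a user in that partition needs only one comparison of dist(u,q) against the stored radius.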

SLICE: Verification
[S. Yang et al., ICDE 2014]

• Significant facility
  – The k-th arc in each partition P is called the bounding arc of P
  – A facility f is significant for P if it prunes at least one point p ∈ P lying inside the bounding arc of P
  – An insignificant facility cannot prune any candidate user

Verification for a candidate
• Regions-based: issue a range query for each candidate (high I/O and CPU cost)
• SLICE: access the significant facilities during pruning and use them to verify each candidate in O(k)

[Figure: partition P of the space around q with its bounding arc r_B and points M and N]


TPL++: Optimization 1
[S. Yang et al., PVLDB 2015]

TPL:
1. Sort the facilities by their Hilbert values
2. Consider only the subsets consisting of k consecutive facilities
• Considers m subsets: O(km)
• Some pruning power is lost

TPL++:
1. Initialize a counter to 0
2. Access the facilities one by one
3. Increment the counter whenever a facility prunes the user u
4. Prune u when counter ≥ k
• Cost: O(m)

[Figure: query q, facilities a, b, c, d and user u, with k = 2; subsets {a,b}, {b,c}, {c,d}, {d,a}]

Pruning power: TPL vs TPL++
[S. Yang et al., PVLDB 2015]

TPL++: Optimization 2
[S. Yang et al., PVLDB 2015]

TPL:
• A facility entry e or a facility point that lies in the pruned space is ignored

TPL++:
• A facility entry e that lies in the pruned space is ignored
• A facility point is used for pruning even if it lies in the pruned space

[Figure: query q, facilities a, b, c, d and user u]

TPL vs TPL++

[Figure: two charts comparing TPL and TPL++ for k = 2, 5, 10, 15, 20, 25: I/O cost (axis 0 to 40) and CPU cost in ms (axis 0 to 240). Annotations: "2 times better" and "20 times better".]


Comparison of RkNN Algorithms

                 Six-regions   TPL                TPL++              FINCH         InfZone    SLICE
Pruning
  node           O(1)          O(km)              O(m)               O(m)          O(m)       O(1)
  point          O(1)          O(km)              O(m)               O(log m)      O(m)       O(1)
  adding f       O(log k)      O(log m)           O(log m)           O(m^2)        O(m^2)     O(log m)
Verification
  node           O(1)          O(km)              O(m)               O(m)          O(m)       O(1)
  point          O(1)          O(km)              O(m)               O(log m)      O(log m)   O(1)
  #candidates    Large         Large              Small              Medium        Minimal    Small
  verifying u    Range query   Bulk range query   Bulk range query   Range query   O(log m)   O(k)

Experimental Comparison
[Yang et al., PVLDB 2015]

• Setup
  – Intel Xeon 2.66 GHz CPU, 4GB memory and hard disk
  – Index: R*-tree
  – 100 buffers
  – I/O cost and CPU cost
  – Average cost per query
• Data sets
  – Three real data sets (up to 25M points): CA, LA and NA
  – Synthetic data sets following different distributions (up to 20M points)

Source code and data sets are available online

Experimental Comparison
[Yang et al., PVLDB 2015]

Ranking (best first; ranks are separated by ";" and algorithms sharing a rank by ",")

Criteria              Ranking
I/O (no buffer)       TPL++, InfZone; SLICE; TPL; FINCH; SIX
I/O (small buffer)    TPL++, InfZone; FINCH; SLICE; TPL, SIX
CPU (k<10)            SLICE; InfZone; TPL++; FINCH; SIX, TPL
CPU (10<k<25)         SLICE; InfZone, TPL++; FINCH; SIX; TPL
CPU (25<k<200)        SLICE; TPL++; SIX; FINCH; InfZone; TPL
Implementation        SIX, SLICE; TPL, TPL++; FINCH, InfZone

Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
  – Introduction
  – Monochromatic algorithms (2d)
  – Bichromatic algorithms (≥2d)
• Reverse Skyline Queries
• Other work

Reverse Top-k (RTk) Queries
Introduced by [Vlachou et al., ICDE 2010]

• Definition of importance (top-k queries)
  – Each user u has a preference function
  – The score of a facility f is score(f) = w[1]*f[1] + … + w[d]*f[d]
  – A facility f is important to a user u if f is one of the top-k facilities for u
• Bichromatic Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities.

Example (from [Vlachou et al., ICDE 2010]):
• Score(p2) = 0.2×3 + 0.8×2 = 2.2
• Tom and Max are the reverse top-1 users of p2
• Bob is not a reverse top-1 user of p2
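A brute-force version of the bichromatic query can be sketched as below. Only the score of p2 = (3, 2) under the weights (0.2, 0.8) comes from the slide; the other facilities, the user names "A" and "B", and the convention that a lower score is better are assumptions of this sketch.

```python
def score(w, f):
    """Weighted sum w[1]*f[1] + ... + w[d]*f[d]."""
    return sum(wi * fi for wi, fi in zip(w, f))

def reverse_top_k(q, facilities, users, k):
    """Names of the users for which q is among their k best facilities.

    `users` maps a name to a weighting vector; `facilities` excludes q.
    Assumes that a lower score is better.
    """
    result = []
    for name, w in users.items():
        s_q = score(w, q)
        better = sum(1 for f in facilities if score(w, f) < s_q)
        if better < k:
            result.append(name)
    return result

p2 = (3, 2)
others = [(1, 5), (8, 1)]                      # hypothetical competitors
users = {"A": (0.2, 0.8), "B": (0.9, 0.1)}     # hypothetical users
```

With this data p2 is the top-1 facility of user A only, and a top-2 facility of both users.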

Reverse Top-k (RTk) Queries: Types
Introduced by [Vlachou et al., ICDE 2010]

Examples are from [Vlachou et al., ICDE 2010], with q = p2 and k = 1

• Bichromatic RTk queries
  – Find every user u for which the query facility q is one of her top-k facilities (e.g., the result is {Tom, Max})
• Monochromatic RTk queries
  – Find every weighting vector for which q is one of the top-k facilities
  – Result: the line segment where w[price] ∈ [1/7, 5/6]


Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

• Score(q) is the projection of q on the vector w
• Rank(q) w.r.t. w is the number of facilities below the red line
• Rank(q) < Rank(f) for every w if q dominates f
• Ignore the facilities that are dominated by q
• The result is empty if k facilities dominate q

[Figure: query q, several facilities f and the weighting vector w = [0.5, 0.5]]

Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

• The relative rank of q and f depends on the rotation of the red line

[Figure: q, f and the weighting vectors w, w` and w``]

Monochromatic Reverse Top-k Algorithms
[Vlachou et al., ICDE 2010]

Algorithm
• Start with a vertical line
• Rank(q): count the number of facilities on the left
• Rotate the line counter-clockwise
• Update Rank(q) when the line intersects a facility
• Report the weighting vectors for which Rank(q) ≤ k

[Figure: query q and facilities a and b; Rank(q) is shown as 2 and then 1]

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Given a point a = (u,v) and a weighting vector W = (w1, w2), a.score = u*w1 + v*w2
• A point a = (u,v) is mapped to a line a*: y = ux + v in the dual space
• The weighting vector W = (w1, w2) is mapped to a vertical line W*: x = w1/w2
• The intersection of a* and W* is the point where y = u(w1/w2) + v = (u*w1 + v*w2)/w2 = a.score/w2

[Figure: primal space with points a and b; dual space with lines a* and b* crossing W*: x = w1/w2 at ya = a.score/w2 and yb = b.score/w2]
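The equivalence between scores in the primal space and heights in the dual space can be checked numerically. The sketch below assumes w2 > 0 and that a lower score is better; the sample points and the function names are hypothetical.

```python
def score(w, p):
    """Primal score u*w1 + v*w2 of the point p = (u, v)."""
    return w[0] * p[0] + w[1] * p[1]

def dual_height(w, p):
    """Height at which the dual line y = u*x + v of p crosses W*: x = w1/w2."""
    u, v = p
    return u * (w[0] / w[1]) + v

def top_k_primal(points, w, k):
    return sorted(points, key=lambda p: score(w, p))[:k]

def top_k_dual(points, w, k):
    return sorted(points, key=lambda p: dual_height(w, p))[:k]

points = [(1, 4), (3, 1), (2, 2)]
w = (0.2, 0.8)
```

Because every height equals the corresponding score divided by the same positive w2, both functions return the points in the same order.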

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Query: given a weighting vector W = (w1, w2), return the k objects with the smallest scores
• Solution:
  – Map W and all the objects to the dual space
  – Return the k lowest lines intersecting W*

[Figure: primal and dual space for the objects a, b, c, d with two vertical lines, W*: x = w1/w2 and W*: x = w3/w4; the two rankings shown are (1. a, 2. b, 3. c, 4. d) and (1. d, 2. b, 3. a, 4. c)]

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Given a set of lines L, the mass of a point p is the number of lines that lie strictly below p
• The k-lower envelope consists of every point p that lies on one of the lines in L and has mass equal to k-1.

[Figure: points p and p' and the 2-lower envelope]

RTk using k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Map all the facilities to the dual space and compute the k-lower envelope
• Map the query point to the dual space
• Return the weighting vectors for which the query line is below the k-lower envelope

[Figure: primal and dual space for the objects a, b, c, d, the query q and W*: x = w1/w2]

Computing k-lower envelope (2d)
[Cheema et al., EDBT 2014]

• Start from the leftmost point on the k-lower envelope (always move towards the right)
  – This point lies on the line with the k-th largest slope, i.e., the point in the primal space with the k-th largest x-value (a point (u,v) in the primal space is mapped to the line y = ux + v)
• Upon reaching an intersection, make a turn (i.e., leave the current road)
• The path travelled is the k-lower envelope

[Figure: primal and dual space for the objects a, b, c, d]
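The walk can be sketched in Python as follows. This is a simplified illustration rather than the paper's algorithm: it assumes lines in general position (no two parallel lines, no three lines through one point), scans all lines at every step, and uses the hypothetical name `k_lower_envelope`.

```python
def k_lower_envelope(lines, k):
    """Walk the k-lower envelope of lines given as (slope, intercept) pairs.

    Returns a list of (line_index, x_start) segments from left to right.
    """
    # At x = -infinity the k-th lowest line is the one with the k-th largest slope.
    order = sorted(range(len(lines)), key=lambda i: lines[i][0], reverse=True)
    cur, x = order[k - 1], float("-inf")
    path = [(cur, x)]
    while True:
        uc, vc = lines[cur]
        best = None
        for j, (uj, vj) in enumerate(lines):
            if j == cur or uj == uc:
                continue
            xi = (vj - vc) / (uc - uj)        # where line j crosses the current line
            if xi > x and (best is None or xi < best[1]):
                best = (j, xi)
        if best is None:
            return path
        cur, x = best                          # make a turn onto the crossing line
        path.append((cur, x))

# Hypothetical lines: y = x, y = -x and y = -1.
lines = [(1, 0), (-1, 0), (0, -1)]
```

For k = 1 the walk follows y = x, then y = -1 between x = -1 and x = 1, then y = -x, which is the ordinary lower envelope of the three lines.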


Bichromatic Reverse Top-k (≥2d)
[Vlachou et al., ICDE 2010]

Given a set of facilities F and a set of weighting vectors W, return every weighting vector for which q is one of the top-k facilities

Brute-force algorithm:
• For each vector w in W
  – Compute the top-k facilities
  – Return w if q is among the top-k facilities

Bichromatic Reverse Top-k (≥2d)
[Vlachou et al., ICDE 2010]

Threshold-based algorithm (RTA)
• Sort the weighting vectors by their pair-wise similarity (similar vectors have similar top-k results)
• Evaluate the first top-k query and calculate a threshold
• For each weighting vector
  – Try to prune using the threshold
  – Refine the threshold

Bichromatic Reverse Top-k (≥2d)
[Vlachou et al., ICDE 2010]

Example (from [Vlachou et al., ICDE 2010]) with W = [w1, w2, w3]
• Evaluate the top-2 query for w1 (buffer: p1, p2)
• Set the threshold based on w2
• score(q) for w2 > threshold, so discard w2
• Compute the top-k for w3 and update the buffer

[Figure: query q, facilities p1 to p10 and the weighting vectors w1, w2, w3]
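The threshold idea can be sketched as below. This is a simplified reading, not the published algorithm: the weighting vectors are processed in the given order instead of being sorted by similarity, `points` excludes q, a lower score is assumed to be better, and the data and the name `rta` are hypothetical.

```python
import heapq

def score(w, p):
    return sum(wi * pi for wi, pi in zip(w, p))

def top_k(points, w, k):
    """The k points with the smallest scores, best first."""
    return heapq.nsmallest(k, points, key=lambda p: score(w, p))

def rta(q, points, weights, k):
    """Weighting vectors for which q is among the top-k facilities."""
    result, buffer = [], []
    for w in weights:
        s_q = score(w, q)
        # Threshold: worst score of the buffered points under the current w.
        # If k buffered points all beat q, w is discarded without a top-k query.
        if len(buffer) == k and max(score(w, p) for p in buffer) < s_q:
            continue
        buffer = top_k(points, w, k)           # evaluate and refresh the buffer
        if len(buffer) < k or s_q <= score(w, buffer[-1]):
            result.append(w)
    return result

q = (3, 2)
points = [(1, 5), (8, 1), (2, 2)]
weights = [(0.2, 0.8), (0.3, 0.7), (0.9, 0.1)]
```

For k = 1 only the first vector triggers a top-k evaluation; the other two are discarded by the threshold test alone.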

Bichromatic Reverse Top-k (≥2d)
[Vlachou et al., SIGMOD 2013]

Branch-and-bound algorithm: key idea
• Weighting vectors and facilities are indexed (e.g., by an R-tree)
• Compute upper and lower bounds
• Prune using the bounds
• Process the unpruned entries

Outline

• Introduction
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
  – Introduction
  – Pre-computation based approach
  – On-the-fly algorithm
• Other work

Reverse Skyline
[Dellis et al., VLDB 2007]

Dominance
• A point x dominates y if x is at least as good as y on all the dimensions and x is better than y on at least one dimension

Skyline
• Return every point that is not dominated by any other point

[Figure: points x, y, z, a, c, d plotted by distance and price]

Reverse Skyline
[Dellis et al., VLDB 2007]

Dynamic dominance
• A user u gives her ideal point
• A point x dynamically dominates y if its difference from u is not larger than y's difference on each dimension and is smaller on at least one dimension

Dynamic skyline
• Return every point that is not dynamically dominated by any other point
• Transform each x[i] to |u[i] – x[i]|

[Figure: points x, y, z, a, b and the user u plotted by distance from airport and room size, with the transformed points y`, a`, z`, b`]
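Dynamic dominance and the dynamic skyline can be sketched with a quadratic scan. The function names and the sample points are hypothetical, and the static skyline is shown under the assumption that smaller values are better on every dimension.

```python
def dominates(x, y):
    """x is no worse than y everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(points):
    return [p for p in points if not any(dominates(o, p) for o in points)]

def dynamic_skyline(u, points):
    """Skyline after transforming each x[i] to |u[i] - x[i]|."""
    t = {p: tuple(abs(ui - pi) for ui, pi in zip(u, p)) for p in points}
    return [p for p in points if not any(dominates(t[o], t[p]) for o in points)]

u = (5, 5)
points = [(4, 5), (9, 9), (5, 7), (1, 1)]
```

With the ideal point (5, 5), the points (4, 5) and (5, 7) form the dynamic skyline, whereas the static skyline of the same data is just (1, 1).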

Reverse Skyline
[Dellis et al., VLDB 2007]

Definition of importance
• A user u considers a facility f to be important if f is in the dynamic skyline of u

Reverse skyline
• Return every user u for which the query facility is in its dynamic skyline

[Figure: point x, user u and the transformed points y`, a`, z`, b`, plotted by distance from airport and room size]


Pre-computation based approach
[Dellis et al., VLDB 2007]

Pre-computation
• For each user u
  – Compute and store its dynamic skyline

Query processing
• u is not an answer if q is dominated by its pre-computed skyline
• u is an answer if q is not dominated by its pre-computed skyline

[Figure: user u, its dynamic skyline points and two positions of the query q, plotted by distance from airport and room size]

Pre-computation based approach
[Dellis et al., VLDB 2007]

Reducing the storage requirement
• For each user u
  – Store only k of its dynamic skyline points

Query processing
• u is not an answer if q is dominated by any of the k stored points
• u is guaranteed to be an answer if q dominates any of the k stored points
• Otherwise, call verification to check whether u is an answer

[Figure: user u, the stored points z` and b` and three positions of the query q, plotted by distance from airport and room size]


On-the-fly Algorithm
[Dellis et al., VLDB 2007]

• The window of a user u is the rectangle centered at u that has q on one of its corners
• A user u is an answer iff its window is empty

Key idea
• Divide the space around q into 2^d partitions
• Compute the skyline for each partition
• Any user dominated by these skylines cannot be an answer

[Figure: query q, facilities a to g and users u and u`, plotted by distance from airport and room size]
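The window test gives a direct brute-force reverse skyline. In the sketch below a facility counts as "in the window" when it is at least as close to u as q on every dimension and strictly closer on at least one; this boundary handling, the function names and the data are assumptions of the example.

```python
def in_window(f, u, q):
    """True if facility f falls in the window of user u defined by q."""
    df = [abs(a - b) for a, b in zip(f, u)]
    dq = [abs(a - b) for a, b in zip(q, u)]
    return all(x <= y for x, y in zip(df, dq)) and any(x < y for x, y in zip(df, dq))

def reverse_skyline(q, facilities, users):
    """Users whose window contains no facility."""
    return [u for u in users if not any(in_window(f, u, q) for f in facilities)]

q = (5, 5)
facilities = [(2, 2)]
users = [(1, 1), (6, 6), (4, 4)]
answer = reverse_skyline(q, facilities, users)
```

The user (1, 1) is rejected because the facility (2, 2) lies inside its window; the windows of the other two users are empty.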


Other work on reverse spatial queries

• Uncertain data
• Continuous monitoring (e.g., moving objects, data streams)
• Influence maximization
• Other spaces (e.g., road networks, general metric spaces, non-metric spaces, obstructed spaces)
• Spatial keyword queries
• …

Open problems on reverse spatial queries

• Location-based reverse top-k queries
• Location-based reverse skyline queries

Location-based Reverse Top-k

• Definition of importance
  – Each user u has a preference function
  – A facility f is important to a user u if f is one of the top-k facilities for u
• Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities.

[Figure: users u1, u2, u3 and facilities f1 (Price=1) and f2 (Price=2) with k = 1; the preference functions shown are 0.9*price + 0.1*distance, 0.5*price + 0.5*distance and 1*distance. The influence set of f1 is {u2}; the influence set of f2 is {u1, u3}.]

Location-based Reverse Skyline

• Dominance
  – A facility x dominates another facility y w.r.t. a user u if, for every attribute, u prefers x over y
• Definition of importance
  – A facility f is important to a user u if f is not dominated by any other facility
• Reverse Skyline
  – Find every user u for which the query facility q is not dominated by any other facility.

[Figure: users u1, u2, u3 and facilities f1 (Price=1) and f2 (Price=2). The influence set of f1 is {u1, u2}; the influence set of f2 is {u1, u2, u3}.]

References

1. Flip Korn, S. Muthukrishnan: Influence Sets Based on Reverse Nearest Neighbor Queries. SIGMOD 2000:201-212

2. Ioana Stanoi, Divyakant Agrawal, Amr El Abbadi: Reverse Nearest Neighbor Queries for Dynamic Databases. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2000:44-53

3. Yufei Tao, Dimitris Papadias, Xiang Lian: Reverse kNN Search in Arbitrary Dimensionality. VLDB 2004:744-755

4. Evangelos Dellis, Bernhard Seeger: Efficient Computation of Reverse Skyline Queries. VLDB 2007:291-302

5. Wei Wu, Fei Yang, Chee Yong Chan, Kian-Lee Tan: FINCH: evaluating reverse k-Nearest-Neighbor queries on location data. PVLDB 1(1):1056-1067 (2008)

6. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Reverse top-k queries. ICDE 2010:365-376

7. Muhammad Aamir Cheema, Xuemin Lin, Wenjie Zhang, Ying Zhang: Influence zone: Efficiently processing reverse k nearest neighbors queries. ICDE 2011:577-588

8. Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: Monochromatic and Bichromatic Reverse Top-k Queries. IEEE Trans. Knowl. Data Eng. (TKDE) 23(8):1215-1229 (2011)

9. Muhammad Aamir Cheema, Wenjie Zhang, Xuemin Lin, Ying Zhang: Efficiently processing snapshot and continuous reverse k nearest neighbors queries. VLDB J. (VLDB) 21(5):703-728 (2012)

10. Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: Branch-and-bound algorithm for reverse top-k queries. SIGMOD 2013:481-492

11. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Ying Zhang: SLICE: Reviving regions-based pruning for reverse k nearest neighbors queries. ICDE 2014:760-771

12. Muhammad Aamir Cheema, Zhitao Shen, Xuemin Lin, Wenjie Zhang: A Unified Framework for Efficiently Processing Ranking Related Queries. EDBT 2014:427-438

13. Shiyu Yang, Muhammad Aamir Cheema, Xuemin Lin, Wei Wang: Reverse k Nearest Neighbors Query Processing: Experiments and Analysis. PVLDB 8(5):605-616 (2015)

Thanks
