
Post on 20-Dec-2015


1

Continuous k-dominant Skyline Query Processing

Presented by

Prasad Sriram

Nilu Thakur

2

Outline

Introduction
Problem definition
Key Concepts
Validation
Rewrite Today

3

Example Skyline

Which one is better, e or b? (e, because its price and distance dominate those of b.)
c or f?

Finding the skyline of hotels: lower price and closer to the beach

[Figure: scatter plot of hotels a, b, c, d, e, f; axes: Distance (ticks 1 to 4) and Price (ticks 50 to 200)]

4

Problem Definition

Input
  A set of points p1, p2, …, pn

Output
  A set of points P (referred to as the skyline points), such that any point pi ∈ P is not dominated by any other point in the dataset

Objective
  Provide correct and complete results
  Minimize the query response time and memory consumption
  Continuous queries require continuous evaluation
  Scalability in terms of the number of queries

Constraints
  Minimize the number of dominance checks
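The definition above can be sketched as a pairwise dominance test plus a quadratic scan. This is a minimal illustration, not the presenters' implementation: the function names are chosen here, the hotel coordinates are hypothetical (the slide's actual values are not recoverable), and smaller values are assumed to be better on every dimension, as in the hotel example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on every dimension and strictly
    better on at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: Dict[str, Point]) -> List[str]:
    """Naive O(n^2) skyline: keep each point that no other point dominates."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )

# Hypothetical (distance, price) values, chosen so that e dominates b
# as stated on the example slide.
hotels = {
    "a": (1.0, 200.0),
    "b": (3.0, 150.0),
    "c": (2.0, 130.0),
    "d": (4.0, 75.0),
    "e": (2.5, 120.0),
    "f": (3.5, 50.0),
}

print(dominates(hotels["e"], hotels["b"]))  # True
print(skyline(hotels))                      # ['a', 'c', 'e', 'f']
```

Every call to `dominates` is one dominance check, which is the quantity the constraint asks to minimize; the scan above performs up to n(n-1) of them.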

5

Skyline Properties (1/2)

Meaningful for incomparable dimensions
  Browsing laptops: price, weight, size, memory, etc.
Insensitive to scaling and shifting of the dimensions
Skyline: curse of dimensionality
  Movie rating
    Different users may have different rating preferences
    Movie p is better than q only if p is rated higher than or equal to q by all users
    One outlier opinion will invalidate the dominance
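The outlier effect can be seen in a small sketch. The ratings are hypothetical and `rating_dominates` is an illustrative helper name; higher ratings are better here, as in the movie example.

```python
from typing import Sequence

def rating_dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    """True if p is rated >= q by every user and > q by at least one user."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

# Hypothetical ratings from five users (higher is better).
movie_p = (5, 5, 4, 5, 5)
movie_q = (3, 4, 3, 4, 2)
print(rating_dominates(movie_p, movie_q))          # True: every user prefers p

# The same movie p, except that the fifth user gives it a 1.
movie_p_outlier = (5, 5, 4, 5, 1)
print(rating_dominates(movie_p_outlier, movie_q))  # False: one rating breaks dominance
```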

6

Skyline Properties (2/2)

Too many skyline points in high-dimensional spaces
  Example: NBA data set, 17000 player season statistics on 17 attributes
  Over 1000 skyline points in the full space
  Some average-skilled players are in the skyline if they are not bad on some attributes
Possible solutions
  Dimension reduction techniques: require domain knowledge
  Subspace skylines: many subspaces need to be explored
  Relax the notion of d-dominance: k-dominance

7

k-dominant Skyline

k-dominate
  If A is not worse than B on k dimensions, and better on at least one of those k dimensions, we say A k-dominates B.
k-dominant skyline
  The k-dominant skyline contains all the points that cannot be k-dominated by any other point.
k-dominant skyline query
  Given a data set, find the k-dominant skyline.
  When k = d, we have the conventional skyline.
  k-dominance can be cyclic, unlike d-dominance.
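The k-dominance test can be sketched by counting dimensions. This is an illustration under the assumption that smaller values are better; the function name and the sample points are hypothetical. Counting is enough because any strictly better dimension is also a "not worse" dimension, so a set of k not-worse dimensions can always be chosen to include it.

```python
from typing import Sequence

def k_dominates(a: Sequence[float], b: Sequence[float], k: int) -> bool:
    """True if a k-dominates b (assumption: smaller is better).

    a must be no worse than b on at least k dimensions and strictly
    better than b on at least one dimension.
    """
    not_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = sum(x < y for x, y in zip(a, b))
    return not_worse >= k and strictly_better >= 1

# Hypothetical 3-dimensional points.
a = (1, 1, 9)
b = (2, 2, 2)
print(k_dominates(a, b, 2))  # True: a is better on the first two dimensions
print(k_dominates(a, b, 3))  # False: with k = d this is conventional dominance
print(k_dominates(b, a, 2))  # False: b is better on only one dimension
```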

Slide courtesy of [2]

8

k-dominant Skyline - Example

      d1  d2  d3  d4  d5  d6
p1     2   2   2   4   4   4
p2     4   4   4   2   2   2
p3     3   3   3   5   3   3
p4     4   4   4   3   3   3
p5     5   5   5   1   5   5

Marked on the slide: conventional skyline, 5-dominant skyline, 4-dominant skyline

Smaller k, smaller k-dominant skyline
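The three skylines named on this slide can be recomputed from the table. The sketch below assumes smaller values are better (the transcript does not say which points the slide highlighted, so the printed sets are a computation under that assumption, not a quotation); the function names are illustrative.

```python
from typing import Dict, List, Sequence

def k_dominates(a: Sequence[float], b: Sequence[float], k: int) -> bool:
    """a is no worse than b on >= k dimensions and strictly better on one
    (assumption: smaller is better)."""
    not_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = sum(x < y for x, y in zip(a, b))
    return not_worse >= k and strictly_better >= 1

def k_dominant_skyline(points: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Names of the points that no other point k-dominates."""
    return [
        name
        for name, p in points.items()
        if not any(k_dominates(q, p, k) for other, q in points.items() if other != name)
    ]

# The table from the slide (d1..d6).
points = {
    "p1": (2, 2, 2, 4, 4, 4),
    "p2": (4, 4, 4, 2, 2, 2),
    "p3": (3, 3, 3, 5, 3, 3),
    "p4": (4, 4, 4, 3, 3, 3),
    "p5": (5, 5, 5, 1, 5, 5),
}

for k in (6, 5, 4):
    print(k, k_dominant_skyline(points, k))
# 6 ['p1', 'p2', 'p3', 'p5']   (k = d: conventional skyline)
# 5 ['p1', 'p2', 'p3']
# 4 ['p1', 'p2']
```

Under this assumption the sets shrink as k decreases, which matches the slide's closing remark.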

9

Cyclic Properties of k-dominance

k-dominance can be cyclic

A 3-dominates B (smaller values are better; A is better on d2, d3, d4)

     d1  d2  d3  d4
A     5   5   5   5
B     1   6   6   6
C     2   1   7   7
D     3   2   1   8

10

Cyclic Properties of k-dominance

B 3-dominates C (B is better on d1, d3, d4)

     d1  d2  d3  d4
A     5   5   5   5
B     1   6   6   6
C     2   1   7   7
D     3   2   1   8

11

Cyclic Properties of k-dominance

C 3-dominates D (C is better on d1, d2, d4)

     d1  d2  d3  d4
A     5   5   5   5
B     1   6   6   6
C     2   1   7   7
D     3   2   1   8

12

Cyclic Properties of k-dominance

D 3-dominates A (D is better on d1, d2, d3), which closes the cycle A, B, C, D, A

     d1  d2  d3  d4
A     5   5   5   5
B     1   6   6   6
C     2   1   7   7
D     3   2   1   8
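The cycle on these four slides can be checked mechanically. The sketch below uses the table values with smaller-is-better (the only reading under which "A 3-dominates B" holds); the helper name is illustrative. One consequence worth noting: because every point in the cycle is 3-dominated by its predecessor, the 3-dominant skyline of this data set is empty, while the conventional skyline keeps all four points.

```python
from typing import Sequence

def k_dominates(a: Sequence[float], b: Sequence[float], k: int) -> bool:
    """a is no worse than b on >= k dimensions and strictly better on one
    (smaller is better)."""
    not_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = sum(x < y for x, y in zip(a, b))
    return not_worse >= k and strictly_better >= 1

points = {
    "A": (5, 5, 5, 5),
    "B": (1, 6, 6, 6),
    "C": (2, 1, 7, 7),
    "D": (3, 2, 1, 8),
}

# The four claims from the slides, in order.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(all(k_dominates(points[x], points[y], 3) for x, y in cycle))  # True

def skyline(k: int):
    """Names of the points that no other point k-dominates."""
    return [
        n for n, p in points.items()
        if not any(k_dominates(q, p, k) for m, q in points.items() if m != n)
    ]

print(skyline(3))  # []
print(skyline(4))  # ['A', 'B', 'C', 'D']
```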

14

A naïve approach

Case 1: A new point arrives
  It is k-dominated by some points
  It k-dominates some points

Case 2: A point expires
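The slide does not spell out the naïve procedure, so the following is only one plausible reading: keep every live point in a sliding window and recompute the k-dominant skyline from scratch after each arrival or expiry. The class name, method names, and the rule that a point arriving at time t expires at t + window are assumptions made for this sketch; smaller values are assumed to be better.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

def k_dominates(a: Sequence[float], b: Sequence[float], k: int) -> bool:
    """a is no worse than b on >= k dimensions and strictly better on one."""
    not_worse = sum(x <= y for x, y in zip(a, b))
    strictly_better = sum(x < y for x, y in zip(a, b))
    return not_worse >= k and strictly_better >= 1

class NaiveWindow:
    """Hypothetical naive maintenance: full recomputation on every change."""

    def __init__(self, k: int, window: int) -> None:
        self.k = k
        self.window = window
        # (arrival_time, name, values), oldest first; names are assumed unique.
        self.points: Deque[Tuple[int, str, Sequence[float]]] = deque()

    def advance(self, now: int) -> None:
        """Case 2: drop every point whose lifetime has ended by time `now`."""
        while self.points and self.points[0][0] + self.window <= now:
            self.points.popleft()

    def arrive(self, now: int, name: str, values: Sequence[float]) -> None:
        """Case 1: expire old points, then add the new one."""
        self.advance(now)
        self.points.append((now, name, values))

    def skyline(self) -> List[str]:
        """Recompute the k-dominant skyline with O(n^2) dominance checks."""
        pts = list(self.points)
        return [
            n for _, n, p in pts
            if not any(k_dominates(q, p, self.k) for _, m, q in pts if m != n)
        ]

w = NaiveWindow(k=2, window=3)
w.arrive(1, "x", (1, 1))
w.arrive(2, "y", (2, 2))   # y is dominated by x on arrival
print(w.skyline())         # ['x']
w.advance(4)               # x expires (1 + 3 <= 4)
print(w.skyline())         # ['y']
```

The usage lines show why expiry matters: a point that was dominated on arrival can enter the skyline later, once its dominator leaves the window.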

15

An improved approach

Skyline heap (levels, top to bottom):
  a(1)
  b(3) c(5)
  d(7) e(9) f(11) g(13)

Non-Skyline heap:
  (no entries)

16

An improved approach

Skyline heap (levels, top to bottom):
  a(1)
  b(3) c(5)
  d(7) e(9) f(11) g(13)

Non-Skyline heap:
  h(15)
  h(26)

Event list (point, time, type):
  a 16 DIS
  b 18 DIS
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  h 26 RET

17

An improved approach

At t = 16

Skyline heap (levels, top to bottom):
  b(3)
  d(7) c(5)
  e(9) f(11) g(13)

Non-Skyline heap:
  h(26)

Event list (point, time, type):
  b 18 DIS
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  h 26 RET

18

An improved approach

Skyline heap (levels, top to bottom):
  b(3)
  d(7) c(5)
  e(9) f(11) g(13)

Non-Skyline heap:
  h(26)

Event list (point, time, type):
  b 18 DIS
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  i 20 RET

Other labels on the slide: i(17), i(20)

19

An improved approach

At t = 18

Skyline heap (levels, top to bottom):
  c(5)
  d(7) f(11)
  e(9) g(13)

Non-Skyline heap:
  i(20)

Event list (point, time, type):
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  i 20 RET

20

An improved approach

Skyline heap (levels, top to bottom):
  c(5)
  d(7) f(11)
  e(9) g(13)

Non-Skyline heap:
  i(20)

Event list (point, time, type):
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  i 20 RET

Other labels on the slide: j(19)

21

An improved approach

Skyline heap (levels, top to bottom):
  c(5)
  d(7) f(11)
  e(9) g(13)

Non-Skyline heap:
  i(20)

Event list (point, time, type):
  c 20 DIS
  d 22 DIS
  e 24 DIS
  f 26 DIS
  g 28 DIS
  i 20 RET
  j 32 RET

Other labels on the slide: j(32)
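The slides do not define DIS, RET, or the numbers in parentheses, so the sketch below is only one reading that happens to be consistent with the figures shown. It assumes that the number next to a skyline-heap entry is its arrival time, that every point lives 15 time units (a(1) pairs with "a 16 DIS"), that DIS marks a skyline point leaving the window, and that RET marks a held-back point returning to the skyline when the point blocking it expires. The `blocker` mapping (h blocked by f, i by c, j by i) is inferred from matching timestamps and is not stated in the source.

```python
import heapq
from typing import Dict, List, Tuple

LIFETIME = 15  # assumption: inferred from a(1) and "a 16 DIS"

def build_events(
    arrivals: Dict[str, int], blocker: Dict[str, str]
) -> List[Tuple[int, str, str]]:
    """Return (time, point, type) events in time order.

    arrivals: point name -> assumed arrival time
    blocker:  point name -> name of the point assumed to keep it out of
              the skyline (hypothetical input, inferred from the slides)
    """
    heap: List[Tuple[int, str, str]] = []
    for name, t in arrivals.items():
        if name in blocker:
            # Assumed rule: the point returns when its blocker expires.
            ret_time = arrivals[blocker[name]] + LIFETIME
            heapq.heappush(heap, (ret_time, name, "RET"))
        else:
            # Assumed rule: a skyline point disappears when it expires.
            heapq.heappush(heap, (t + LIFETIME, name, "DIS"))
    return [heapq.heappop(heap) for _ in range(len(heap))]

arrivals = {"a": 1, "b": 3, "c": 5, "d": 7, "e": 9, "f": 11, "g": 13,
            "h": 15, "i": 17, "j": 19}
blocker = {"h": "f", "i": "c", "j": "i"}  # inferred, not from the source

for time, name, kind in build_events(arrivals, blocker):
    print(name, time, kind)
```

Under these assumptions the output reproduces every entry that appears in the slides' event lists (for example `h 26 RET`, `i 20 RET`, and `j 32 RET`), ordered by time. A min-heap keyed on event time lets the next event be found without scanning all points, which may be why the slides draw both structures as heaps.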

22

Validations

Methodology
  Theorem-based proofs of correctness and completeness
  Experiments to analyze performance

Validation criteria
  Query response time

23

Experimental Analysis

[Figure: response time of the Improved Approach and the Naïve Approach; x-axis: k-dominance checks (ticks 1 to 29), y-axis: Response time in milliseconds (ticks 4500 to 5050)]

24

Rewrite today

Improvements
  A better technique for k-dominance
  Conduct detailed experiments with network object generators
  Think about how to find (spatial) skylines in road networks

25

References

1. Yufei Tao, Dimitris Papadias: Maintaining Sliding Window Skylines on Data Streams. IEEE Trans. Knowl. Data Eng. 18(2): 377-391 (2006).

2. Chee Yong Chan, H. V. Jagadish, Kian-Lee Tan, Anthony K. H. Tung, Zhenjie Zhang: Finding k-dominant skylines in high dimensional space. SIGMOD Conference 2006: 503-514.

3. M. Sharifzadeh, C. Shahabi: The Spatial Skyline Queries. In Proceedings of VLDB'06.

4. Michael D. Morse, Jignesh M. Patel, William I. Grosky: Efficient Continuous Skyline Computation. ICDE 2006: 108.

5. Zhiyong Huang, Hua Lu, Beng Chin Ooi, Anthony K. H. Tung: Continuous Skyline Queries for Moving Objects. IEEE Transactions on Knowledge and Data Engineering 18(12): 1645-1658 (2006).

6. S. Borzsonyi, D. Kossmann, K. Stocker: The Skyline Operator. In Proceedings of ICDE'01.

7. D. Kossmann, F. Ramsak, S. Rost: Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In Proceedings of VLDB'02.