bin jiang, jian pei. problem definition an on-the-fly method ◦ interval skyline query answering...

Bin Jiang, Jian Pei

Post on 15-Jan-2016


Page 1: Bin Jiang, Jian Pei. • Problem Definition • An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm

Bin Jiang, Jian Pei


Outline

Problem Definition
An On-the-fly Method
◦ Interval Skyline Query Answering Algorithm
◦ Online Interval Skyline Query Algorithm
Radix Priority Search Tree
A View-Materialization Method
◦ Non-redundant skyline time series: NRSky[i:j]
Experiments


Problem Definition: Notions

◦ Time Series: A time series s consists of a set of (value, timestamp) pairs. We denote the value of s at timestamp i by s[i], and write s as the sequence of values s[1], s[2], …

◦ Time Interval: a range in time, denoted as [i : j]. We write k ∈ [i : j] if i ≤ k ≤ j; [i′ : j′] ⊆ [i : j] if i ≤ i′ ≤ j′ ≤ j.

Some Notions in This Paper


Problem Definition

Interval Skyline
◦ Given a set S of time series and an interval [i:j], the interval skyline is the set of time series that are not dominated by any other time series in [i:j], denoted by Sky[i:j].

Example: suppose S = {S1, S2, S3}. S1 and S2 are in Sky[16:22], while S3 is dominated by S2.

[Figure: plot of the time series S1, S2 and S3]
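The definition above can be illustrated with a brute-force sketch. It assumes the usual convention that larger values are better: s dominates q in [i:j] if s[k] ≥ q[k] at every timestamp k in [i:j] and s[k] > q[k] at one or more of them. The transcript does not spell this definition out, so treat it as an assumption; the function names and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(s: Sequence[float], q: Sequence[float], i: int, j: int) -> bool:
    """True if s dominates q in [i:j] (1-based, inclusive; larger is better)."""
    window_s = s[i - 1:j]
    window_q = q[i - 1:j]
    no_worse = all(a >= b for a, b in zip(window_s, window_q))
    strictly_better = any(a > b for a, b in zip(window_s, window_q))
    return no_worse and strictly_better


def interval_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> List[str]:
    """Names of the series not dominated by any other series in [i:j]."""
    return sorted(
        name
        for name, s in series.items()
        if not any(
            dominates(q, s, i, j) for other, q in series.items() if other != name
        )
    )


# Made-up values: s2 dominates s3, while s1 and s2 are incomparable.
data = {"s1": [5, 1, 4], "s2": [2, 6, 3], "s3": [1, 5, 2]}
result = interval_skyline(data, 1, 3)
```

Every pair of series is compared at every timestamp here, which is the cost the methods in the following slides try to avoid.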


Problem Definition

Interval Skyline
Property 1: If there exist timestamps k1, …, kl (i ≤ k1 < … < kl ≤ j) such that [condition missing in the transcript] and s is the only such time series, then the time series s is in Sky[i:j].


Problem Definition

◦ Given a set of time series S such that each time series is in the base interval W, we want to maintain a data structure D such that any interval skyline query in an interval [i:j] ⊆ W can be answered efficiently using D.

Methods
◦ An On-The-Fly Method
  • Original Interval Skyline Query Algorithm
  • Online Interval Skyline Query Algorithm
◦ A View-Materialization Method



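Idea

Using the maximum and minimum values of each time series in the query interval, we can settle some dominance relationships without comparing the series timestamp by timestamp. For example, if the minimum value of s in [i:j] is greater than the maximum value of q in [i:j], then s dominates q in [i:j].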


Algorithm
1. Set the current skyline set Sky to empty;
2. Sort the time series into a list L in descending order of their maximum values;
3. Set maxmin to the maximum of the minimum values of the time series in Sky;
4. For each time series s in L that satisfies s.max ≥ maxmin, determine whether it dominates or is dominated by time series in Sky. If it is not dominated:
5.    add it to Sky;
6.    delete the time series it dominates from Sky;
7.    update maxmin;
8. Return Sky;
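The steps above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes larger values are better, recomputes each series' maximum and minimum directly from the query window, and uses hypothetical names and made-up data (not the values from the slide's figure).

```python
from typing import Dict, List, Sequence


def dominates(s: Sequence[float], q: Sequence[float]) -> bool:
    """True if window s dominates window q (larger is better)."""
    return all(a >= b for a, b in zip(s, q)) and any(a > b for a, b in zip(s, q))


def on_the_fly_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> List[str]:
    """Interval skyline in [i:j] (1-based, inclusive) with max/min pruning."""
    windows = {name: list(s[i - 1:j]) for name, s in series.items()}
    # Step 2: descending order of the maximum value in the query interval.
    order = sorted(windows, key=lambda name: max(windows[name]), reverse=True)
    sky: List[str] = []        # Step 1: current skyline set
    maxmin = float("-inf")     # Step 3: largest minimum among series in sky
    for name in order:
        w = windows[name]
        if max(w) < maxmin:
            # Some series in sky is larger everywhere; the same holds for
            # every remaining series, since their maxima are no larger.
            break
        if any(dominates(windows[k], w) for k in sky):
            continue
        sky = [k for k in sky if not dominates(w, windows[k])]   # Step 6
        sky.append(name)                                         # Step 5
        maxmin = max(min(windows[k]) for k in sky)               # Step 7
    return sorted(sky)


# Made-up data with five series over timestamps 1..3.
data = {
    "s1": [2, 4, 3],
    "s2": [5, 1, 6],
    "s3": [1, 5, 2],
    "s4": [3, 3, 3],
    "s5": [2, 4, 4],
}
result = on_the_fly_skyline(data, 2, 3)
```

With this data, s1 is removed because s5 dominates it in [2:3], and s4 is never compared in detail because its maximum is below maxmin.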


Example

Goal: compute the skyline in interval [2:3]

Steps:
1. s2->Sky, maxmin = 1
2. s3->Sky, maxmin = 2
3. s5->Sky, maxmin = 4
4. s5 dominates s1, so s1 is discarded; maxmin = 4
5. s4.min = 3 < 4 = maxmin, so s4 is discarded.

Return Sky = {s2, s3, s5}


Disadvantage
Checking the max value of each time series and min[i:j] for the query interval [i:j] is costly.

Improvement Idea
• Use a radix priority search tree to maintain min[i:j]
• Use a sketch to keep the max value of each time series


Radix Priority Search Tree

A radix priority search tree is a two-dimensional data structure: a hybrid of a heap on one dimension and a binary search tree on the other dimension.

Advantages:
• Insertion in O(h)
• Deletion in O(h)
• Query in O(h)

h: the height of the tree


Radix Priority Search Tree
◦ Build
  • Use the timestamps as the binary search tree dimension X and the data values as the heap dimension Y;
  • Map W into a fixed domain of X, {0, 1, ..., w-1};
  • The height of the tree is O(log w)
◦ Update: each new timestamp causes one insertion (the value at the most recent timestamp) and one deletion (the value at the timestamp that drops out of W).
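A radix priority search tree along these lines can be sketched as below. This is an illustrative sketch under several assumptions that are not in the slides: timestamps are mapped to slots with `t % w`, the heap keeps the smaller value higher up (so the tree answers min[i:j]), each slot holds at most one value, and all class and function names are hypothetical.

```python
from typing import Optional


class _Node:
    def __init__(self) -> None:
        self.point = None   # (slot, value) stored here, or None if empty
        self.left = None    # child covering the lower half of the slot range
        self.right = None   # child covering the upper half


class RadixPST:
    """Min-heap on value, range-halving search tree on slot in {0..w-1}."""

    def __init__(self, w: int) -> None:
        self.w = w
        self.root = _Node()

    def insert(self, slot: int, value: float) -> None:
        node, lo, hi = self.root, 0, self.w       # node covers slots [lo, hi)
        point = (slot, value)
        while node.point is not None:
            if hi - lo <= 1:                      # avoids looping on a reused slot
                raise ValueError("slot already occupied")
            if point[1] < node.point[1]:          # smaller value stays higher up
                node.point, point = point, node.point
            mid = (lo + hi) // 2
            if point[0] < mid:
                node.left = node.left or _Node()
                node, hi = node.left, mid
            else:
                node.right = node.right or _Node()
                node, lo = node.right, mid
        node.point = point

    def delete(self, slot: int) -> None:
        node, lo, hi = self.root, 0, self.w
        while node is not None and node.point is not None and node.point[0] != slot:
            mid = (lo + hi) // 2
            if slot < mid:
                node, hi = node.left, mid
            else:
                node, lo = node.right, mid
        if node is None or node.point is None:
            raise KeyError(slot)
        while True:                               # pull the smaller child point up
            kids = [c for c in (node.left, node.right) if c and c.point is not None]
            if not kids:
                node.point = None
                return
            child = min(kids, key=lambda c: c.point[1])
            node.point, node = child.point, child

    def range_min(self, a: int, b: int) -> Optional[float]:
        """Smallest value among slots a..b (inclusive), or None if none stored."""
        best = None
        stack = [(self.root, 0, self.w)]
        while stack:
            node, lo, hi = stack.pop()
            if node is None or node.point is None or hi <= a or lo > b:
                continue
            slot, value = node.point
            if best is not None and value >= best:
                continue                          # heap order: nothing smaller below
            if a <= slot <= b:
                best = value
                continue
            mid = (lo + hi) // 2
            stack.append((node.left, lo, mid))
            stack.append((node.right, mid, hi))
        return best


def window_min(tree: RadixPST, w: int, i: int, j: int) -> Optional[float]:
    """min[i:j] for timestamps i..j inside the window; slots may wrap around."""
    a, b = i % w, j % w
    if a <= b:
        return tree.range_min(a, b)
    parts = (tree.range_min(a, w - 1), tree.range_min(0, b))
    return min((p for p in parts if p is not None), default=None)


values = [4, 3, 2, 5, 1]                  # made-up series, timestamps 1..5
w = 4
tree = RadixPST(w)
for t in range(1, 5):
    tree.insert(t % w, values[t - 1])
min_before = window_min(tree, w, 2, 3)    # window is [1:4]
tree.delete(1 % w)                        # timestamp 1 drops out
tree.insert(5 % w, values[4])             # timestamp 5 arrives
min_after = window_min(tree, w, 2, 5)     # window is [2:5]
```

Because the slot domain is fixed at {0, ..., w-1}, the tree never needs rebalancing and its height stays logarithmic in w.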


Sketches
◦ A pair (v, t) is maintained if there is no other pair (v1, t1) such that v1 > v and t1 > t;
◦ These pairs form the skyline of points in the interval;
◦ The expected number of points in the skyline is O(log w);
◦ With the sketches, finding the maximum value in W costs O(1) time.

Example:
W = [1:3], Sketches: (4,1), (3,2), (2,3)
W = [1:4], Sketches: (5,4)
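The sketch rule above behaves like a monotone queue. Below is a minimal illustration, assuming one new value per timestamp and a window of the w most recent timestamps; the class name `MaxSketch` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class MaxSketch:
    """Keeps only the (value, timestamp) pairs not beaten by a later, larger value."""

    def __init__(self, w: int) -> None:
        self.w = w
        self.pairs: Deque[Tuple[float, int]] = deque()

    def push(self, value: float, timestamp: int) -> None:
        # Older pairs with a smaller value can never be the maximum again.
        while self.pairs and self.pairs[-1][0] < value:
            self.pairs.pop()
        self.pairs.append((value, timestamp))
        # Drop pairs whose timestamp has left the window of size w.
        while self.pairs[0][1] <= timestamp - self.w:
            self.pairs.popleft()

    def max(self) -> float:
        # Values are non-increasing from front to back, so the front is the max.
        return self.pairs[0][0]

    def snapshot(self) -> List[Tuple[float, int]]:
        return list(self.pairs)


sketch = MaxSketch(w=4)
for t, v in enumerate([4, 3, 2], start=1):
    sketch.push(v, t)
after_three = sketch.snapshot()   # [(4, 1), (3, 2), (2, 3)]
sketch.push(5, 4)
after_four = sketch.snapshot()    # [(5, 4)]
```

The two snapshots reproduce the sketches listed in the example for W = [1:3] and W = [1:4].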


Complexity
◦ Space
  Radix priority search tree: O(w)
  Sketch of the max values: O(log w)
  Total: O(nw)
◦ Time
  Radix priority search tree: O(log w)
  Sketch of the max values: O(log w)
  Total: O(n log w)



Non-redundant interval skylines

A time series s is called a non-redundant skyline time series in interval [i:j] if
1) s is in the skyline in interval [i:j];
2) s is not in the skyline in any subinterval [i′:j′] ⊂ [i:j].

By the pigeonhole principle, a time series has at most w such minimal skyline intervals: if there were more than w, at least two of them would share the same starting timestamp, so the longer one would contain the shorter one and would not be a minimal skyline interval.


Idea

Suppose all non-redundant interval skylines are materialized. To answer a query on [i:j], we can take the union of these skylines over all subintervals of [i:j] and remove the time series that fail Lemma 2.
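The idea above can be illustrated with a brute-force sketch. Two caveats: the statement of Lemma 2 is not in the transcript, so a direct dominance check among the candidates stands in for it, and the materialization is done naively over all subintervals rather than maintained incrementally. All names and data are hypothetical, and larger values are assumed to be better.

```python
from typing import Dict, List, Sequence, Set, Tuple

Series = Dict[str, Sequence[float]]
Interval = Tuple[int, int]


def dominates(s: Sequence[float], q: Sequence[float], i: int, j: int) -> bool:
    ws, wq = s[i - 1:j], q[i - 1:j]
    return all(a >= b for a, b in zip(ws, wq)) and any(a > b for a, b in zip(ws, wq))


def skyline(series: Series, i: int, j: int) -> Set[str]:
    return {
        n for n, s in series.items()
        if not any(dominates(q, s, i, j) for o, q in series.items() if o != n)
    }


def subintervals(i: int, j: int) -> List[Interval]:
    return [(a, b) for a in range(i, j + 1) for b in range(a, j + 1)]


def materialize_nrsky(series: Series, lo: int, hi: int) -> Dict[Interval, Set[str]]:
    """NRSky[i:j]: series in Sky[i:j] but in no skyline of a proper subinterval."""
    sky = {iv: skyline(series, *iv) for iv in subintervals(lo, hi)}
    return {
        (i, j): {
            n for n in members
            if not any(n in sky[sub] for sub in subintervals(i, j) if sub != (i, j))
        }
        for (i, j), members in sky.items()
    }


def query(series: Series, nrsky: Dict[Interval, Set[str]], i: int, j: int) -> List[str]:
    # Union of the materialized sets over every subinterval of [i:j].
    candidates: Set[str] = set()
    for sub in subintervals(i, j):
        candidates |= nrsky[sub]
    # Stand-in for the Lemma 2 filter: drop candidates dominated in [i:j].
    return sorted(
        n for n in candidates
        if not any(dominates(series[o], series[n], i, j) for o in candidates if o != n)
    )


# Made-up data: s1 ties with s2 at timestamp 1 but is dominated on [1:2].
data = {"s1": [3, 3, 1], "s2": [3, 5, 2], "s3": [1, 2, 4]}
views = materialize_nrsky(data, 1, 3)
answer = query(data, views, 1, 2)
```

The filter is needed because of ties: s1 is in the skyline of [1:1], so it enters the union for [1:2], where s2 dominates it. Checking dominance among the candidates alone is enough, since every member of Sky[i:j] appears in the union.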

Algorithm

Page 20: Bin Jiang, Jian Pei.  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm

Example
W = [2:4]
Goal: compute the interval skyline in [3:4]

Steps:
1. s3->Sky
2. s4->Sky
3. s1->Sky (s2 is dominated by s1)

Return Sky = {s1, s3, s4}

How to maintain the non-redundant skylines?


Steps


Step 1
◦ Use the on-the-fly algorithm to obtain the interval skyline in the new interval W′.

◦ Find possible false negatives.


Step 2: Shared Divide-and-Conquer Algorithm (SDC)
◦ This algorithm is an extension of the divide-and-conquer algorithm (DC).
◦ In SDC, a space is defined as a time interval. Each timestamp represents a dimension.
◦ The related spaces (intervals) are organized as a path, e.g. [j:j], [j-1:j], ..., [i:j] (i < j).


[Figure: Divide Step and Merge Step of the divide-and-conquer algorithm, showing points P1 to P5, labels A and B, split values mA and mB, and partitions S1, S2, S11, S12, S21, S22]


Comparisons

Results


Step 3: Remove “redundant time series”



Parameters


Synthetic Data Sets
◦ Data Set Properties

◦ Query Efficiency


Synthetic Data Sets
◦ Update Efficiency

◦ Space Cost


Stock Data Sets
◦ Query Time
