
Page 1

University of Minnesota

Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries

Xiaobin Ma
Advisor: Shashi Shekhar
Dec, 2005

Page 2

Outline
  Motivation
  Problem statement
  Related work and our contributions
  Proposed algorithm and cost model
  Experiment design and results
  Conclusion and future work

Page 3

Motivation
  GIS applications
  Find the shortest path through one point from each of several different feature types

Page 4

A Running Example
  Three feature types: red (r), green (g), black (b)
  q is the query point
  The route drawn with a solid red line is the shortest route
  Routes drawn with dashed lines are other possible routes
  [Figure: query point q with the red, green, and black points and the candidate routes]

Page 5

Basic Concepts
  <P1, P2, …, Pk>: ordered point sequence, where P1, P2, …, Pk are from k different (feature) types of data sets
  R(q, P1, P2, …, Pk): a route from q through points P1, P2, …, and Pk
  d(R(q, P1, P2, …, Pk)): distance of route R(q, P1, P2, …, Pk)
  Multi-Type Nearest Neighbor (MTNN): ordered point sequence <P1', P2', …, Pk'> such that d(R(q, P1', P2', …, Pk')) is minimum among all possible routes
  MTNN distance: d(R(q, P1', P2', …, Pk'))
  MTNN query: a query finding the MTNN
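The definitions above can be illustrated with a brute-force baseline. This is only a minimal sketch, not the algorithm proposed in the talk: it assumes 2-D points with Euclidean distance, and the names `route_length` and `brute_force_mtnn` are hypothetical helpers introduced for illustration.

```python
import math
from itertools import permutations, product

def route_length(q, points):
    """d(R(q, P1, ..., Pk)): length of the route q -> P1 -> ... -> Pk."""
    total, prev = 0.0, q
    for p in points:
        total += math.hypot(p[0] - prev[0], p[1] - prev[1])
        prev = p
    return total

def brute_force_mtnn(q, datasets):
    """Try every visiting order of the feature types and every
    choice of one point per type; keep the shortest route."""
    best_route, best_len = None, math.inf
    for order in permutations(datasets):
        for combo in product(*order):
            length = route_length(q, combo)
            if length < best_len:
                best_route, best_len = combo, length
    return best_route, best_len

# Toy data (made up for illustration): one list of points per feature type.
q = (0.0, 0.0)
red = [(1.0, 0.0), (5.0, 5.0)]
green = [(2.0, 0.0), (0.0, 7.0)]
black = [(3.0, 0.0), (9.0, 9.0)]
mtnn, mtnn_distance = brute_force_mtnn(q, [red, green, black])
```

The enumeration grows with the number of type orders times the product of the data set sizes, which is why pruning matters for this query.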

Page 6

Problem Statement for MTNN Query
Given:
  A query point
  Distance metric
  k different (feature) types of spatial objects with N1, N2, N3, …, Nk data points respectively
  R-tree for each data set
Find: Multi-type nearest neighbor (MTNN)
Objective: Minimize the length of the route from the query point covering an instance of each feature
Constraint:
  Correctness: The tour should be the shortest path for the query point and the given collection of spatial query feature types
  Completeness: Only the shortest path is returned as the query result

Page 7

Related Work
Optimal sequenced route (OSR) query [Kolahdouzan et al., Tech. Report 05-840, USC]
  Optimal algorithms (RLORD)
  Focus on optimal algorithms for a specified permutation of feature types
  Point-based algorithms
Trip planning query (TPQ) [Li et al., SSTD 05]
  Heuristic algorithms
  Give approximate results

Page 8

RLORD Example
  q is the query point
  Search order is <r, b, g>
  R(q,r2,b2,g2) is the greedy route
  Radius of circle is d(R(q,r2,b2,g2))
  [Figure: labeled red (r), black (b), and green (g) data points around query point q, with the greedy route and the circle]

Page 9

RLORD Running Iterations
Use backward search strategy: O = <g, b, r>
First iteration - examine feature type g
  <g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> in a set R
Second iteration - examine next feature type in O
  For every point bi in the black set, iterate on every partial route <gj> in R:
    IF d(R(q, bi)) + d(R(bi,gj)) < d(R(q,r2,b2,g2)) THEN put <bi,gj> into a set R1
  Keep ordered sequence <bi,gj> in R1 such that d(R(bi,gj)) + d(R(gj)) is minimum
  <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13> in a set R2
  R <- R2
Examine the next feature type and repeat the above procedure until all types of data are examined
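The iterations above can be sketched in code. This is a simplified, in-memory reading of the slide for one fixed order of feature types, not the published RLORD implementation. The helper names (`dist`, `route_length`, `greedy_route`, `backward_search`) are hypothetical, and the comparison uses `<=` with a small tolerance, an assumption made so that the greedy route itself survives floating-point rounding.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_length(q, seq):
    total, prev = 0.0, q
    for p in seq:
        total += dist(prev, p)
        prev = p
    return total

def greedy_route(q, ordered_sets):
    """Visit the nearest point of each type in turn (initial upper bound)."""
    route, prev = [], q
    for pts in ordered_sets:
        nearest = min(pts, key=lambda p: dist(prev, p))
        route.append(nearest)
        prev = nearest
    return route

def backward_search(q, ordered_sets):
    """Build partial routes from the last feature type back to the first."""
    bound = route_length(q, greedy_route(q, ordered_sets)) + 1e-9
    # Each partial route is (length of the suffix, list of points in the suffix).
    partial = [(0.0, [g]) for g in ordered_sets[-1] if dist(q, g) <= bound]
    for pts in reversed(ordered_sets[:-1]):
        extended = []
        for p in pts:
            best = None
            for length, seq in partial:
                new_len = dist(p, seq[0]) + length
                # Triangle inequality: no route through p and this suffix
                # can be shorter than d(q, p) + new_len.
                if dist(q, p) + new_len <= bound:
                    if best is None or new_len < best[0]:
                        best = (new_len, [p] + seq)
            if best is not None:
                extended.append(best)
        partial = extended
    length, route = min(partial, key=lambda item: dist(q, item[1][0]) + item[0])
    return route, dist(q, route[0]) + length

# Toy data (made up): the greedy route has length 4, the best route length 2.
q = (0.0, 0.0)
type_a = [(1.0, 0.0), (-1.2, 0.0)]
type_b = [(-2.0, 0.0)]
route, length = backward_search(q, [type_a, type_b])
```

Keeping only the shortest suffix per point is what keeps each iteration at roughly (points of this type) x (surviving partial routes) distance computations.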

Page 10

Our Contributions
  Formalized a new nearest neighbor search problem: the Multi-Type Nearest Neighbor (MTNN) query problem
  Proposed a new algorithm, the Page Level Upper Bound (PLUB) based algorithm
  Evaluated the proposed algorithm via a cost model and experiments

Page 11

Key Ideas of PLUB
  Prune search space at page level
  Create candidate leaf page sequences
  Search candidate MTNN in these candidate leaf page sequences

Page 12

Page Level Upper Bound (PLUB) Algorithm
Step 1: First upper bound search
  Use the basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound
Step 2: R-tree search
  Prune the search space with the current upper bound and form a set of candidate leaf node sequences, using the page-level pruning approach
Step 3: Subset search
  Search for the candidate MTNN in the candidate leaf node sequences
  Go to Step 2, using the candidate MTNN distance as the current upper bound, until all permutations of feature types have been examined
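Step 2 can be illustrated with a small sketch. The slides do not spell out how the page-level value of a leaf page sequence is computed, so the code below makes an assumption: it sums the minimum point-to-rectangle and rectangle-to-rectangle distances along the sequence, which can never exceed the length of any route through those pages, and drops a sequence when that sum exceeds the current upper bound. Rectangles are hypothetical `(xmin, ymin, xmax, ymax)` tuples and all function names are illustrative.

```python
import math
from itertools import product

def mindist_point_rect(p, r):
    """Smallest distance from point p to rectangle r (0 if p is inside)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)

def mindist_rect_rect(a, b):
    """Smallest distance between rectangles a and b (0 if they overlap)."""
    dx = max(b[0] - a[2], 0.0, a[0] - b[2])
    dy = max(b[1] - a[3], 0.0, a[1] - b[3])
    return math.hypot(dx, dy)

def sequence_min_length(q, pages):
    """No route q -> page 1 -> ... -> page k can be shorter than this."""
    total = mindist_point_rect(q, pages[0])
    for a, b in zip(pages, pages[1:]):
        total += mindist_rect_rect(a, b)
    return total

def candidate_page_sequences(q, pages_per_type, bound):
    """Keep only the leaf page sequences that may still beat the bound."""
    return [seq for seq in product(*pages_per_type)
            if sequence_min_length(q, seq) <= bound]

# Toy leaf pages (made up): two pages for each of two feature types.
q = (0.0, 0.0)
pages_type1 = [(1, 0, 2, 1), (10, 10, 11, 11)]
pages_type2 = [(3, 0, 4, 1), (20, 20, 21, 21)]
survivors = candidate_page_sequences(q, [pages_type1, pages_type2], bound=5.0)
```

Each surviving sequence is then searched at the point level (Step 3), so the fewer sequences survive, the fewer point-to-point distances are computed.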

Page 13

PLUB - An Example
Inputs
  q: query point
  Euclidean distance
  R-tree for each feature
R(q,r2,b2,g2) is the greedy route
Radius of circle is d(R(q,r2,b2,g2)) = 3.37
Rectangles are leaf pages in R-trees
[Figure: labeled red (r), black (b), and green (g) data points around query point q, with leaf pages R1-R4, B1-B4, and G1-G4 drawn as rectangles]

Page 14

PLUB - An Example
[Figure: same data points, leaf pages, greedy route, and circle as on Page 13]

Leaf page upper bound calculation (current search bound 3.37)

  Leaf page sequence   UB     E? (eliminated)
  R1 B1 G1             2.04   N
  R1 B1 G3             6.2    Y
  R1 B1 G4             4.27   Y
  R1 B3 G1             7.53   Y
  R1 B3 G3             6.54   Y
  R1 B3 G4             4.29   Y
  R1 B4 G1             4.02   Y
  R2 B1                3.7    Y
  R2 B3 G4             3.43   Y
  R2 B4                5.17   Y
  R4 B1                4.08   Y
  R4 B3                7.94   Y
  R4 B4                7.56   Y

Only leaf node sequence <R1,B1,G1> left
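As a quick check of the table, the snippet below applies the search bound 3.37 to the listed values. The numbers are copied from the slide; the variable names are illustrative, and reading the E? column as "eliminated" is an interpretation.

```python
bound = 3.37

# (leaf page sequence) -> page-level value from the table
page_values = {
    ("R1", "B1", "G1"): 2.04,
    ("R1", "B1", "G3"): 6.2,
    ("R1", "B1", "G4"): 4.27,
    ("R1", "B3", "G1"): 7.53,
    ("R1", "B3", "G3"): 6.54,
    ("R1", "B3", "G4"): 4.29,
    ("R1", "B4", "G1"): 4.02,
    ("R2", "B1"): 3.7,
    ("R2", "B3", "G4"): 3.43,
    ("R2", "B4"): 5.17,
    ("R4", "B1"): 4.08,
    ("R4", "B3"): 7.94,
    ("R4", "B4"): 7.56,
}

# A sequence survives only if its value does not exceed the bound.
remaining = [seq for seq, value in page_values.items() if value <= bound]
```

Rows with only two pages, such as R2 B1, already exceed the bound before a green page is appended, so they are dropped without being extended.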

Page 15

PLUB - An Example
[Figure: same data points, leaf pages, greedy route, and circle as on Page 13]

Search candidate MTNN in <R1,B1,G1> (time unit: P-P distance calculations)
1st iteration
  <g2>, <g10>, <g12>, <g13>
  Time 4
2nd iteration
  <b12,g13>, <b1,g13>, <b2,g2>, <b15,g13>
  Time 4x4+4=20
3rd iteration
  <r10,b15,g13>, <r9,b15,g13>, <r2,b2,g2>, <r11,b1,g13>
  Time 4x4+4=20
Output
  Shortest distance route R(q,r11,b1,g13) and distance value 3.16

Page 16

Running Results of RLORD (time unit: P-P distance calculations)
First iteration
  <g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16>
  Time 11
Second iteration
  <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13>
  Time 11x12+12=144
Third iteration
  <r1,b11,g3>, <r2,b2,g2>, <r3,b11,g3>, <r8,b1,g13>, <r9,b15,g13>, <r10,b15,g13>, <r11,b1,g13>, <r12,b11,g3>, <r13,b1,g13>, <r14,b1,g13>, <r15,b1,g13>
  Time 12x11+11=143
R(q,r11,b1,g13) is shortest among all routes
Shortest distance value 3.16

Page 17

Running Time Comparison Table
  R-R: rectangle to rectangle distance
  P-P: point to point distance

           R-R    P-P
  PLUB     17     44
  RLORD    0      298

RLORD has no R-R distance calculations, but many more P-P calculations
Cost of R-R < 2 x cost of P-P
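The totals in the table follow from the per-iteration counts on the two preceding slides. The arithmetic below reproduces them and, using the stated bound that one R-R calculation costs less than two P-P calculations, gives a conservative PLUB total in P-P units. Variable names are illustrative.

```python
# P-P distance calculations per iteration, taken from the worked example.
plub_pp = 4 + (4 * 4 + 4) + (4 * 4 + 4)          # 4 + 20 + 20
rlord_pp = 11 + (11 * 12 + 12) + (12 * 11 + 11)  # 11 + 144 + 143

plub_rr = 17   # R-R calculations reported for PLUB
rlord_rr = 0   # RLORD performs none

# Cost of R-R < 2 x cost of P-P, so counting each R-R as 2 P-P
# overestimates the PLUB total.
plub_total_upper = 2 * plub_rr + plub_pp
rlord_total = rlord_pp
```

Even with that overestimate, PLUB needs fewer than 78 P-P units against 298 for RLORD in this example.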

Page 18

Cost Model for PLUB (For One Permutation)
CR-T + CLF + CPN
  CR-T: cost of R-tree traversal to find all R-tree leaf nodes intersected by the circle with radius of current upper bound, centered at query point q
  CLF: cost of page level leaf node search for R-tree candidate leaf node sequences
  CPN: cost of point level search for candidate MTNN in candidate leaf node sequences

Page 19

CR-T Model of PLUB
  CR-T: R-tree traversal cost
  CPR: cost of point to rectangle distance calculation
  Nt,i: number of all the tree nodes visited in feature type i tree traversal
  CR-T = CPR x Σ Nt,i (i = 1, …, k)

Page 20

CLF Model of PLUB
  CLF: cost of the search for R-tree candidate leaf node sequences
  NR-R: number of leaf nodes visited in the candidate leaf node sequence search
  CR-R: cost of rectangle to rectangle distance calculation
  CLF = NR-R x CR-R

Page 21

CPN Model of PLUB
  CPN: cost of searching for the MTNN in candidate leaf node sequences
  FLS: leaf node candidate sequence filtering ability ratio
  nl: average number of points in a leaf node over all feature types
  pi: number of pages of feature type i
  CP-P: cost of point to point distance calculation
  Cls: cost of searching for the MTNN in a single leaf node sequence
  Cls = CP-P x ((nl + nl x nl) + (nl + nl x nl) + … + (nl + nl x nl))   (k-1 terms)
      = (k-1) x nl x (nl + 1) x CP-P
  CPN = Cls x Π pi x (1 - FLS),   i = 1, …, k
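A minimal sketch of the two formulas, with CP-P normalized to 1 by default. The function names are hypothetical, and the sample setting (k = 3, nl = 4, four leaf pages per feature type, one surviving sequence out of 64) is an illustrative assumption loosely modeled on the worked example, not a figure reported in the talk.

```python
import math

def cls_cost(k, nl, c_pp=1.0):
    """Cls = (k-1) x nl x (nl + 1) x CP-P."""
    return (k - 1) * nl * (nl + 1) * c_pp

def cpn_cost(k, nl, pages, fls, c_pp=1.0):
    """CPN = Cls x (product of pi) x (1 - FLS)."""
    return cls_cost(k, nl, c_pp) * math.prod(pages) * (1.0 - fls)

# With k = 3 and nl = 4, Cls matches the two 4x4+4 terms of the example.
single_sequence = cls_cost(3, 4)

# If 63 of the 64 page sequences are filtered out, one sequence is searched.
point_level = cpn_cost(3, 4, [4, 4, 4], fls=63 / 64)
```

The factor (1 - FLS) is where page-level pruning pays off: the point-level cost scales with the fraction of leaf page sequences that survive.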

Page 22

Cost Model for RLORD (For One Permutation)
CR-T' + CPS
  CR-T': cost of R-tree based coarse pruning, i.e., finding all data points inside the initial upper bound
    CR-T' = CR-T + CP-P x nl x (p1 + p2 + p3 + … + pk-1 + pk)
  CPS: cost of candidate MTNN search in remaining subsets
  CP-P: cost of point to point distance calculation
    CPS = CP-P x nl x ((p1 + nl x p1 x p2) + (p2 + nl x p2 x p3) + … + (pk-1 + nl x pk-1 x pk))

Page 23

Cost Model Summary of PLUB and RLORD (One Permutation)

PLUB
  General form: CR-T + CLF + CPN
  Approximate form: CP-P x (k-1) x nl x (nl + 1) x Π pi x (1 - FLS)
RLORD
  General form: CR-T' + CPS
  Approximate form: CP-P x nl x ((p1 + nl x p1 x p2) + (p2 + nl x p2 x p3) + … + (pk-1 + nl x pk-1 x pk))

In random or approximately random datasets, FLS is not large enough, so PLUB takes more time.
In clustered datasets, FLS tends to be very large. PLUB runs faster than RLORD when
  1 - FLS < (nl x ((p1 + nl x p1 x p2) + (p2 + nl x p2 x p3) + … + (pk-1 + nl x pk-1 x pk))) / ((k-1) x nl x (nl + 1) x Π pi)
  Left side: remaining ratio (r-ratio)
  Right side: comparison ratio (c-ratio)
  For clustered datasets, the condition holds when clusters become more compact
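The comparison between the remaining ratio and the comparison ratio can be evaluated directly from the approximate forms. This is a sketch under the slide's own approximations (traversal and page-level costs ignored, CP-P cancelled on both sides); the function names and the sample values are hypothetical.

```python
import math

def rlord_point_cost(nl, pages):
    """Approximate RLORD cost in P-P units:
    nl x sum over consecutive types of (pi + nl x pi x pi+1)."""
    inner = sum(p + nl * p * p_next for p, p_next in zip(pages, pages[1:]))
    return nl * inner

def c_ratio(nl, pages):
    """Comparison ratio: right-hand side of the condition."""
    k = len(pages)
    return rlord_point_cost(nl, pages) / ((k - 1) * nl * (nl + 1) * math.prod(pages))

def plub_expected_faster(fls, nl, pages):
    """PLUB is expected to win when r-ratio = 1 - FLS is below the c-ratio."""
    return (1.0 - fls) < c_ratio(nl, pages)

# Illustrative setting: k = 3 feature types, 4 pages each, 4 points per page.
pages = [4, 4, 4]
ratio = c_ratio(4, pages)
strong_filtering = plub_expected_faster(63 / 64, 4, pages)
weak_filtering = plub_expected_faster(0.5, 4, pages)
```

In this setting the comparison ratio is 0.2125, so PLUB is expected to win only if page-level filtering removes more than about 79% of the leaf page sequences.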

Page 24

Experiment Design

Page 25

Synthetic Data Sets Generation
  Randomly generate cluster centers in the rectangle with bottom-left point (0,0) and top-right point (10000,10000)
    Constraint: the minimum distance between two cluster centers is minCCDist
  Around every cluster center, generate cluster member points
    Maximum distance from member point to cluster center is ClusterSize
  Simplified maximum cluster center distance is determined by:
    maxCCDist = 10000.0/(int)(sqrt(CN)+1)
  Thus the minimum cluster center distance when generating cluster centers is:
    minCCDist = BCF x maxCCDist
  Then the cluster size is:
    ClusterSize = ICF x minCCDist
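The generation procedure above might be sketched as follows. The three formulas come from the slide; everything else is an assumption made for illustration: rejection sampling for the centers, member points drawn uniformly from a disc, no clipping to the bounding rectangle, and the parameters `points_per_cluster`, `seed`, and `max_tries`.

```python
import math
import random

def generate_clusters(cn, bcf, icf, points_per_cluster,
                      seed=0, side=10000.0, max_tries=100000):
    rng = random.Random(seed)
    max_cc_dist = side / int(math.sqrt(cn) + 1)   # maxCCDist
    min_cc_dist = bcf * max_cc_dist               # minCCDist
    cluster_size = icf * min_cc_dist              # ClusterSize

    # Place centers at random, rejecting any center too close to another.
    centers, tries = [], 0
    while len(centers) < cn:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all cluster centers")
        c = (rng.uniform(0.0, side), rng.uniform(0.0, side))
        if all(math.hypot(c[0] - o[0], c[1] - o[1]) >= min_cc_dist
               for o in centers):
            centers.append(c)

    # Member points lie within cluster_size of their center.
    clusters = []
    for cx, cy in centers:
        members = []
        for _ in range(points_per_cluster):
            r = cluster_size * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            members.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
        clusters.append(members)
    return centers, clusters

# CN=20, BCF=0.5, ICF=0.3 gives maxCCDist=2000, minCCDist=1000, ClusterSize=300.
centers, clusters = generate_clusters(cn=20, bcf=0.5, icf=0.3,
                                      points_per_cluster=10, seed=1)
```

A smaller BCF lets centers sit closer together, and since ClusterSize is derived from minCCDist, it also shrinks the clusters; a smaller ICF shrinks each cluster relative to the center spacing.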

Page 26

Experiment Parameters
  Feature types: 2-7
  Between-cluster Compactness Factor (BCF): 0.1-1.0
  In-cluster Compactness Factor (ICF): 0.1-0.5
  Cluster Number (CN): 20, 50, 100, 200

Page 27

Synthetic Datasets Example
  [Figure panel: BCF=0.5, ICF=0.5, CN=20, Feature Type=2]
  [Figure panel: BCF=0.5, ICF=0.3, CN=20, Feature Type=2]

Page 28

Experiment Setup & Data Sets
Setup
  C / Pentium-IV 3.2GHz / Linux / 1GB Memory / Synthetic data
Synthetic data
  Scalability test in terms of feature types
  Effect of data set density
  Effect of between-cluster compactness factor
  Effect of in-cluster compactness factor

Page 29

Scalability Test
Parameters
  Fixed: BCF=0.1, ICF=0.1, CN=20
  Variable: feature types (2-7)
Trend
  PLUB is much faster when the number of feature types is high

Page 30

Effect of Data Sets Density
Parameters
  Fixed: FT=7, BCF=0.1, ICF=0.5
  Variable: cluster number (20, 50, 100, 200)
Trend
  PLUB is always faster than RLORD for all densities of data sets

Page 31

Effect of Between-cluster Compactness Factor
Parameters
  Fixed: FT=7, ICF=0.3, CN=50
  Variable: BCF (0.1-1.0)

Page 32

Effect of Between-cluster Compactness Factor
Top: execution time vs. BCF
Trend
  PLUB is faster than RLORD when BCF is less than 0.7
  PLUB is slower than RLORD when BCF is bigger than 0.7

Page 33

Effect of Between-cluster Compactness Factor
Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF
Trend
  Both ratios increase as BCF increases
  Remaining ratio is less than comparison ratio when BCF is less than 0.8

Page 34

Effect of Between-cluster Compactness Factor
Contradiction?
  The remaining ratio increases, which means the pruning ratio decreases, yet the execution time decreases
  When BCF increases, fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically

Page 35

Effect of Between-cluster Compactness Factor
Key information
  When the remaining ratio is less than the comparison ratio, PLUB runs faster than RLORD
  When the remaining ratio is greater than the comparison ratio, PLUB takes more time than RLORD

Page 36

Effect of In-cluster Compactness Factor
Parameters
  Fixed: FT=7, BCF=0.1, CN=50
  Variable: ICF (0.1-0.5)
Trend
  PLUB is always faster than RLORD for ICF from 0.1 to 0.5

Page 37

Conclusion and Future Work
Conclusion
  Formalized the MTNN query problem
  Proposed a PLUB based algorithm for the MTNN query
  Compared PLUB and RLORD
Future work
  Design heuristic algorithms to tackle the MTNN query problem for a large number of feature types

Page 38

References

[1] M. Kolahdouzan, M. Sharifzadeh and C. Shahabi. The Optimal Sequenced Route Query. USC, CS Dept., Tech. Report 05-840, 2005.

[2] Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios and Shang-Hua Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.