hypersphere dominance: an optimal approach

Post on 23-Feb-2016


Hypersphere Dominance: An Optimal Approach

Cheng Long, Raymond Chi-Wing Wong, Bin Zhang, Min Xie

The Hong Kong University of Science and Technology

Prepared by Cheng Long

Presented by Cheng Long

24 June, 2014

Hyperspheres

A hypersphere in a d-dimensional space is specified by (center, radius): the set of all points whose distance from the center is bounded by the radius.

2D: a disk

3D: a ball

[Figure: a disk and a ball, each with center $c$ and radius $r$.]

Hyperspheres are commonly used

Uncertain databases: the location of an uncertain object

Spatial databases: SS-tree, SS+-tree, M-tree, VP-tree and SR-tree

SS-tree: similar to R-tree with hyperrectangles replaced by hyperspheres

[Figure: layout of 8 objects, A-H, and the SS-tree based on A-H.]

Motivating example

Scenario:

Ada has her location uncertain, but constrained in a disk $S_a$.

Bob has his location uncertain, but constrained in a disk $S_b$.

Connie has her location uncertain, but constrained in a disk $S_q$.

Question: Is Ada always closer to Connie than Bob?

[Figure: two layouts of the disks $S_a$ (Ada), $S_b$ (Bob) and $S_q$ (Connie); the answer is No for one layout and Yes for the other.]

For this specification of the locations, Ada is closer to Connie than Bob.

In fact, for all specifications of the locations, Ada is closer to Connie than Bob: Yes.

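Hypersphere dominance: definition

Definition 1: Hypersphere dominance

Given three hyperspheres $S_a$, $S_b$, and $S_q$, it decides whether the dominance condition holds.

Dominance condition: $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: Dist(a, q) < Dist(b, q)$

Yes: $Dom(S_a, S_b, S_q) = true$

No: $Dom(S_a, S_b, S_q) = false$

Basic operator used in many queries:

Probabilistic RkNN query [Lian and Chen, VLDBJ'09]

AkNN query [Emrich et al., SSDBM'10]

kNN query [Long et al., SIGMOD'14]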

Hypersphere dominance: existing solutions--overview

MinMax [Roussopoulos et al., SIGMOD Record'95; Hjaltason and Samet, TODS'99]

MBR [Emrich et al., SIGMOD'10]

GP [Lian and Chen, VLDBJ'09]

Trigonometric [Emrich et al., SSDBM'10]

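Hypersphere dominance: existing solutions--MinMax (1)

Definition of MaxDist: the maximum distance between a point in $S_a$ and a point in $S_b$

$MaxDist(S_a, S_b) = Dist(c_a, c_b) + r_a + r_b$

Definition of MinDist: the minimum distance between a point in $S_a$ and a point in $S_b$

$MinDist(S_a, S_b) = \begin{cases} 0 & \text{if } S_a \text{ and } S_b \text{ overlap} \\ Dist(c_a, c_b) - r_a - r_b & \text{if } S_a \text{ and } S_b \text{ do not overlap} \end{cases}$

[Figure: $S_a$ (center $c_a$, radius $r_a$) and $S_b$ (center $c_b$, radius $r_b$), showing $MaxDist(S_a, S_b)$, $MinDist(S_a, S_b)$, and the overlapping case where $MinDist(S_a, S_b) = 0$.]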
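The two distance formulas can be sketched in a few lines. This is only an illustration: it assumes a hypersphere is stored as a `(center, radius)` pair, and the names `max_dist` and `min_dist` are hypothetical rather than taken from the slides.

```python
import math

def max_dist(s, t):
    """MaxDist(S, T) = Dist(c_s, c_t) + r_s + r_t."""
    (cs, rs), (ct, rt) = s, t
    return math.dist(cs, ct) + rs + rt

def min_dist(s, t):
    """MinDist(S, T): 0 if S and T overlap, else Dist(c_s, c_t) - r_s - r_t."""
    (cs, rs), (ct, rt) = s, t
    gap = math.dist(cs, ct) - rs - rt
    return gap if gap > 0 else 0.0

# Two disks whose centers are 10 apart, with radii 2 and 3.
sa = ((0.0, 0.0), 2.0)
sb = ((10.0, 0.0), 3.0)
print(max_dist(sa, sb))  # 15.0
print(min_dist(sa, sb))  # 5.0
```

Both functions cost O(d), since each needs a single center-to-center distance.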


Hypersphere dominance: existing solutions--MinMax (2)

MinMax:

Compute $MaxDist(S_a, S_q)$

Compute $MinDist(S_b, S_q)$

If $MaxDist(S_a, S_q) < MinDist(S_b, S_q)$, return true

Else, return false

Case $MaxDist(S_a, S_q) < MinDist(S_b, S_q)$: MinMax returns true, which is correct because $Dom(S_a, S_b, S_q) = true$.

Case $MaxDist(S_a, S_q) > MinDist(S_b, S_q)$: MinMax returns false although $Dom(S_a, S_b, S_q) = true$, a "false negative".

[Figure: two layouts of $S_a$, $S_b$ and $S_q$, one per case, with a bisector drawn.]
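The MinMax test and its false negative can be reproduced with a small sketch. The `(center, radius)` representation, the function names and the two example configurations are assumptions made for illustration; they are not the configurations drawn on the slide.

```python
import math

def max_dist(s, t):
    (cs, rs), (ct, rt) = s, t
    return math.dist(cs, ct) + rs + rt

def min_dist(s, t):
    (cs, rs), (ct, rt) = s, t
    return max(0.0, math.dist(cs, ct) - rs - rt)

def minmax_dominates(sa, sb, sq):
    """MinMax test: true only if MaxDist(Sa, Sq) < MinDist(Sb, Sq)."""
    return max_dist(sa, sq) < min_dist(sb, sq)

# Easy case: Sa and Sq are close together, Sb is far away.
easy = minmax_dominates(((0.0, 0.0), 1.0), ((100.0, 0.0), 1.0), ((1.0, 0.0), 1.0))

# Hard case: Sa and Sb are the points (0, 0) and (10, 0), so a point q is
# closer to Sa exactly when its x-coordinate is below 5. Every point of Sq
# (center (-10, 0), radius 14) has x <= 4, so dominance holds, yet
# MaxDist(Sa, Sq) = 24 is not below MinDist(Sb, Sq) = 6.
missed = minmax_dominates(((0.0, 0.0), 0.0), ((10.0, 0.0), 0.0), ((-10.0, 0.0), 14.0))

print(easy, missed)  # True False
```

The second configuration shows why MinMax has no false positives but does have false negatives: it compares the worst case for Ada with the best case for Bob, even though the two extremes are reached at different points of $S_q$.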

Hypersphere dominance: existing solutions--Insufficiency

Criteria of a method:

1. Correctness: no false positive

2. Soundness: no false negative

3. Efficiency: runs in O(d), where d is the number of dimensions

Methods                   Correct?  Sound?  Efficient?
MinMax                    Yes       No      Yes
MBR                       Yes       No      Yes
GP                        Yes       No      Yes
Trigonometric             No        Yes     Yes
Our approach (Hyperbola)  Yes       Yes     Yes

Our approach is the only one which is correct, sound and efficient!

Our approach: major idea

Step 1: pre-checking. For cases where it is easy to decide whether the dominance condition is true: make the decision directly.

Step 2: dominance checking. For cases where it is difficult to decide directly whether the dominance condition is true: derive an equivalent condition which is easier to decide, then make the decision.

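Our approach: pre-checking

Step 1: Pre-checking:

If $S_a$ and $S_b$ overlap, return false

If $S_b$ and $S_q$ overlap, return false

[Figure: two layouts. In one, $S_a$ and $S_b$ overlap, so $Dom(S_a, S_b, S_q) = false$. In the other, $S_b$ and $S_q$ overlap, so $Dom(S_a, S_b, S_q) = false$.]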
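A minimal sketch of the pre-checking step follows. It assumes `(center, radius)` pairs and treats two touching spheres as overlapping; returning `None` for "undecided, go to Step 2" is a convention chosen here, not something the slides specify.

```python
import math

def overlap(s, t):
    """Two closed balls share a point iff the center distance is at most r_s + r_t."""
    (cs, rs), (ct, rt) = s, t
    return math.dist(cs, ct) <= rs + rt

def pre_check(sa, sb, sq):
    """Return False when dominance is ruled out, None when Step 2 is still needed."""
    if overlap(sa, sb):
        # A shared point p gives a = b = p, and Dist(p, q) < Dist(p, q) is impossible.
        return False
    if overlap(sb, sq):
        # A shared point p gives b = q = p, and Dist(a, p) < Dist(p, p) = 0 is impossible.
        return False
    return None
```

Both tests need one distance computation each, so pre-checking stays within O(d).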


Our approach: dominance checking (1)

Step 2: Dominance checking: derive an equivalent condition of the dominance condition and check whether the derived condition is true

Dominance condition: $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: Dist(a, q) < Dist(b, q)$

Equivalent condition (1):

Proof of the equivalence between Condition (1) and Condition (2):

"=>": By contradiction

"<=":

Our approach: dominance checking (2)

Equivalent condition (2): $\forall q \in S_q: MaxDist(q, S_a) < MinDist(q, S_b)$

$MaxDist(q, S_a) = Dist(q, c_a) + r_a + 0 = Dist(q, c_a) + r_a$

$MinDist(q, S_b) = Dist(q, c_b) - r_b - 0 = Dist(q, c_b) - r_b$

Equivalent condition (3): $\forall q \in S_q: Dist(q, c_b) - Dist(q, c_a) > r_a + r_b$

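Our approach: dominance checking (3)

Equivalent condition (3): $\forall q \in S_q: Dist(q, c_b) - Dist(q, c_a) > r_a + r_b$

Space partitioning:

Boundary $P$: $\{x : Dist(x, c_b) - Dist(x, c_a) = r_a + r_b\}$

Region $R_a$: $\{x : Dist(x, c_b) - Dist(x, c_a) > r_a + r_b\}$

Region $R_b$: $\{x : Dist(x, c_b) - Dist(x, c_a) < r_a + r_b\}$

Equivalent condition (4): $S_q$ is in Region $R_a$ ($\forall q \in S_q$: $q$ is in Region $R_a$)

[Figure: boundary $P$ separating Region $R_a$, which contains $S_a$ (center $c_a$), from Region $R_b$, which contains $S_b$ (center $c_b$); $S_q$ with center $c_q$.]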
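A small sketch of the space partitioning, assuming the boundary $P$ is the set of points $x$ with $Dist(x, c_b) - Dist(x, c_a) = r_a + r_b$, Region $R_a$ is the side where the difference is larger, and Region $R_b$ the side where it is smaller (a reading inferred from the conditions on the slides). The function name `region` and the tolerance `eps` are illustrative.

```python
import math

def region(x, sa, sb, eps=1e-12):
    """Classify point x as lying in 'Ra', in 'Rb', or on the boundary 'P'."""
    (ca, ra), (cb, rb) = sa, sb
    diff = math.dist(x, cb) - math.dist(x, ca) - (ra + rb)
    if diff > eps:
        return "Ra"
    if diff < -eps:
        return "Rb"
    return "P"

sa = ((0.0, 0.0), 1.0)
sb = ((10.0, 0.0), 1.0)
print(region((0.0, 0.0), sa, sb))   # Ra
print(region((4.0, 0.0), sa, sb))   # P
print(region((5.0, 0.0), sa, sb))   # Rb
```

With non-zero radii the boundary is shifted toward $c_a$, which is why the midpoint (5, 0) of the two centers already falls in Region $R_b$.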

Our approach: dominance checking (4)

Equivalent condition (4): $S_q$ is in Region $R_a$

Equivalent condition (5): $c_q$ is in Region $R_a$ and $\min_{x \in P} Dist(c_q, x) > r_q$

[Figure: boundary $P$ separating Region $R_a$ (containing $S_a$, center $c_a$) from Region $R_b$ (containing $S_b$, center $c_b$); $S_q$ with center $c_q$ and radius $r_q$ lies in Region $R_a$, and $\min_{x \in P} Dist(c_q, x)$ is marked.]

Our approach (2)

Equivalent condition (5): $c_q$ is in Region $R_a$ and $\min_{x \in P} Dist(c_q, x) > r_q$

Compute $\min_{x \in P} Dist(c_q, x)$:

constraint: $x \in P$

objective: minimize $Dist(c_q, x)$

We use the Lagrange Multiplier (LM) method. Details can be found in the paper.

Correct and sound: condition (5) is equivalent to the dominance condition.

Efficient: each condition transformation takes O(d) time, and the cost of LM is also O(d).
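The sketch below is not the Lagrange multiplier computation from the paper. It swaps in brute-force sampling to check the equivalent form "every point $q$ of $S_q$ satisfies $Dist(q, c_b) - Dist(q, c_a) > r_a + r_b$", which makes it approximate and slower than O(d). It assumes `(center, radius)` pairs, at least two dimensions, and a hypothetical sampling parameter `samples`; it can serve as a reference when testing a faster implementation.

```python
import math

def dominates_sampled(sa, sb, sq, samples=3600):
    """Approximate test of: for all q in Sq, Dist(q, cb) - Dist(q, ca) > ra + rb."""
    (ca, ra), (cb, rb), (cq, rq) = sa, sb, sq
    d_ab = math.dist(ca, cb)
    # Step 1, pre-checking: overlapping pairs rule dominance out.
    if d_ab <= ra + rb or math.dist(cb, cq) <= rb + rq:
        return False
    # The difference depends only on the position along the axis ca -> cb and
    # on the distance from that axis, so work in that 2D plane (assumes d >= 2).
    axis = [(b - a) / d_ab for a, b in zip(ca, cb)]
    rel = [q - a for a, q in zip(ca, cq)]
    x = sum(r * u for r, u in zip(rel, axis))
    y = math.sqrt(max(0.0, sum(r * r for r in rel) - x * x))
    # Step 2: sample the boundary circle of Sq; every sample must lie in Region Ra.
    for i in range(samples):
        t = 2.0 * math.pi * i / samples
        px, py = x + rq * math.cos(t), y + rq * math.sin(t)
        if math.hypot(px - d_ab, py) - math.hypot(px, py) <= ra + rb:
            return False
    return True

# Point spheres at (0, 0) and (10, 0); Sq stays left of the bisector x = 5.
print(dominates_sampled(((0.0, 0.0), 0.0), ((10.0, 0.0), 0.0), ((-10.0, 0.0), 14.0)))  # True
# Enlarging Sq to radius 16 pushes it across the bisector.
print(dominates_sampled(((0.0, 0.0), 0.0), ((10.0, 0.0), 0.0), ((-10.0, 0.0), 16.0)))  # False
```

Because it only inspects finitely many points, the sampled test can miss a violation that falls between two samples; an exact O(d) decision needs the closed-form minimum distance to the boundary.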

Empirical study: set-up

Datasets:

Real datasets: NBA, Color, Texture, and Forest

Synthetic datasets

Algorithms: MinMax, MBR, GP, Trigonometric, Hyperbola (our method)

Measures:

precision = TP/(TP+FP)

recall = TP/(TP+FN)

running time

A correct method always has precision equal to 1. A sound method always has recall equal to 1.

Criteria of a method:

1. Correctness: no false positive (FP)

2. Soundness: no false negative (FN)

3. Efficiency: runs in O(d), where d is the number of dimensions
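As a sketch of how the two quality measures could be computed, assume each dominance instance has a ground-truth answer and a method's answer, both booleans. The function name, the made-up toy answers and the convention of reporting 1.0 when a denominator is zero are assumptions for illustration, not results from the study.

```python
def precision_recall(predicted, actual):
    """precision = TP/(TP+FP), recall = TP/(TP+FN) over paired boolean answers."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Toy answers: a method that never errs on "true" but misses two dominances.
actual = [True, True, True, False]
predicted = [True, False, False, False]
precision, recall = precision_recall(predicted, actual)
print(precision)  # 1.0
```

A method with no false positives, such as the toy one above, keeps precision at 1 while its recall drops with every missed dominance.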

Empirical study: results (precision, NBA)

All algorithms except Trigonometric have precision = 1.

Empirical study: results (recall, NBA)

Only our approach (Hyperbola) and Trigonometric have recall = 1.

Empirical study: results (running time, NBA)

MinMax < GP < Hyperbola (our method) < MBR < Trigonometric

Conclusion

First solution for the hypersphere dominance problem which is correct, sound and efficient for any dimension

An application study: kNN

Experiments

Q & A


The following slides are for backup use only

Hyperspheres in uncertain databases

Song and Roussopoulos [SSTD'01]

Cheng et al. [TKDE'04]

Chen and Cheng [ICDE'07]

Beskales et al. [PVLDB'08]

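Our approach (1)

Definition 1: Hypersphere dominance

Given three hyperspheres $S_a$, $S_b$, and $S_q$, it decides whether the dominance condition holds.

Dominance condition: $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: Dist(a, q) < Dist(b, q)$

Yes: $Dom(S_a, S_b, S_q) = true$

No: $Dom(S_a, S_b, S_q) = false$

Major idea: derive an equivalent condition of the dominance condition and check whether the derived condition is true

Equivalent condition (1):

Equivalent condition (2):

Equivalent condition (3):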

An application study: kNN query

kNN query:

Given a set D of hyperspheres, $S_1$, $S_2$, …, $S_n$, a query hypersphere $S_q$, and an integer $k$, the query finds the set of hyperspheres in D each of which is not dominated by $S_k$ wrt $S_q$, where $S_k$ denotes the hypersphere in D with the k-th smallest maximum distance from $S_q$.

Solution:

A best-first search algorithm based on SS-tree

Some pruning strategies

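Illustration 1: 2D space, $S_a$ and $S_b$ are two points (i.e., $r_a = 0$, $r_b = 0$)

[Figure: boundary $P$ separating Region $R_a$, which contains $S_a$ ($c_a$), from Region $R_b$, which contains $S_b$ ($c_b$); $S_q$ with center $c_q$.]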
