finding probabilistic nearest n i hb f q obj t ...dbgroup/public... · – traditional problem in...

36
Finding Probabilistic Nearest N i hb f Q Obj t ith Neighbors for Query Objects with Imprecise Locations Yuichi Iijima and Yoshiharu Ishikawa Yuichi Iijima and Yoshiharu Ishikawa Nagoya University, Japan

Upload: others

Post on 27-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Finding Probabilistic Nearest N i hb f Q Obj t ithNeighbors for Query Objects with

Imprecise Locationsp

Yuichi Iijima and Yoshiharu IshikawaYuichi Iijima and Yoshiharu IshikawaNagoya University, Japan

Page 2: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Query Processing Strategies• Experimental Results• Conclusions

1

Page 3: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Imprecise Location Information

• Sensor Environments– Measurement Noise– Frequent updates may not be possibleq p y p

• GPS-based positioning consumes batteries

• Robotics• Robotics– Localization using sensing and

t hi t imovement histories– Probabilistic approach has

vagueness• Privacy

2

Privacy– Location Anonymity

Page 4: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Nearest Neighbor Queries

• Nearest Neighbor QueriesE l Fi d th l t b t– Example: Find the closest bus stop

– Traditional problem in spatial databases• Efficient query processing using spatial indices• Extensible to multi-dimensional cases (e.g., image

retrieval)

• What happen if the NN of qpplocation of query object is uncertain? qis uncertain?

3

Page 5: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Example Scenario (1)

• Query: Find the nearest gas station

Location estimation based on noisy GPS0% based on noisy GPS data

35%

0%

35%

10%Nearest gas stationdepends on the possible10%

55%

depends on the possiblecar locations

55%

NN objects are definedi b bili ti

4

in a probabilistic way

Page 6: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Example Scenario (2)

• Mobile Robotics– Location of the robot is estimated

based on movement histories andsensor data

– Measurements are noisyy– Localization based on probabilistic

modelingmodeling• Kalman filter, particle filter, etc.

Estimated location is given as a– Estimated location is given as aprobabilistic density function (PDF)

PDF changes on each estimation• PDF changes on each estimation

5

Page 7: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Probabilistic Nearest Neighbor Query (1)

• PNNQ for short0 07 0.08

0 04 0.06 0.08

pq(x)

• Assumptions– Location of query object 0.01

0.02 0.03 0.04 0.05 0.06 0.07

0 0.020.04

q y jq is specified as a Gaussian distribution

0

-4-2

0

– Target data: static pointsG i Di t ib ti

2 4

• Gaussian Distribution

)()(11)( 1Σt

Σ C

)()(

2exp

||)2()( 1

2/12/ qxqxx ΣΣ

tdqp

6

– Σ: Covariance matrix

Page 8: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Probabilistic Nearest Neighbor Query (2)

• Definition),','Pr(),(Pr o'xox ooΟooqNN

• Find objects which}),(Pr,|{),( oqOooqPNNQ NN

• Find objects which satisfy the condition– The probability that

the object is theq

nearest neighbor of q is greater than or

7

equal to θ

Page 9: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Query Processing Strategies• Experimental Results• Conclusions

8

Page 10: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Related Work

• Query processing methods for uncertain (location) datadata– Cheng, Prabhakar, et al. (SIGMOD’03, VLDB’04, …)

T t l (VLDB’05 TODS’07)– Tao et al. (VLDB’05, TODS’07)– Consider arbitrary PDFs or uniform PDFs

M t f th t t bj t i i– Most of the case, target objects are imprecise• Research related to Gaussian distribution

– Gauss-tree [Böhm et al., ICDE’06]• Target objects are based on Gaussian distributions

• Our former work– Ishikawa, Iijima, Yu (ICDE’09): Probabilistic , j , ( )

range queries9

Page 11: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Query Processing Strategies• Experimental Results• Conclusions

10

Page 12: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Naïve Approach (1)

• Use of Voronoi Diagram– Well-known method for standard (non-imprecise)

nearest neighbor queries

aa bc

Voronoi region Ve

d

Voronoi region Ve

f g h

11

Page 13: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Naïve Approach (2)

• PrNN(q, o): Nearest neighbor probabilityP b bilit th t bj t i th t

pq(x)– Probability that object o is the nearest

neighbor of query object qIt can be computed by integrating the 0.02

0.03 0.04 0.05 0.06 0.07 0.08

0 0.02 0.04 0.06 0.08

– It can be computed by integrating the probability density function pq(x) over Voronoi region Vo

0 0.01

-4-2

0 2

4o o o eg o Vo

a b V qNN dpoq xx)(),(Pr

• Problemb

ec

qpq(x)

oV

– Need to consider all targetobjectsN i l i t ti (M t

e

g

d

h

qVe

– Numerical integration (MonteCarlo method) is quite costly!

12

fg h

Page 14: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Our Approach

• Outline of processing1. Filtering

• Prune non-candidate objects whose PrNN are obviously smaller than the threshold

• Low-cost filtering conditions2. Numerical integration for the remaining

candidate objects• Two strategies

region based Approach– -region-based Approach– SES-based Approach

• SES: Smallest Enclosing Sphere13

Page 15: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 1: -Region-Based Approach (1)

• -regionSi il t ft d i i f– Similar concepts are often used in query processing for uncertain spatial databasesDefinition: Ellipsoidal region for which the result of the– Definition: Ellipsoidal region for which the result of the integration becomes 1 – 2 :

)( d

21 )()(21)(

r qt

dpqxqx

xxΣ

The ellipsoidal region21

)()( rt

qxqx Σ

is the -region• Example: is specified as 1%

q• Example: is specified as 1%,

we consider 98% -region14

-region

Page 16: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 1: -Region-Based Approach (2)

• Query Processing1. -region for the query is computed at first

• -region can be derived using r-table:– It is constructed for the normal Gaussian ( = I, q = 0): Given ,

it returns appropriate rFinal region can be obtained by transformation• Final -region can be obtained by transformation

2. Derive the bounding box of i ba-region

3. Objects whose Voronoiq

b

e

ac

regions overlaps with the box are the candidates

q e

hf

dg

• {a, c, d, f, g}, in this example15

θ-region

f

Page 17: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 1: -Region-Based Approach (3)

• Derivation details• First we consider the

• Third, the bonding box is calculated• First, we consider the

normal Gaussian– Using r-table, we can get

is calculated

ii rw Using r table, we can get the appropriate r for given

• Second, a spherical - iii

ii

)(Σ

region is derived based on transformation

Transformation is performedwhere (Σ)ii is the (i, i) entry of Σ– Transformation is performed

by analyzing entry of Σ

xj

qr

q qwj

wxi

16

wj

wi wi

Page 18: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 2: SES-Based Approach (1)

• SES: Smallest Enclosing Sphere– For each Voronoi region Vo, we compute its SES

SESo beforehand

aSESe: SES for Voronoiregion of e

a bc g

ed

f g h

17

Page 19: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 2: SES-Based Approach (2)

• Integration over SESo gives the upper-boundf P ( )for PrNN(q, o)

dpdpoq xxxx )()()(Pr

– Integration over a sphere region is more easier

oo SES qV qNN dpdpoq xxxx )()(),(Pr

– Integration over a sphere region is more easier to compute

• We use a table called a b• We use a table called U-catalog constructed beforehand

a b

ec

q e

g

d

h

qpq(x)

18

fg h

Page 20: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 2: SES-Based Approach (3)

• What is U-catalog?Gi t t d it t di– Given two parameters and , it returns corresponding integralU catalog is made for different ( ) pairs by computing– U-catalog is made for different (, ) pairs by computing the integral of normal Gaussian over sphere region R

xxx

dpR

)(),( normα δ π(α, δ)

R

( , )0.0 0.1 …0.0 0.2 …

q

… … …1.0 0.1

19

q … … …

Page 21: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 2: SES-Based Approach (4)

• To use U-catalog, another approximation is required since it is only useful for normal Gaussianrequired since it is only useful for normal Gaussian– Use of upper-bounding function pq(x)

( ) ti htl b d ( ) d h h i l i f

T

T– pq(x) tightly bounds pq(x) and has spherical isosurfaces– For pq(x), we can easily derive its integral over SESo using

U catalog

T

T

U-catalog• In summary, we use two

i ti

ab

c

pq(x)T

approximations:e

c

dq

qNN dpoq xx)(),(Pr

fg

hpq(x)

o

o

SES q

V qNN

dp

pq

xx)(

)(),(

20

oSES q dp xx)(T

Page 22: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Strategy 2: SES-Based Approach (5)

• Bounding FunctionO i i l G i PDF– Original Gaussian PDF

)()(1exp1)( 1 qxqxx Σtp

)()(2

exp)2(

)( 2/12/qxqxx Σ

Σdqp

– Upper-Bounding Function

21)(

TT

2/12/ 2exp

)2()( qxxT

Σdqp

where

1

1d

i

tiii vvΣ )()( xx T

qq pp

21}min{

1

i

i

T

qq

is satisfied

Page 23: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Query Processing Strategies• Experimental Results• and Conclusions

22

Page 24: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Setup of Experiments (1)

• Target Data– 2D point data (50K entries)– Based on road lineBased on road line

segments in Long Beach• Computation of Voronoi• Computation of Voronoi

regions and SESs– LEDA package was used

• ComparisonComparison– Strategies 1, 2, and their hybrid approach

E l ti t i R ti23

– Evaluation metric: Response time

Page 25: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Setup of Experiments (2)

• Default Parameters– Covariance matrix

732327Σ

qpq(x)

• For this matrix the shape of the isosurface of p (x) is an

732

For this matrix the shape of the isosurface of pq(x) is an ellipse titled at 30 degrees and the major-to-minor axis ratio is 3:1

• γ : Parameter for controlling vagueness (default: γ = 10)– Probability threshold value: θ = 0.01

24

Probability threshold value: θ 0.01– No. of samples for Monte Carlo method: 1,000,000

Page 26: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Candidate Objects in Strategy 1

Bounding box of the θ-regionof the θ region

Query center

Voronoi regions for candidate objects

Voronoi regions for answer objects

25

Page 27: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Candidate Objects in Strategy 2

Query center

Voronoi regions for candidate objects

Voronoi regions for answer objects

26

Page 28: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Candidate Objects in Hybrid Strategy

Bounding box of the θ-regionof the θ region

Query Center

Voronoi regions for candidate objects

Voronoi regions for answer objects

27

Page 29: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Experimental Results - Default Parameters

Filtering Compute Prob. Rest

• Most of the processing i i i

2.45 2.48 40.00

45.00 50.00

me[

s]

time is spent in numerical integration

43 70

2.47

25.00 30.00 35.00

nse

Tim

g

A t t th t

43.70 38.70

34.13

10.00 15.00 20.00

Res

pon

• A strategy that can prune more objects

0.09 0.19 0.13 0.00 5.00

Strategy 1 Strategy 2 Hybrid

R

has better performanceNumber of candidates 179 150 129Number of

28

Number of answers 26

Page 30: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Experimental Results - Different γ Values

γ = 1(almost exact)

γ = 10 γ = 50(too vague)( ) ( g )

90010.00

] 2 45450050.00

Filtering Compute Prob. Rest250.00

2.45 2.48 2.48

6.00 7.00 8.00 9.00

Tim

e[s] 2.45

2.48 2.47

30.00 35.00 40.00 45.00

2.45

150.00

200.00

6.28 6.05 5.53 2.00 3.00 4.00 5.00

espo

nse

43.70 38.70 34.13

10.00 15.00 20.00 25.00

209.22

70 88

2.47 2.46 50.00

100.00

0.09 0.13 0.13 0.00 1.00

Strategy 1 Strategy 2 Hybrid

Re

0.09 0.19 0.13 0.00 5.00

Strategy 1 Strategy 2 Hybrid0.09 0.22 0.13

70.88 66.41

0.00

Strategy 1 Strategy 2 Hybrid

Number of candidates 24 31 23Number of 8

179 150 129

26

847 276 260

15

29

answers 8 26 15

Query is costly when impreciseness is high

Page 31: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Experimental Results - Different θ Values

θ = 0.01 θ = 0.03 θ = 0.05

2 45450050.00

] 3500

40.00

Filtering Compute Prob. Rest35.00

2.45 2.48

2.47 30.00 35.00 40.00 45.00

Tim

e[s] 2.45

2 4625.00

30.00

35.00 2.45

20.00

25.00

30.00

43.70 38.70 34.13

10.00 15.00 20.00 25.00

espo

nse

31.75

20.03 18.42

2.46 2.46

10.00

15.00

20.00

26.56

11 79 11 06

2.46 2.46

500

10.00

15.00

0.09 0.19 0.13 0.00 5.00

Strategy 1 Strategy 2 Hybrid

Re

0.09 0.19 0.13 0.00

5.00

Strategy 1 Strategy 2 Hybrid0.09 0.19 0.13

11.79 11.06

0.00

5.00

Strategy 1 Strategy 2 Hybrid

Number of candidates 179 150 129Number of 26

128 76 68

7

107 44 40

3

30

answers 26 7 3

Page 32: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Experimental Results - Different Shapes

Circle Default Ellipse Narrow Ellipse

3000

35.00

s] 2.45 45.00 50.00

Filtering Compute Prob. Rest

2 4690.00 100.00

2.45

20.00

25.00

30.00

e Ti

me[

s

2.48 2.47

250030.00 35.00 40.00

2.45

2.46

2.46

500060.00 70.00 80.00

26.97

13.60 13.42

2.46 2.45

500

10.00

15.00

Res

pons

e

43.70 38.70 34.13

10.00 15.00 20.00 25.00

69.69 84.56

62.56 20.00 30.00 40.00 50.00

0.09 0.15 0.13 0.00

5.00

Strategy 1 Strategy 2 Hybrid

R

0.09 0.19 0.13 0.00 5.00

Strategy 1 Strategy 2 Hybrid0.09 0.20 0.13 0.00

10.00

Strategy 1 Strategy 2 Hybrid

Number of candidates 115 50 49Number of 26

179 150 129

26

276 366 250

24

31

answers 26 26 24

Narrow ellipse (correlated distribution) needs to consider additional candidates

Page 33: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Conclusions of Experimental Results

• Most of the processing time is spent in numerical integrationnumerical integration⇒A strategy that can prune more objects has

better performanceSuperiority or inferiority of two strategies Superiority or inferiority of two strategies depends on the given query and the specified

tparameters No apparent winner

• The hybrid strategy inherits the benefits of two strategies and shows the best performancestrategies and shows the best performance

32

Page 34: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Query Processing Strategies• Experimental Results• Conclusions

33

Page 35: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Conclusions

• Nearest neighbor query processing methods f i i bj tfor imprecise query objects– Location of query object is represented by q y j p y

Gaussian distribution– Two strategies and their combinationTwo strategies and their combination– Reduction of numerical integration is important

Proposal of two pruning strategies and their– Proposal of two pruning strategies and their evaluation

• Future work– Evaluation for multi-dimensional cases (d 3)( )

34

Page 36: Finding Probabilistic Nearest N i hb f Q Obj t ...dbgroup/public... · – Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible

Finding Probabilistic Nearest N i hb f Q Obj t ithNeighbors for Query Objects with

Imprecise Locationsp

Yuichi Iijima and Yoshiharu IshikawaYuichi Iijima and Yoshiharu IshikawaNagoya University, Japan