Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations
Yuichi Iijima and Yoshiharu Ishikawa
Nagoya University, Japan
Outline
• Background and Problem Formulation
• Related Work
• Query Processing Strategies
• Experimental Results
• Conclusions
Imprecise Location Information
• Sensor Environments
– Measurement noise
– Frequent updates may not be possible
  • GPS-based positioning consumes batteries
• Robotics
– Localization using sensing and movement histories
– Probabilistic approach has vagueness
• Privacy
– Location anonymity
Nearest Neighbor Queries
• Nearest Neighbor Queries
– Example: Find the closest bus stop
– Traditional problem in spatial databases
  • Efficient query processing using spatial indices
  • Extensible to multi-dimensional cases (e.g., image retrieval)
• What happens if the location of the query object q is uncertain?
Example Scenario (1)
• Query: Find the nearest gas station
– Location estimation based on noisy GPS data
– Nearest gas station depends on the possible car locations
– NN objects are defined in a probabilistic way
[Figure: gas stations around the estimated car location, labeled with nearest-neighbor probabilities 0%, 35%, 10%, and 55%]
Example Scenario (2)
• Mobile Robotics
– Location of the robot is estimated based on movement histories and sensor data
– Measurements are noisy
– Localization based on probabilistic modeling
  • Kalman filter, particle filter, etc.
– Estimated location is given as a probability density function (PDF)
  • PDF changes on each estimation
Probabilistic Nearest Neighbor Query (1)
• PNNQ for short
• Assumptions
– Location of query object q is specified as a Gaussian distribution
– Target data: static points
• Gaussian Distribution
$$p_q(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{q})^t\,\Sigma^{-1}(\mathbf{x}-\mathbf{q})\right]$$
– Σ: Covariance matrix
[Figure: surface plot of the Gaussian density pq(x)]
Probabilistic Nearest Neighbor Query (2)
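• Definition
$$\mathrm{Pr}_{NN}(q, o) = \Pr\left(\forall o' \in O,\ o' \neq o,\ \|\mathbf{x}-\mathbf{o}\| \le \|\mathbf{x}-\mathbf{o}'\|\right)$$
$$PNNQ(q, \theta) = \{\, o \mid o \in O,\ \mathrm{Pr}_{NN}(q, o) \ge \theta \,\}$$
• Find objects which satisfy the condition
– The probability that the object is the nearest neighbor of q is greater than or equal to θ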
Related Work
• Query processing methods for uncertain (location) data
– Cheng, Prabhakar, et al. (SIGMOD’03, VLDB’04, …)
– Tao et al. (VLDB’05, TODS’07)
– Consider arbitrary PDFs or uniform PDFs
– In most cases, the target objects are imprecise
• Research related to Gaussian distribution
– Gauss-tree [Böhm et al., ICDE’06]
  • Target objects are based on Gaussian distributions
• Our former work
– Ishikawa, Iijima, Yu (ICDE’09): Probabilistic range queries
Naïve Approach (1)
• Use of Voronoi Diagram
– Well-known method for standard (non-imprecise) nearest neighbor queries
[Figure: Voronoi diagram of objects a to h, with the Voronoi region Ve of object e highlighted]
Naïve Approach (2)
• PrNN(q, o): Nearest neighbor probability
– Probability that object o is the nearest neighbor of query object q
– It can be computed by integrating the probability density function pq(x) over Voronoi region Vo
$$\mathrm{Pr}_{NN}(q, o) = \int_{V_o} p_q(\mathbf{x})\,d\mathbf{x}$$
• Problem
– Need to consider all target objects
– Numerical integration (Monte Carlo method) is quite costly!
[Figure: density pq(x) of query q overlaid on the Voronoi diagram of objects a to h]
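The naïve approach can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes 2D data, and instead of building Voronoi regions explicitly it uses the fact that a sample falls in Vo exactly when o is its nearest object. The function names (`sample_gaussian_2d`, `estimate_pr_nn`, `pnnq`) and the sample count are hypothetical choices made for this example.

```python
import math
import random


def sample_gaussian_2d(q, cov, rng):
    """Draw one sample from N(q, cov) using a hand-written 2x2 Cholesky factor."""
    (a, b), (_, c) = cov  # cov = [[a, b], [b, c]], assumed positive definite
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (q[0] + l11 * z1, q[1] + l21 * z1 + l22 * z2)


def estimate_pr_nn(q, cov, objects, n_samples=20000, seed=0):
    """Monte Carlo estimate of PrNN(q, o) for every object o.

    A sample lies in the Voronoi region of o exactly when o is its nearest
    object, so counting nearest objects approximates the integral over Vo.
    """
    rng = random.Random(seed)
    counts = {name: 0 for name in objects}
    for _ in range(n_samples):
        x = sample_gaussian_2d(q, cov, rng)
        nearest = min(
            objects,
            key=lambda name: math.hypot(x[0] - objects[name][0], x[1] - objects[name][1]),
        )
        counts[nearest] += 1
    return {name: counts[name] / n_samples for name in objects}


def pnnq(q, cov, objects, theta, n_samples=20000, seed=0):
    """Return the objects whose estimated PrNN is at least theta."""
    probs = estimate_pr_nn(q, cov, objects, n_samples=n_samples, seed=seed)
    return {name: p for name, p in probs.items() if p >= theta}
```

Every sample is compared against every object, which illustrates why the cost grows with both the number of targets and the number of samples.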
Our Approach
• Outline of processing
1. Filtering
  • Prune non-candidate objects whose PrNN are obviously smaller than the threshold
  • Low-cost filtering conditions
2. Numerical integration for the remaining candidate objects
• Two strategies
– θ-region-based Approach
– SES-based Approach
  • SES: Smallest Enclosing Sphere
Strategy 1: θ-Region-Based Approach (1)
• θ-region
– Similar concepts are often used in query processing for uncertain spatial databases
– Definition: Ellipsoidal region for which the result of the integration becomes 1 – 2θ:
$$\int_{(\mathbf{x}-\mathbf{q})^t\,\Sigma^{-1}(\mathbf{x}-\mathbf{q}) \le r_\theta^2} p_q(\mathbf{x})\,d\mathbf{x} = 1 - 2\theta$$
The ellipsoidal region
$$(\mathbf{x}-\mathbf{q})^t\,\Sigma^{-1}(\mathbf{x}-\mathbf{q}) \le r_\theta^2$$
is the θ-region
• Example: θ is specified as 1%, we consider 98% θ-region
[Figure: θ-region around q]
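For the 2D case the radius of the θ-region has a closed form, which makes the 98% example easy to check. For the normal Gaussian in two dimensions, the probability mass within radius r is 1 − exp(−r²/2), so setting it to 1 − 2θ gives rθ = sqrt(−2 ln(2θ)). The sketch below uses this 2D-only identity; the function names are hypothetical, and higher dimensions would need a chi-square quantile instead.

```python
import math


def mass_inside_2d(r):
    """Probability mass of the 2D normal Gaussian within radius r of its center."""
    return 1.0 - math.exp(-0.5 * r * r)


def r_theta_2d(theta):
    """Radius r_theta such that the enclosed mass equals 1 - 2*theta (2D only)."""
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    return math.sqrt(-2.0 * math.log(2.0 * theta))


r = r_theta_2d(0.01)           # theta = 1%
print(round(r, 3))             # about 2.797
print(mass_inside_2d(r))       # about 0.98
```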
Strategy 1: θ-Region-Based Approach (2)
• Query Processing
1. θ-region for the query is computed at first
  • θ-region can be derived using r-table:
    – It is constructed for the normal Gaussian (Σ = I, q = 0): Given θ, it returns the appropriate rθ
  • Final θ-region can be obtained by transformation
2. Derive the bounding box of the θ-region
3. Objects whose Voronoi regions overlap with the box are the candidates
  • {a, c, d, f, g}, in this example
[Figure: θ-region of q and its bounding box overlaid on the Voronoi diagram of objects a to h]
Strategy 1: θ-Region-Based Approach (3)
• Derivation details
• First, we consider the normal Gaussian
– Using r-table, we can get the appropriate rθ for the given θ
• Second, a spherical θ-region is derived based on transformation
– Transformation is performed by analyzing the entries of Σ
• Third, the bounding box is calculated
$$w_i = \sigma_i\, r_\theta, \qquad \sigma_i = \sqrt{(\Sigma)_{ii}}$$
where (Σ)ii is the (i, i) entry of Σ
[Figure: θ-region centered at q with bounding-box half-widths wi and wj along axes xi and xj]
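The bounding-box step can be illustrated with a short sketch. It relies on a standard fact about ellipsoids: the region (x − q)ᵗ Σ⁻¹ (x − q) ≤ r² extends r·sqrt((Σ)ii) from q along axis i. As an assumption made for this example only, the r-table lookup is replaced by the 2D closed form rθ = sqrt(−2 ln(2θ)); the function name is hypothetical.

```python
import math


def theta_region_bounding_box(q, cov, theta):
    """Axis-aligned bounding box of the 2D theta-region of N(q, cov).

    Assumption: r_theta comes from the 2D closed form instead of an r-table.
    Returns (lower_corner, upper_corner).
    """
    r_theta = math.sqrt(-2.0 * math.log(2.0 * theta))
    # Half-width along axis i is r_theta times the standard deviation sqrt(cov[i][i]).
    half_widths = [r_theta * math.sqrt(cov[i][i]) for i in range(2)]
    lower = tuple(q[i] - half_widths[i] for i in range(2))
    upper = tuple(q[i] + half_widths[i] for i in range(2))
    return lower, upper


# Example with a correlated covariance matrix and theta = 0.01.
s = 20.0 * math.sqrt(3.0)
print(theta_region_bounding_box((0.0, 0.0), [[70.0, s], [s, 30.0]], 0.01))
```

Note that the off-diagonal entries do not appear in the box size: correlation changes how much of the box the ellipse fills, not the box itself.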
Strategy 2: SES-Based Approach (1)
• SES: Smallest Enclosing Sphere
– For each Voronoi region Vo, we compute its SES SESo beforehand
[Figure: Voronoi diagram of objects a to h; SESe is the SES for the Voronoi region of e]
Strategy 2: SES-Based Approach (2)
• Integration over SESo gives the upper bound for PrNN(q, o)
$$\mathrm{Pr}_{NN}(q, o) = \int_{V_o} p_q(\mathbf{x})\,d\mathbf{x} \le \int_{SES_o} p_q(\mathbf{x})\,d\mathbf{x}$$
– Integration over a sphere region is easier to compute
• We use a table called U-catalog constructed beforehand
[Figure: density pq(x) of query q overlaid on the Voronoi diagram of objects a to h]
Strategy 2: SES-Based Approach (3)
• What is U-catalog?
– Given two parameters α and δ, it returns the corresponding integral
– U-catalog is made for different (α, δ) pairs by computing the integral of normal Gaussian over sphere region R
$$\pi(\alpha, \delta) = \int_R p_{\mathrm{norm}}(\mathbf{x})\,d\mathbf{x}$$

α      δ      π(α, δ)
0.0    0.1    …
0.0    0.2    …
…      …      …
1.0    0.1    …

[Figure: sphere region R relative to q]
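A catalog of this kind could be precomputed along the following lines. This is only a sketch under stated assumptions: the slide does not spell out the meaning of α and δ, so the code assumes α is the distance from the Gaussian center to the center of R and δ is the radius of R. It also assumes 2D data and uses Monte Carlo sampling; the function name and grid are hypothetical.

```python
import math
import random


def build_u_catalog(alphas, deltas, n_samples=20000, seed=0):
    """Tabulate pi(alpha, delta) for the 2D normal Gaussian.

    Assumed meaning of the parameters (not confirmed by the slides):
      alpha = distance from the Gaussian center to the center of sphere R
      delta = radius of sphere R
    By rotational symmetry the sphere center can be placed at (alpha, 0).
    """
    rng = random.Random(seed)
    samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    catalog = {}
    for alpha in alphas:
        for delta in deltas:
            inside = sum(1 for (x, y) in samples if math.hypot(x - alpha, y) <= delta)
            catalog[(alpha, delta)] = inside / n_samples
    return catalog


catalog = build_u_catalog(alphas=[0.0, 0.5, 1.0], deltas=[0.5, 1.0])
print(catalog[(0.0, 1.0)])  # close to 1 - exp(-0.5), the exact value for alpha = 0
```

Because the table is built once for the normal Gaussian, query-time work reduces to a lookup, which is the point of constructing it beforehand.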
Strategy 2: SES-Based Approach (4)
• To use U-catalog, another approximation is required since it is only useful for normal Gaussian
– Use of upper-bounding function $p_q^{\top}(\mathbf{x})$
– $p_q^{\top}(\mathbf{x})$ tightly bounds pq(x) and has spherical isosurfaces
– For $p_q^{\top}(\mathbf{x})$, we can easily derive its integral over SESo using U-catalog
• In summary, we use two approximations:
$$\mathrm{Pr}_{NN}(q, o) = \int_{V_o} p_q(\mathbf{x})\,d\mathbf{x} \le \int_{SES_o} p_q(\mathbf{x})\,d\mathbf{x} \le \int_{SES_o} p_q^{\top}(\mathbf{x})\,d\mathbf{x}$$
[Figure: isosurfaces of pq(x) and of its upper-bounding function overlaid on the Voronoi diagram of objects a to h]
Strategy 2: SES-Based Approach (5)
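• Bounding Function
– Original Gaussian PDF
$$p_q(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{q})^t\,\Sigma^{-1}(\mathbf{x}-\mathbf{q})\right]$$
– Upper-Bounding Function
$$p_q^{\top}(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{\lambda^{\top}}{2}\,\|\mathbf{x}-\mathbf{q}\|^2\right]$$
where
$$\Sigma^{-1} = \sum_{i=1}^{d} \lambda_i \mathbf{v}_i \mathbf{v}_i^t, \qquad \lambda^{\top} = \min_i \{\lambda_i\}$$
and
$$p_q(\mathbf{x}) \le p_q^{\top}(\mathbf{x})$$
is satisfied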
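The inequality follows because (x − q)ᵗ Σ⁻¹ (x − q) ≥ λ_min ‖x − q‖², where λ_min is the smallest eigenvalue of Σ⁻¹. The sketch below checks this numerically for the 2D case, using closed-form formulas for the inverse and eigenvalues of a symmetric 2×2 matrix. All function names are hypothetical and the code is an illustration rather than the authors' implementation.

```python
import math


def inverse_2x2(cov):
    """Inverse and determinant of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    return [[c / det, -b / det], [-b / det, a / det]], det


def min_eigenvalue_sym_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


def gaussian_pdf(x, q, cov):
    """Density of the 2D Gaussian N(q, cov) at point x."""
    inv, det = inverse_2x2(cov)
    dx, dy = x[0] - q[0], x[1] - q[1]
    quad = inv[0][0] * dx * dx + 2.0 * inv[0][1] * dx * dy + inv[1][1] * dy * dy
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def upper_bound_pdf(x, q, cov):
    """Upper-bounding function with spherical isosurfaces around q."""
    inv, det = inverse_2x2(cov)
    lam_top = min_eigenvalue_sym_2x2(inv)  # smallest eigenvalue of the inverse covariance
    dist2 = (x[0] - q[0]) ** 2 + (x[1] - q[1]) ** 2
    return math.exp(-0.5 * lam_top * dist2) / (2.0 * math.pi * math.sqrt(det))
```

The two functions coincide along the major axis of the ellipse and the bound is loosest along the minor axis, so the bound becomes less tight as the distribution gets narrower.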
Setup of Experiments (1)
• Target Data
– 2D point data (50K entries)
– Based on road line segments in Long Beach
• Computation of Voronoi regions and SESs
– LEDA package was used
• Comparison
– Strategies 1, 2, and their hybrid approach
– Evaluation metric: Response time
Setup of Experiments (2)
• Default Parameters
– Covariance matrix
$$\Sigma = \gamma \begin{bmatrix} 7 & 2\sqrt{3} \\ 2\sqrt{3} & 3 \end{bmatrix}$$
  • For this matrix, the shape of the isosurface of pq(x) is an ellipse tilted at 30 degrees, and the major-to-minor axis ratio is 3:1
  • γ: Parameter for controlling vagueness (default: γ = 10)
– Probability threshold value: θ = 0.01
– No. of samples for Monte Carlo method: 1,000,000
Candidate Objects in Strategy 1
Bounding box of the θ-region
Query center
Voronoi regions for candidate objects
Voronoi regions for answer objects
Candidate Objects in Strategy 2
Query center
Voronoi regions for candidate objects
Voronoi regions for answer objects
Candidate Objects in Hybrid Strategy
Bounding box of the θ-region
Query Center
Voronoi regions for candidate objects
Voronoi regions for answer objects
Experimental Results - Default Parameters
• Most of the processing time is spent in numerical integration
• A strategy that can prune more objects has better performance

[Chart: stacked bars of response time [s] per strategy]
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       43.70        38.70        34.13
Rest [s]                2.45         2.48         2.47
Number of candidates    179          150          129
Number of answers       26
Experimental Results - Different γ Values
[Chart: stacked bars of response time [s] per strategy for γ = 1, 10, and 50; the "Rest" component is about 2.45 to 2.48 s in every case]

γ = 1 (almost exact)
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.13         0.13
Compute Prob. [s]       6.28         6.05         5.53
Number of candidates    24           31           23
Number of answers       8

γ = 10
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       43.70        38.70        34.13
Number of candidates    179          150          129
Number of answers       26

γ = 50 (too vague)
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.22         0.13
Compute Prob. [s]       209.22       70.88        66.41
Number of candidates    847          276          260
Number of answers       15

• Query is costly when impreciseness is high
Experimental Results - Different θ Values
[Chart: stacked bars of response time [s] per strategy for θ = 0.01, 0.03, and 0.05; the "Rest" component is about 2.45 to 2.48 s in every case]

θ = 0.01
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       43.70        38.70        34.13
Number of candidates    179          150          129
Number of answers       26

θ = 0.03
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       31.75        20.03        18.42
Number of candidates    128          76           68
Number of answers       7

θ = 0.05
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       26.56        11.79        11.06
Number of candidates    107          44           40
Number of answers       3
Experimental Results - Different Shapes
[Chart: stacked bars of response time [s] per strategy for three query shapes; the "Rest" component is about 2.45 to 2.48 s in every case]

Circle
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.15         0.13
Compute Prob. [s]       26.97        13.60        13.42
Number of candidates    115          50           49
Number of answers       26

Default Ellipse
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.19         0.13
Compute Prob. [s]       43.70        38.70        34.13
Number of candidates    179          150          129
Number of answers       26

Narrow Ellipse
                        Strategy 1   Strategy 2   Hybrid
Filtering [s]           0.09         0.20         0.13
Compute Prob. [s]       69.69        84.56        62.56
Number of candidates    276          366          250
Number of answers       24

• Narrow ellipse (correlated distribution) needs to consider additional candidates
Conclusions of Experimental Results
• Most of the processing time is spent in numerical integration
⇒ A strategy that can prune more objects has better performance
• Superiority or inferiority of two strategies depends on the given query and the specified parameters
⇒ No apparent winner
• The hybrid strategy inherits the benefits of two strategies and shows the best performance
Conclusions
• Nearest neighbor query processing methods for imprecise query objects
– Location of query object is represented by Gaussian distribution
– Two strategies and their combination
– Reduction of numerical integration is important
– Proposal of two pruning strategies and their evaluation
• Future work
– Evaluation for multi-dimensional cases (d ≥ 3)