Nearest-Neighbor Searching Under Uncertainty
Wuzhou Zhang
RIP Final and Masters, March 22, 2012
Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012.
Nearest-Neighbor Searching
Applications
• Pattern Recognition, Data Compression
• Statistical Classification, Clustering
• Databases, Information Retrieval
• Computer Vision, etc.
http://en.wikipedia.org/wiki/Nearest_neighbor_search
$S$: a set of points in $\mathbb{R}^2$
$q$: any query point in $\mathbb{R}^2$
Find the closest point $p^*$ of $S$ to $q$
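As a baseline for the problem just stated, here is a minimal sketch of nearest-neighbor search by linear scan. The function name `nearest_neighbor` and the sample coordinates are illustrative assumptions, not taken from the slides, and the Euclidean distance is assumed.

```python
import math

def nearest_neighbor(points, q):
    """Return the point of `points` closest to `q` (Euclidean distance), by linear scan."""
    return min(points, key=lambda p: math.dist(p, q))

# Hypothetical data: three points in the plane and one query point.
S = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
q = (2.0, 2.0)
p_star = nearest_neighbor(S, q)
print(p_star)
```

A linear scan takes time proportional to the number of points per query; the index structures discussed in the talk aim to answer queries much faster after preprocessing.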
Data Uncertainty
Location of data is imprecise: sensor databases, face recognition, mobile data, etc.
What is the “nearest neighbor” of $q$ now?
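Our Model and Problem Statement
Uncertain point $P$: represented as a probability density function (pdf) $f_P$
Expected distance: $Ed(P, q) = \sum_{j} w_j \, d(p_j, q)$ for a discrete pdf with locations $p_j$ (for a continuous pdf, $Ed(P, q) = \int f_P(p) \, d(p, q) \, dp$)
$w_j$: probabilities/weights; $d(\cdot, \cdot)$: distance function
Let $\mathcal{P} = \{P_1, \dots, P_n\}$; find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} Ed(P, q)$
Or an $\varepsilon$-ENN: any $P \in \mathcal{P}$ with $Ed(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} Ed(P', q)$
Figure: an uncertain point whose locations carry weights 0.1, 0.2, 0.3, 0.4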
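The definitions on this slide can be sketched by brute force for discrete distributions. This is a minimal illustration, not the indexing method of the talk: an uncertain point is assumed to be a list of `(location, weight)` pairs with weights summing to 1, and the names `expected_distance`, `expected_nearest_neighbor`, and `is_eps_enn` are hypothetical.

```python
import math

def expected_distance(P, q, dist=math.dist):
    """Expected distance from uncertain point P (list of (location, weight)) to q."""
    return sum(w * dist(p, q) for p, w in P)

def expected_nearest_neighbor(uncertain_points, q, dist=math.dist):
    """Index of the uncertain point with the smallest expected distance to q (linear scan)."""
    return min(range(len(uncertain_points)),
               key=lambda i: expected_distance(uncertain_points[i], q, dist))

def is_eps_enn(uncertain_points, i, q, eps, dist=math.dist):
    """True if point i is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(P, q, dist) for P in uncertain_points)
    return expected_distance(uncertain_points[i], q, dist) <= (1 + eps) * best

# Hypothetical data: P1 has four locations with weights 0.1, 0.2, 0.3, 0.4.
P1 = [((0.0, 0.0), 0.1), ((4.0, 0.0), 0.2), ((0.0, 3.0), 0.3), ((4.0, 3.0), 0.4)]
P2 = [((1.0, 1.0), 0.5), ((2.0, 1.0), 0.5)]
q = (1.0, 1.0)
enn_index = expected_nearest_neighbor([P1, P2], q)
print(enn_index, expected_distance(P2, q))
```

Passing a different `dist` (for example a squared Euclidean or rectilinear distance) changes the notion of ENN accordingly.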
Previous work and Our contribution
Previous work
• The expected $k$-NN under the $L_1$ metric: ε-approximation [Ljosa2007]
• Aggregate nearest neighbor (ANN) under the SUM function [Li2011, Sharifzadeh2010, Lian2008, etc.]
• All based on heuristics
Our contribution
• First nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees
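Summary of results

| Distance function | Setting | Preprocessing time | Space | Query time |
|---|---|---|---|---|
| Squared Euclidean distance | Uncertain data | | | |
| Squared Euclidean distance | Uncertain query | | | |
| Rectilinear metric | Uncertain data | | | |
| Rectilinear metric | Uncertain query | | | |
| Euclidean metric (ε-ENN) | Uncertain data | | | |
| Euclidean metric (ε-ENN) | Uncertain query | | | |

Results are stated in $\mathbb{R}^2$ and extend to higher dimensions.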
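Voronoi Diagram
Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 \mid d(x, p_i) \le d(x, p_j) \ \forall j\}$
Voronoi diagram $\mathrm{VD}(S)$: decomposition of the plane induced by the Voronoi cells
Preprocessing time: $O(n \log n)$
Space: $O(n)$
Query time: $O(\log n)$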
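Expected Voronoi Diagram
Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^2 \mid Ed(P_i, x) \le Ed(P_j, x) \ \forall j\}$
Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: decomposition induced by the expected Voronoi cells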
An example in the $L_1$ metric
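Minimization diagram
The lower envelope of a set of functions $F = \{f_1, \dots, f_n\}$: $L_F(x) = \min_i f_i(x)$
Minimization diagram $M(F)$: the projection of the graph of $L_F$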
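To make the lower envelope and its minimization diagram concrete, here is a small sampled sketch in one dimension. It only labels sample points with the index of the minimizing function, which is a discrete stand-in for the diagram rather than an exact construction; the function names and the three example functions are illustrative assumptions.

```python
def lower_envelope(functions, x):
    """Return (value, index): the lower envelope at x and a function attaining it."""
    values = [f(x) for f in functions]
    i = min(range(len(values)), key=values.__getitem__)
    return values[i], i

def sampled_minimization_diagram(functions, xs):
    """Label each sample x with the index of the function that is lowest there."""
    return [lower_envelope(functions, x)[1] for x in xs]

# Hypothetical 1-D example: three shifted, offset absolute-value functions.
F = [
    lambda x: abs(x - 0.0) + 0.5,
    lambda x: abs(x - 3.0),
    lambda x: abs(x - 6.0) + 0.5,
]
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = sampled_minimization_diagram(F, xs)
print(labels)
```

When each function is the expected distance to one uncertain point, the cells of the minimization diagram are the expected Voronoi cells.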
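Squared Euclidean distance: uncertain data
$\bar{p}_i$: the centroid of $P_i$; $\sigma_i^2 = \int f_{P_i}(p) \, \|p - \bar{p}_i\|^2 \, dp$: its variance
Then, $Ed(P_i, q) = \|q - \bar{p}_i\|^2 + \sigma_i^2$
• Replace $P_i$ by $\bar{p}_i$ with weight $\sigma_i^2$
• The expected Voronoi diagram is the same as the power diagram WPD of the weighted centroids
Preprocessing time
Space
Query time
Remarks: Works for any distribution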
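The reduction on this slide rests on the identity that the expected squared Euclidean distance from an uncertain point to $q$ equals the squared distance from its centroid to $q$ plus its variance. The sketch below checks that identity numerically for discrete distributions and answers a query by a linear scan over weighted centroids; the scan is a stand-in for point location in a power diagram, and all names and data are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(P):
    """Weighted mean of the locations of P (list of (location, weight))."""
    return (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))

def variance(P):
    """Expected squared distance from a location of P to the centroid of P."""
    c = centroid(P)
    return sum(w * sq_dist(p, c) for p, w in P)

def expected_sq_dist(P, q):
    """Expected squared distance from P to q, computed directly from the definition."""
    return sum(w * sq_dist(p, q) for p, w in P)

def enn_squared_euclidean(uncertain_points, q):
    """ENN index using only (centroid, variance) per point; linear scan, no power diagram."""
    sites = [(centroid(P), variance(P)) for P in uncertain_points]
    return min(range(len(sites)), key=lambda i: sq_dist(sites[i][0], q) + sites[i][1])

# Hypothetical data: P1 is spread out (variance 1), P2 is a single certain location.
P1 = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
P2 = [((3.0, 0.0), 1.0)]
q = (1.9, 0.0)
print(enn_squared_euclidean([P1, P2], q))
```

In this example the centroid of `P1` is closer to `q` than the centroid of `P2`, yet `P2` is the ENN because the variance of `P1` is added as a weight. That is why a weighted (power) diagram, not an ordinary Voronoi diagram of centroids, is the right structure.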
Rectilinear metric: uncertain data
Assume the $L_1$ metric.
Size of the expected Voronoi diagram:
Lower bound construction
$\alpha(\cdot)$: the inverse Ackermann function
Remarks: Extends to the $L_\infty$ metric
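Rectilinear metric: uncertain data (cont.)
A near-linear size index exists despite the size of the expected Voronoi diagram.
Each $Ed(P_i, q)$ consists of linear pieces! For a location $p_{ij}$ of $P_i$, the distance $d(p_{ij}, q) = |x_{p_{ij}} - x_q| + |y_{p_{ij}} - y_q|$ is linear in each quadrant around $p_{ij}$:
• $q$ to the lower right of $p_{ij}$: $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $q$ to the upper right of $p_{ij}$: $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $q$ to the lower left of $p_{ij}$: $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $q$ to the upper left of $p_{ij}$: $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
A weighted sum of these functions over the locations of $P_i$ is again linear wherever all the signs are fixed.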
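The linearity claim can be checked directly: inside any cell of the grid formed by the horizontal and vertical lines through the locations of an uncertain point, every absolute value has a fixed sign, so the expected rectilinear distance is a single linear function of the query coordinates. The sketch below computes the coefficients of that linear piece for discrete distributions; `linear_piece` and the sample data are hypothetical.

```python
def expected_l1(P, q):
    """Expected rectilinear (L1) distance from P (list of (location, weight)) to q."""
    return sum(w * (abs(p[0] - q[0]) + abs(p[1] - q[1])) for p, w in P)

def linear_piece(P, q):
    """Coefficients (a, b, c) such that Ed(P, (x, y)) = a*x + b*y + c
    for every (x, y) in the same grid cell as q."""
    a = b = c = 0.0
    for (px, py), w in P:
        sx = 1.0 if q[0] >= px else -1.0  # sign of (x_q - x_p) within this cell
        sy = 1.0 if q[1] >= py else -1.0  # sign of (y_q - y_p) within this cell
        a += w * sx
        b += w * sy
        c -= w * (sx * px + sy * py)
    return a, b, c

# Hypothetical uncertain point with two locations.
P = [((0.0, 0.0), 0.25), ((2.0, 1.0), 0.75)]
q = (1.0, 0.5)          # lies in the cell 0 < x < 2, 0 < y < 1
a, b, c = linear_piece(P, q)
print(a, b, c)
```

The same coefficients remain valid for any other query in that cell, for example `(0.5, 0.25)`, which is what makes it possible to index linear functions instead of the full diagram.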
Rectilinear metric: uncertain data (cont.)
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
Euclidean metric (ε-ENN): uncertain data
Approximate each expected-distance function by a function defined on a grid:
Outside the grid:
Inside the grid:
Total number of cells:
Figure labels: the outermost square; the collection of squares
Remarks: Extends to any metric
Euclidean metric (ε-ENN): uncertain data (cont.)
Cells generated by Arya’s data structure: a linear-size approximate expected Voronoi diagram!
Quadtree: 4-way tree
Preprocessing time
Space
Query time
Further work
Is there a linear-size index to answer the following queries in sublinear time in the worst case?
• the nearest neighbor with highest probability
• the nearest neighbors with probability higher than a given threshold
THANKS
Squared Euclidean distance: uncertain query
$\bar{q}$: the centroid of the uncertain query $Q$
Preprocessing
• Compute the Voronoi diagram VD of the data points
Query
• Given $Q$, compute its centroid $\bar{q}$, then query VD with $\bar{q}$
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions and works for any distribution
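The query procedure works because, under squared Euclidean distance, the expected distance from a data point to an uncertain query equals the squared distance to the query's centroid plus the query's variance, and the variance is the same for every data point. The sketch below compares the centroid shortcut with a direct evaluation; a linear scan replaces the Voronoi-diagram lookup, and all names and data are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(Q):
    """Weighted mean of the locations of Q (list of (location, weight))."""
    return (sum(w * z[0] for z, w in Q), sum(w * z[1] for z, w in Q))

def enn_via_centroid(points, Q):
    """Nearest data point to the centroid of Q (stand-in for a Voronoi-diagram query)."""
    c = centroid(Q)
    return min(points, key=lambda p: sq_dist(p, c))

def enn_brute_force(points, Q):
    """Data point minimizing the expected squared distance to Q, from the definition."""
    return min(points, key=lambda p: sum(w * sq_dist(p, z) for z, w in Q))

# Hypothetical data: three certain points and an uncertain query with two locations.
points = [(0.0, 0.0), (5.0, 0.0), (2.0, 3.0)]
Q = [((1.0, 0.0), 0.5), ((3.0, 2.0), 0.5)]
print(enn_via_centroid(points, Q), enn_brute_force(points, Q))
```

Both functions return the same point here, so only the centroid has to be computed at query time before an ordinary nearest-neighbor lookup.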
Rectilinear metric: uncertain query
Similarly, linear pieces
Preprocessing time
Space
Query time
Euclidean metric (ε-ENN): uncertain query
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions