navigating nets: simple algorithms for proximity search robert krauthgamer (ibm almaden) joint work...

Navigating Nets: Simple algorithms for proximity search

Robert Krauthgamer (IBM Almaden)

Joint work with James R. Lee (UC Berkeley)


A classical problem

Fix a metric space (X,d):

X = set of points.

d = distance function over X.

Near-neighbor search (NNS) [Minsky-Papert]:

1. Preprocess a given n-point subset $S \subseteq X$.

2. Given a query point $q \in X$, quickly compute the closest point to q in S.
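The two steps above can be sketched against a distance oracle. This is a minimal baseline only (a linear scan with no preprocessing), not the data structure of these slides; the names `nearest_neighbor`, `line` and `dist` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def nearest_neighbor(S: Sequence[P], q: P, d: Callable[[P, P], float]) -> P:
    """Return a point of S closest to q, using only oracle calls to d."""
    if not S:
        raise ValueError("S must be non-empty")
    # One oracle call per point of S: n distance computations per query.
    return min(S, key=lambda s: d(q, s))


def dist(x: int, y: int) -> int:
    """Distance on a line: d(x, y) = |x - y|."""
    return abs(x - y)


line = [0, 3, 7, 12]
closest = nearest_neighbor(line, 8, dist)
```

Every query costs n oracle calls, which is the cost a sublinear-time structure tries to avoid.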


Variations on NNS

$(1+\varepsilon)$-approximate nearest neighbor search: Find $a \in S$ such that $d(q,a) \le (1+\varepsilon)\, d(q,S)$.

Dynamic case: Allow updates to S (insertions and deletions).

Distributed case: No central index (e.g., nodes in a network). Other cost measures (e.g., communication, stretch, load).


General metrics

Only oracle access to the distance function $d(\cdot,\cdot)$. Models a complicated metric or on-demand measurement. No “hashing of coordinates” or tuning for a specific metric.

Goal: efficient query (sublinear or polylog time). Impossible, even if the data set S is a path metric:

[Figure: a path metric on the points 1, 2, ..., n, with distance labels n-1, n and n.]

What about approximate NNS?


Approximate NNS

Hard even for (near-)uniform metrics: $d(x,y) = 1$ for all distinct $x,y \in S$.

[Figure: a uniform metric; every pairwise distance is labeled 1.]

But many data sets lack large uniform subsets.

Can we quantify this?


Abstract dimension

The doubling constant $\lambda_X$ of a metric (X,d) is the minimum $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius.

The metric is doubling if $\lambda_X = O(1)$.

The (abstract) dimension is $\dim_A(X) = \log_2 \lambda_X$.

Immediate properties:

$\dim_A(\mathbb{R}^d, \|\cdot\|_2) = O(d)$.

$\dim_A(X') \le \dim_A(X)$ for all $X' \subseteq X$.

$\dim_A(X) \le \log |X|$. (Equality for a uniform metric.)
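For a small finite metric, the definition above can be explored numerically. The sketch below does not compute the doubling constant exactly; it returns an upper bound obtained by greedily covering each ball with balls of half the radius. The function names are hypothetical, and balls are taken to be closed.

```python
from math import log2


def greedy_cover_size(ball, radius, d):
    """Pick centers in `ball` that are pairwise >= radius apart.

    Every point of `ball` ends up within distance < radius of some center,
    so the centers give a cover of `ball` by balls of that radius.
    """
    centers = []
    for p in ball:
        if all(d(p, c) >= radius for c in centers):
            centers.append(p)
    return len(centers)


def doubling_constant_upper_bound(points, d):
    """Upper bound on the doubling constant of a finite metric.

    Only radii equal to some pairwise distance are tried: any other ball
    coincides, as a set, with a ball of a smaller such radius (or is a
    single point), which is no harder to cover.
    """
    radii = sorted({d(x, y) for x in points for y in points if x != y})
    best = 1
    for x in points:
        for r in radii:
            ball = [p for p in points if d(x, p) <= r]
            best = max(best, greedy_cover_size(ball, r / 2, d))
    return best


def uniform(x, y):
    """Uniform metric: distinct points are at distance 1."""
    return 0 if x == y else 1


def line_dist(x, y):
    return abs(x - y)


uniform_bound = doubling_constant_upper_bound(list(range(8)), uniform)
line_bound = doubling_constant_upper_bound(list(range(16)), line_dist)
uniform_dim = log2(uniform_bound)
```

On the 8-point uniform metric the bound is tight, since a ball of radius 1 contains all 8 points and each ball of radius 1/2 is a single point. On the 16-point path the bound stays constant as the path grows, in line with the claim that such metrics are doubling.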


Illustration

Grid with missing piece



Low-dimensional manifold (bounded curvature)



Manifold

Union of curves in Euclidean space


Embedding doubling metrics

Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix $0<\varepsilon<1$, and let (X,d) be a doubling metric. Then $(X,d^{\varepsilon})$ can be embedded with $O(1)$ distortion into $\ell_2^{O(1)}$.

Not true for $\varepsilon=1$ [Semmes, 1996].

Motivation: Embed S and then apply Euclidean NNS.


Our results

Simple data structure for maintaining S:

$(1+\varepsilon)$-NNS query time: $(1/\varepsilon)^{O(\dim(S))} \cdot \log \Delta$ (for $\varepsilon < 1/2$), where $\Delta = d_{\max}/d_{\min}$ is the normalized diameter of S (typically $\Delta = n^{O(1)}$).

Space: $n \cdot 2^{O(\dim(S))}$.

Dynamic maintenance of S:

Insertion / deletion time: $2^{O(\dim(S))} \cdot \log \Delta \cdot \log\log \Delta$.

Additional properties:

Best possible dependency on $\dim(S)$ (in a certain model).

Oblivious to $\dim(S)$ and robust against “bad localities”.

Matches/improves known (more specialized) results.


Nets

Definition: An r-net of X is a subset Y with

1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.

2. $d(x,Y) < r$ for all $x \in X \setminus Y$.

(I.e., a maximal r-separated subset.)

Note: Compare vs. $\varepsilon$-net.

Running example – a path metric:

An 8-net

A 4-net

A 16-net
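An r-net of a finite set can be built with a single greedy pass, which is one way to obtain the "maximal r-separated subset" of the definition. The sketch below is an illustration on a 16-point path metric; the choice of 16 points, the scan order, and the names `greedy_net` and `is_r_net` are assumptions, not taken from the slides.

```python
def greedy_net(points, r, d):
    """Keep each point that is at distance >= r from every point kept so far."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net


def is_r_net(net, points, r, d):
    """Check both conditions of the definition of an r-net."""
    # 1. Net points are pairwise at distance >= r.
    separated = all(
        d(a, b) >= r for i, a in enumerate(net) for b in net[i + 1:]
    )
    # 2. Every point outside the net is at distance < r from the net.
    covered = all(
        min(d(x, y) for y in net) < r for x in points if x not in net
    )
    return separated and covered


def dist(x, y):
    """Path metric on integer positions."""
    return abs(x - y)


path = list(range(16))
net4 = greedy_net(path, 4, dist)
net8 = greedy_net(path, 8, dist)
net16 = greedy_net(path, 16, dist)
```

A rejected point is within distance less than r of some kept point, which gives condition 2; condition 1 holds because a point is only kept when it is far from all earlier net points.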


More nets

Definition: An r-net of X is a subset Y with

1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.

2. $d(x,Y) < r$ for all $x \in X \setminus Y$.

(I.e., a maximal r-separated subset.)

Note: Compare vs. $\varepsilon$-net.

[Figure: illustration of nets; the only fully recoverable label is $Y_r$.]


The data structure

For every $r = 2^i$, let $Y_r$ be an r-net of S. Only $O(\log \Delta)$ values of r are non-trivial.

A 16-net

An 8-net

A 4-net

For every $y \in Y_r$ maintain a navigation list

$L_{y,r} = \{z \in Y_{r/2} : d(y,z) \le 2r\}$
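The nets and navigation lists just defined can be assembled directly from their definitions. The following is a static sketch under stated assumptions: nets are built greedily, `scales` is a caller-supplied list of radii in which each entry is half the previous one, and the function names are hypothetical. It makes no attempt at the efficient dynamic maintenance the slides refer to.

```python
def greedy_net(points, r, d):
    """Keep each point that is at distance >= r from every point kept so far."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net


def build_navigating_nets(points, d, scales):
    """Return (nets, lists).

    nets[r] is an r-net Y_r of `points`.
    lists[(y, r)] is L_{y,r}: the points z of Y_{r/2} with d(y, z) <= 2r.
    `scales` must be in decreasing order, each entry half the previous one.
    """
    nets = {r: greedy_net(points, r, d) for r in scales}
    lists = {}
    for r, half in zip(scales, scales[1:]):
        for y in nets[r]:
            lists[(y, r)] = [z for z in nets[half] if d(y, z) <= 2 * r]
    return nets, lists


def dist(x, y):
    """Path metric on integer positions."""
    return abs(x - y)


path = list(range(16))
nets, lists = build_navigating_nets(path, dist, [16, 8, 4, 2, 1])
```

The finest scale has no navigation lists of its own, since there is no finer net to point into.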


More on the data structure

[Figure: the nets $Y_r$ and $Y_{r/2}$, with a distance labeled 3r.]

For every $r = 2^i$, let $Y_r$ be an r-net of S. Only $O(\log \Delta)$ values of r are non-trivial.

For every $y \in Y_r$ maintain a navigation list

$L_{y,r} = \{z \in Y_{r/2} : d(y,z) \le 2r\}$


Space requirement

Lemma: $|L_{y,r}| \le 2^{O(\dim(S))}$ for all $y \in Y_r$, $r \ge 0$.

Proof:

$L_{y,r}$ is contained in a ball of radius 2r.

This ball can be covered by $\lambda_S^3$ balls of radius r/4 (halve the radius three times: 2r, r, r/2, r/4).

Every point in $L_{y,r} \subseteq Y_{r/2}$ must be covered by a distinct ball.

Hence, $|L_{y,r}| \le \lambda_S^3 = 2^{3\dim(S)}$.

Corollary: Total space is $2^{O(\dim(S))} \cdot n \cdot \log \Delta$.

We actually improve it to $2^{O(\dim(S))} \cdot n$.


Back to running example

A 16-net

An 8-net

A 4-net


Navigating nets

Let q denote the query point.

Initially $z_{16}$ = the only point in $Y_{16}$.

Find $z_8$ = closest $Y_8$ point to q.

Find $z_4$ = closest $Y_4$ point to q, etc.



How to find $z_{r/2}$?

Assume each $z_r \in Y_r$ is the closest point to a (instead of to q).

Then $d(z_r,z_{r/2}) \le r + r/2 = 3r/2$.

And $z_{r/2}$ must be in $z_r$'s list $L_{z_r,r}$.

[Figure: the points q, a, $z_r$ and $z_{r/2}$, with distances labeled $\le r$, $\le r/2$ and $\le r/4$.]

For $z_r$ to be the closest $Y_r$ point to q, it suffices that $d(q,a) \le r/4$.

And then $z_r$'s list $L_{z_r,r}$ contains $z_{r/2}$.

Note: $d(q,z_r) \le 3r/2$.


Stopping point

If we find a point $z_r$ with $d(q,z_r) \le 3r/2$,

but not a point $z_{r/2}$ with $d(q,z_{r/2}) \le 3r/4$,

we know that $d(q,S) > r/4$, so $d(q,z_r) \le 3r/2 < 6\, d(q,S)$,

yielding 6-NNS with query time $2^{O(\dim(S))} \cdot \log \Delta$.

This can be extended to $(1+\varepsilon)$-NNS.

Similar principles yield insertions and deletions.
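The descent through the nets can be sketched end to end. The version below is a simplified variant and not necessarily the authors' procedure: instead of tracking a single point $z_r$ and stopping early, it keeps at each scale every net point within $3r$ of the best distance seen at that scale. That slack is an assumption chosen here so that the closest point of $Y_{r/2}$ always appears in the navigation list of some kept point, which makes the final answer exact. The code also assumes distinct points at pairwise distance at least 1, so that the net at scale 1 is all of S; all function names are hypothetical.

```python
def greedy_net(points, r, d):
    """Keep each point that is at distance >= r from every point kept so far."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net


def build(points, d):
    """Nets Y_r for r = 1, 2, 4, ... and their navigation lists L_{y,r}."""
    diameter = max(d(x, y) for x in points for y in points)
    scales = [1]
    while scales[-1] <= diameter:
        scales.append(2 * scales[-1])
    scales.reverse()  # coarsest first; the top net is a single point
    nets = {r: greedy_net(points, r, d) for r in scales}
    lists = {
        (y, r): [z for z in nets[half] if d(y, z) <= 2 * r]
        for r, half in zip(scales, scales[1:])
        for y in nets[r]
    }
    return scales, nets, lists


def query(q, scales, nets, lists, d):
    """Return [(r, z_r), ...] where z_r is a closest point of Y_r to q."""
    candidates = list(nets[scales[0]])
    trace = []
    for r, half in zip(scales, scales[1:]):
        trace.append((r, min(candidates, key=lambda z: d(q, z))))
        # Only points reachable through navigation lists are examined.
        pool = {z for y in candidates for z in lists[(y, r)]}
        best = min(d(q, z) for z in pool)
        # Assumed slack: keep net points within 3 * (r/2) of the best one.
        candidates = [z for z in pool if d(q, z) <= best + 3 * half]
    trace.append((scales[-1], min(candidates, key=lambda z: d(q, z))))
    return trace


def dist(x, y):
    """Path metric on integer positions."""
    return abs(x - y)


path = list(range(16))
scales, nets, lists = build(path, dist)
steps = query(20, scales, nets, lists, dist)
answer = steps[-1][1]
```

The last entry of the returned trace is a nearest neighbor of the query, and the earlier entries play the role of $z_{16}, z_8, z_4, \dots$ in the running example. A (1+ε)-approximate variant would stop the descent early; that refinement is not shown here.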


Near-optimality

The basic idea:

Consider a uniform metric on $\lambda$ points.

Let the query point be at distance 1 from all of them, except for one point whose distance is $1-\varepsilon$.

Finding this point requires (in an oracle model) computing all $\lambda$ distances to q.

Can happen at every distance scale r.

We get a lower bound of $2^{\Omega(\dim(S))} \log \Delta$.


Related work – general metrics

Let $K_X$ be the smallest K such that

$|B(x,r)| \le K \cdot |B(x,r/2)|$ for all $x \in X$, $r \ge 0$.

Define the KR-dimension as $\dim_{KR}(X) = \log_2 K_X$.

Randomized exact NNS [Karger-Ruhl’02, Hildrum et al.’04]:

Space: $n \cdot 2^{O(\dim(S))} \cdot \log \Delta$.

Query time: $2^{O(\dim(S))} \cdot \log \Delta$.

If $\dim_{KR}(S) = O(1)$ the $\log \Delta$ term is actually $O(\log n)$.

Our results extend to this setting:

1. KR-metrics are doubling: $\dim(X) \le 4 \dim_{KR}(X)$.

2. Our algorithms actually give exact NNS.

Assumptions on query distribution [Clarkson’99].
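The growth constant $K_X$ defined on this slide can be computed by brute force for a small finite metric. The sketch below assumes closed balls and only tries radii of the form r = d(x,y): between two consecutive such radii the numerator $|B(x,r)|$ is constant while the denominator $|B(x,r/2)|$ can only grow, so the maximum ratio is attained at one of them. The function name `kr_constant` is hypothetical.

```python
from math import log2


def kr_constant(points, d):
    """Largest ratio |B(x, r)| / |B(x, r/2)| over centers x and radii r."""

    def ball_size(x, r):
        # Closed ball: count the points p with d(x, p) <= r.
        return sum(1 for p in points if d(x, p) <= r)

    best = 1.0
    for x in points:
        for y in points:
            r = d(x, y)
            best = max(best, ball_size(x, r) / ball_size(x, r / 2))
    return best


def dist(x, y):
    """Path metric on integer positions."""
    return abs(x - y)


def uniform(x, y):
    """Uniform metric: distinct points are at distance 1."""
    return 0 if x == y else 1


path_k = kr_constant(list(range(16)), dist)
uniform_k = kr_constant(list(range(8)), uniform)
uniform_kr_dim = log2(uniform_k)
```

On the 16-point path the constant stays small, while on the uniform metric it equals the number of points, matching the intuition that uniform metrics are the high-dimensional case.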


Related work – Euclidean metrics

Exact NNS for $\mathbb{R}^d$: $O(d^5 \log n)$ query time and $O(n^{d+\delta})$ space [Meiser’93].

$\varepsilon$-NNS for $\mathbb{R}^d$: $O((d/\varepsilon)^d \log n)$ query time and $O(dn)$ space by quad-tree-like decompositions [AMNSW’94]. Our algorithm achieves similar bounds.

$O(d\,\mathrm{polylog}(dn))$ query time and $(dn)^{O(1)}$ space is useful for higher dimensions [IM’98, KOR’98].


Concluding remarks

Our approach: A “decision tree” that is not really a tree (saves space).

In progress:

A different (static) scheme where $\log \Delta$ is replaced by $\log n$.

Bounds on the help of “ambient” space points.

Our data structure yields a spanner of the metric:

Immediate: $O(1)$ stretch with average degree $2^{\dim(S)}$.

More work: $O(1)$ stretch with maximum degree $2^{\dim(S)}$.

[Guibas,’04] applied the nets data structure for moving points in the plane.
