Advances in Metric Embedding Theory

DESCRIPTION
Advances in Metric Embedding Theory. Ofer Neiman, Ittai Abraham, Yair Bartal, Hebrew University. Talk outline. Current results: new method of embedding; new partition techniques; constant average distortion; extended notions of distortion; optimal results for scaling embeddings.

TRANSCRIPT
Advances in Metric Embedding Theory
Ofer Neiman
Ittai Abraham, Yair Bartal
Hebrew University
Talk Outline
Current results:
- New method of embedding.
- New partition techniques.
- Constant average distortion.
- Extended notions of distortion.
- Optimal results for scaling embeddings.
- Tradeoff between distortion and dimension.
Work in progress:
- Low-dimension embeddings for doubling metrics.
- Scaling distortion into a single tree.
- Nearest-neighbor-preserving embeddings.
Embedding Metric Spaces
Metric spaces (X,d_X), (Y,d_Y).
An embedding is a function f: X → Y.
For a non-contracting embedding f, given u,v in X let
  dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v).
f has distortion c if max_{u,v ∈ X} dist_f(u,v) ≤ c.
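As a sanity check of the definition, here is a small Python sketch (not from the talk) that computes dist_f and the worst-case distortion of a non-contracting map between two toy finite metric spaces:

```python
import itertools

def distortion(points, d_X, d_Y, f):
    # Worst-case distortion of a non-contracting embedding f:
    # dist_f(u, v) = d_Y(f(u), f(v)) / d_X(u, v), maximized over pairs.
    return max(d_Y(f(u), f(v)) / d_X(u, v)
               for u, v in itertools.combinations(points, 2))

# Toy example: embed three points on a line into the plane via u -> (u, u);
# every distance is scaled by exactly sqrt(2), so the distortion is sqrt(2).
points = [0, 1, 3]
d_X = lambda u, v: abs(u - v)
d_Y = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
f = lambda u: (u, u)
print(distortion(points, d_X, d_Y, f))  # -> 1.414... (sqrt(2))
```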
Low-Dimension Embeddings into L_p
For an arbitrary metric space on n points:
[Bourgain 85]: distortion O(log n).
[LLR 95]: distortion Θ(log n), dimension O(log² n).
Can the dimension be reduced? For p=2, yes: using [JL], to dimension O(log n).
Theorem: embedding into L_p with distortion O(log n) and dimension O(log n), for any p.
Theorem: distortion O(log^{1+θ} n), dimension Θ(log n / (θ loglog n)).
Average Distortion Embeddings
In many practical settings the quality of an embedding is measured by its average distortion:
- Network embedding
- Multi-dimensional scaling
- Biology
- Vision
Theorem: Every n-point metric space can be embedded into L_p with average distortion O(1), worst-case distortion O(log n), and dimension O(log n).
  avgdist(f) = (1 / (n choose 2)) · Σ_{{u,v} ⊆ X} dist_f(u,v), where dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v).
A variation on distortion: the L_q-distortion of an embedding.
Given a non-contracting embedding f from (X,d_X) to (Y,d_Y), define its L_q-distortion:
  dist_q(f) = ( (1 / (n choose 2)) · Σ_{{u,v} ⊆ X} dist_f(u,v)^q )^{1/q}, where dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v).
For q = ∞ this is the worst-case distortion, dist_∞(f) = max_{u,v ∈ X} dist_f(u,v); q = 1 gives the average distortion, and q = 2 the L_2-distortion.
Thm: The L_q-distortion is bounded by O(q).
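A direct computation of the L_q-distortion from its definition, in Python (a toy illustration, not the talk's construction); q=1 recovers the average distortion, and large q approaches the worst case:

```python
import itertools

def lq_distortion(points, d_X, d_Y, f, q):
    # L_q-distortion: q-th moment of dist_f(u, v) over uniform pairs.
    pairs = list(itertools.combinations(points, 2))
    ratios = [d_Y(f(u), f(v)) / d_X(u, v) for u, v in pairs]
    return (sum(r ** q for r in ratios) / len(pairs)) ** (1.0 / q)

# Toy embedding of a line metric that stretches far pairs more;
# the six pairwise ratios are 1, 2, 3, 4, 5, 6.
points = [0.0, 1.0, 2.0, 4.0]
d = lambda u, v: abs(u - v)
f = lambda u: u * u
print(lq_distortion(points, d, d, f, q=1))   # average distortion -> 3.5
print(lq_distortion(points, d, d, f, q=64))  # approaches the max ratio 6
```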
Partial & Scaling Distortion
Definition: A (1-ε)-partial embedding has distortion D(ε) if at least a 1-ε fraction of the pairs satisfy dist_f(u,v) ≤ D(ε).
Definition: An embedding has scaling distortion D(·) if it is a (1-ε)-partial embedding with distortion D(ε), for all ε > 0 simultaneously.
[KSW 04]: Introduced the problem in the context of network embeddings; initial results.
[A+ 05]: Partial distortion and dimension O(log(1/ε)) for all metrics. Scaling distortion O(log(1/ε)) for doubling metrics.
Thm: Scaling distortion O(log(1/ε)) for all metrics.
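The (1-ε)-partial distortion of a given map can be read off by sorting the pairwise ratios and discarding the worst ε fraction; a small Python sketch (the example points and map are hypothetical):

```python
import itertools

def partial_distortion(points, d_X, d_Y, f, eps):
    # Smallest D such that f is a (1-eps)-partial embedding with
    # distortion D: keep the best (1-eps) fraction of pairs, take the max.
    ratios = sorted(d_Y(f(u), f(v)) / d_X(u, v)
                    for u, v in itertools.combinations(points, 2))
    keep = max(1, int((1 - eps) * len(ratios)))
    return ratios[keep - 1]

points = [0.0, 1.0, 2.0, 4.0]
d = lambda u, v: abs(u - v)
f = lambda u: u * u            # pairwise ratios are 1, 2, 3, 4, 5, 6
print(partial_distortion(points, d, d, f, eps=0.5))  # -> 3.0
```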
L_q-Distortion vs. Scaling Distortion
An upper bound of O(log 1/ε) on scaling distortion implies:
- L_q-distortion = O(min{q, log n}).
- Average distortion = O(1).
- Distortion = O(log n).
For any metric:
- ½ of the pairs have distortion ≤ c log 2 = c
- +¼ of the pairs have distortion ≤ c log 4 = 2c
- +⅛ of the pairs have distortion ≤ c log 8 = 3c
- …
- +1/n² of the pairs have distortion ≤ 2c log n
For ε < 1/n², no pairs are ignored, so
  avgdist(f) ≤ Σ_{i≥1} i·c·2^{-i} = 2c = O(1)·c.
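The averaging step above is the standard identity Σ_{i≥1} i/2^i = 2; a quick numeric check in Python:

```python
# Each term: a 2^{-i} fraction of pairs contributes distortion at most i*c,
# so avgdist <= sum_{i>=1} i*c / 2^i = 2c = O(1) * c.
c = 1.0
total = sum(i * c / 2 ** i for i in range(1, 200))
print(total)  # -> 2.0 (up to negligible truncation error)
```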
A lower bound of Ω(log 1/ε) on partial distortion implies:
L_q-distortion = Ω(min{q, log n}).
Probabilistic Partitions
P = {S_1, S_2, …, S_t} is a partition of X if the S_i are pairwise disjoint and ∪_i S_i = X.
P(x) is the cluster containing x. P is Δ-bounded if diam(S_i) ≤ Δ for all i.
A probabilistic partition P is a distribution over a set of partitions.
P is η-padded if Pr[B(x, ηΔ) ⊆ P(x)] ≥ ½.
Let Δ_i = 4^i be the scales.
For each scale i, create a probabilistic Δ_i-bounded partition P_i that is η-padded.
For each cluster choose σ_i(S) ~ Ber(½) i.i.d.
  f_i(x) = σ_i(P_i(x)) · d(x, X \ P_i(x))
Repeat O(log n) times.
Distortion: O(η^{-1} · log^{1/p} Δ). Dimension: O(log n · log Δ).
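A much-simplified sketch of one scale of this construction in Python: a Δ-bounded random ball partition (a basic carving scheme standing in for the talk's padded partitions) and the corresponding coordinate f(x) = σ(P(x)) · d(x, X \ P(x)). All names here are illustrative, not from the paper.

```python
import random

def ball_partition(points, d, delta, rng):
    # Delta-bounded random partition: carve balls of a random radius
    # r in [delta/4, delta/2] around points taken in random order.
    # Each cluster has diameter at most 2r <= delta.
    order = points[:]
    rng.shuffle(order)
    r = rng.uniform(delta / 4, delta / 2)
    cluster = {}
    for c in order:
        for x in points:
            if x not in cluster and d(x, c) <= r:
                cluster[x] = c  # label each cluster by its center
    return cluster

def coordinate(points, d, cluster, sigma):
    # One embedding coordinate: f(x) = sigma(P(x)) * d(x, X \ P(x)).
    f = {}
    for x in points:
        outside = [d(x, y) for y in points if cluster[y] != cluster[x]]
        f[x] = sigma[cluster[x]] * (min(outside) if outside else 0.0)
    return f

rng = random.Random(0)
pts = [0.0, 1.0, 5.0, 6.0]
d = lambda u, v: abs(u - v)
P = ball_partition(pts, d, delta=4.0, rng=rng)
sigma = {c: rng.randint(0, 1) for c in set(P.values())}  # Ber(1/2) per cluster
f = coordinate(pts, d, P, sigma)
```

Repeating this for every scale Δ_i = 4^i and concatenating O(log n) independent copies per scale gives the overall shape of the embedding.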
Partitions and Embedding
  f(x) = Σ_{i≥0} f_i(x)
[Figure: scales from the diameter of X (= Δ) down through the Δ_i; the coordinate of x is d(x, X \ P(x)).]
Upper Bound
  f_i(x) = σ_i(P_i(x)) · d(x, X \ P_i(x))
For all x,y ∈ X:
- P_i(x) ≠ P_i(y) implies d(x, X \ P_i(x)) ≤ d(x,y).
- P_i(x) = P_i(y) implies |d(x,A) - d(y,A)| ≤ d(x,y) for A = X \ P_i(x).
Hence |f_i(x) - f_i(y)| ≤ d(x,y) at every scale, and summing over the O(log Δ) scales:
  ||f(x) - f(y)||_p^p = Σ_{i≥0} |f_i(x) - f_i(y)|^p ≤ O(log Δ) · d(x,y)^p.
Lower Bound
Take a scale i such that Δ_i ≈ d(x,y)/4. It must be that P_i(x) ≠ P_i(y).
With probability ½: d(x, X \ P_i(x)) ≥ ηΔ_i.
With probability ¼: σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0, so
  |f_i(x) - f_i(y)| ≥ ηΔ_i = Ω(η · d(x,y)).
η-padded Partitions
The parameter η determines the quality of the embedding.
[Bartal 96]: η = Ω(1/log n) for any metric space.
[Rao 99]: η = Ω(1), used to embed planar metrics into L_2.
[CKR01+FRT03]: Improved partitions with η(x) = log^{-1} ρ(x,Δ).
[KLMN 03]: Used to embed general + doubling metrics into L_p: distortion O(η^{-(1-1/p)} log^{1/p} n), dimension O(log² n).
The local growth rate of x at radius r is:
  ρ(x,r) = |B(x,4r)| / |B(x,r/4)|.
Uniform Probabilistic Partitions
In a uniform probabilistic partition, η: X → [0,1] and all points in a cluster have the same padding parameter.
Uniform partition lemma: There exists a uniform probabilistic Δ-bounded partition such that for any cluster C, η(x) = log^{-1} ρ(v,Δ) for all x ∈ C, where v = argmin_{u∈C} ρ(u,Δ).
[Figure: clusters C_1, C_2 with padding parameters η(C_1), η(C_2).]
Let Δ_i = 4^i.
For each scale i, create uniformly padded probabilistic Δ_i-bounded partitions P_i.
For each cluster choose σ_i(S) ~ Ber(½) i.i.d.
  f_i(x) = σ_i(P_i(x)) · η_i^{-1}(x) · d(x, X \ P_i(x)),  f(x) = Σ_{i≥0} f_i(x)
Upper bound: |f(x) - f(y)| ≤ O(log n) · d(x,y).
Lower bound: E[|f(x) - f(y)|] ≥ Ω(d(x,y)).
Replicate D = Θ(log n) times to get high probability.
Embedding into One Dimension
Upper Bound: |f(x) - f(y)| ≤ O(log n) · d(x,y)
  f_i(x) = σ_i(P_i(x)) · η_i^{-1}(x) · d(x, X \ P_i(x))
For all x,y ∈ X:
- P_i(x) ≠ P_i(y) implies f_i(x) ≤ η_i^{-1}(x) · d(x,y).
- P_i(x) = P_i(y) implies |f_i(x) - f_i(y)| ≤ η_i^{-1}(x) · d(x,y), using the uniform padding in the cluster.
Summing over scales, the sum telescopes:
  |f(x) - f(y)| ≤ Σ_{i≥0} |f_i(x) - f_i(y)| ≤ d(x,y) · Σ_{i≥0} η_i^{-1}(x) = d(x,y) · Σ_{i≥0} log( |B(x,4Δ_i)| / |B(x,Δ_i/4)| ) ≤ O(log n) · d(x,y).
Lower Bound: E[|f(x) - f(y)|] ≥ Ω(d(x,y))
Take a scale i such that Δ_i ≈ d(x,y)/4. It must be that P_i(x) ≠ P_i(y).
With probability ½: f_i(x) = η_i^{-1}(x) · d(x, X \ P_i(x)) ≥ Δ_i.
Let R = |Σ_{j<i} (f_j(x) - f_j(y))|. Two cases:
1. R < Δ_i/2: with prob. ⅛, σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0; then f_i(x) ≥ Δ_i, f_i(y) = 0, so |f(x) - f(y)| ≥ Δ_i/2 = Ω(d(x,y)).
2. R ≥ Δ_i/2: with prob. ¼, σ_i(P_i(x)) = 0 and σ_i(P_i(y)) = 0; then f_i(x) = f_i(y) = 0, so |f(x) - f(y)| ≥ Δ_i/2 = Ω(d(x,y)).
Coarse Scaling Embedding into L_p
Definition: For u ∈ X, r_ε(u) is the minimal radius such that |B(u, r_ε(u))| ≥ εn.
Coarse scaling embedding: for each u ∈ X, preserves distances outside B(u, r_ε(u)).
[Figure: balls B(u, r_ε(u)), B(v, r_ε(v)), B(w, r_ε(w)).]
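Computing r_ε(u) on a finite metric is a one-liner: sort the distances from u and take the ⌈εn⌉-th smallest. A Python sketch (toy example):

```python
import math

def r_eps(points, d, u, eps):
    # Minimal radius r with |B(u, r)| >= eps * n (closed ball, includes u).
    n = len(points)
    dists = sorted(d(u, v) for v in points)  # d(u, u) = 0 is included
    k = max(1, math.ceil(eps * n))
    return dists[k - 1]

pts = list(range(10))
d = lambda u, v: abs(u - v)
print(r_eps(pts, d, 0, 0.5))  # -> 4: B(0, 4) = {0,...,4}, 5 of 10 points
```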
Scaling Distortion
Claim: If d(x,y) > r_ε(x) then 1 ≤ dist_f(x,y) ≤ O(log 1/ε).
Let l be the scale with d(x,y) ≤ Δ_l < 4·d(x,y).
Lower bound: E[|f(x) - f(y)|] ≥ Ω(d(x,y)).
Upper bound for high-diameter terms: Σ_{i≥l} |f_i(x) - f_i(y)| ≤ O(log 1/ε) · d(x,y).
Upper bound for low-diameter terms: Σ_{i<l} |f_i(x) - f_i(y)| ≤ O(1) · d(x,y).
Replicate D = Θ(log n) times to get high probability.
Upper Bound for high-diameter terms: Σ_{i≥l} |f_i(x) - f_i(y)| ≤ O(log 1/ε) · d(x,y)
Take the scale l such that r_ε(x) ≤ d(x,y) ≤ Δ_l < 4·d(x,y). With
  f_i(x) = σ_i(P_i(x)) · η_i^{-1}(x) · d(x, X \ P_i(x)),
  Σ_{i≥l} |f_i(x) - f_i(y)| ≤ d(x,y) · Σ_{i≥l} η_i^{-1}(x) = d(x,y) · Σ_{i≥l} log( |B(x,4Δ_i)| / |B(x,Δ_i/4)| ).
The sum telescopes to O( log( n / |B(x, d(x,y))| ) ); since d(x,y) ≥ r_ε(x) we have |B(x, d(x,y))| ≥ εn, so the total is O(log 1/ε) · d(x,y).
Upper Bound for low-diameter terms: Σ_{i<l} |f_i(x) - f_i(y)| ≤ O(1) · d(x,y)
Take the scale l such that d(x,y) ≤ Δ_l < 4·d(x,y).
All lower levels i < l are bounded by Δ_i, using the truncated coordinates
  f_i(x) = σ_i(P_i(x)) · min{ η_i^{-1}(x) · d(x, X \ P_i(x)), Δ_i }.
Hence Σ_{i<l} |f_i(x) - f_i(y)| ≤ Σ_{i<l} Δ_i = O(Δ_l) = O(1) · d(x,y).
Altogether: Σ_{i≥0} |f_i(x) - f_i(y)| ≤ (O(log 1/ε) + O(1)) · d(x,y).
Embedding into L_p
A partition P is (η,δ)-padded if Pr[B(x, ηΔ) ⊆ P(x)] ≥ δ.
Lemma: There exist (η,δ)-padded partitions with η(x) = log^{-1} ρ(v,Δ) · log(1/δ), where v = argmin_{u∈P(x)} ρ(u,Δ).
Hierarchical partition: every cluster in level i is a refinement of a cluster in level i+1.
Theorem: Every n-point metric space can be embedded into L_p with dimension O(e^p log n), and for every q:
  dist_q(f) = O(min{q, log n}).
Embedding into L_p
Embedding into L_p with scaling distortion:
- Use partitions with small probability of padding: δ = e^{-p}.
- Hierarchical uniform partitions.
- Combination with Matousek's sampling techniques.
Low Dimension Embeddings
Embedding with distortion O(log^{1+θ} n), dimension Θ(log n / (θ loglog n)).
Optimal trade-off between distortion and dimension.
Use partitions with high probability of padding: δ = 1 - log^{-θ} n.
Additional Results: Weighted Averages
Embedding with weighted average distortion O(log Ψ) for weights with aspect ratio Ψ.
Algorithmic applications: sparsest cut, uncapacitated quadratic assignment, multiple sequence alignment.
Low Dimension Embeddings for Doubling Metrics
Definition: A metric space has doubling constant λ if any ball with radius r > 0 can be covered by λ balls of half the radius. Doubling dimension = log λ.
[GKL03]: Embedding doubling metrics, with tight distortion.
Thm: Embedding arbitrary metrics into L_p with distortion O(log^{1+θ} n), dimension O(log λ). Same embedding, with similar techniques; uses nets and the Lovász Local Lemma.
Thm: Embedding arbitrary metrics into L_p with distortion O(log^{1-1/p} λ · log^{1/p} n), dimension Õ(log n · log λ). Uses hierarchical partitions as well.
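On a finite metric, the doubling constant can be estimated by greedily covering a ball with half-radius balls; a rough Python sketch (a greedy net, so an upper-bound estimate rather than the exact λ):

```python
def half_radius_cover(points, d, center, r):
    # Greedily pick r/2-separated centers inside B(center, r); by the
    # net property every point of the ball is within r/2 of some center,
    # so the count witnesses a valid (not necessarily minimal) cover.
    ball = [x for x in points if d(x, center) <= r]
    centers = []
    for x in ball:
        if all(d(x, c) > r / 2 for c in centers):
            centers.append(x)
    return len(centers)

# On a path metric, any ball is covered by a constant number of half-balls.
pts = list(range(8))
d = lambda u, v: abs(u - v)
print(half_radius_cover(pts, d, 0, 4))  # -> 2 (centers 0 and 3)
```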
Scaling Distortion into Trees
[A+ 05]: Probabilistic embedding into a distribution of ultrametrics with scaling distortion O(log(1/ε)).
Thm: Embedding into an ultrametric with scaling distortion O(√(1/ε)).
Thm: Every graph contains a spanning tree with scaling distortion O(√(1/ε)).
These imply: average distortion = O(1); L_2-distortion = O(√log n).
Can be viewed as a network design objective.
Thm: Probabilistic embedding into a distribution of spanning trees with scaling distortion Õ(log²(1/ε)).
New Results: Nearest-Neighbor-Preserving Embeddings
Definition: x,y are k-nearest neighbors if |B(x, d(x,y))| ≤ k.
Thm: Embedding into L_p with distortion Õ(log k) on k-nearest neighbors, for all k simultaneously, and dimension O(log n).
Thm: For fixed k, embedding into L_p with distortion O(log k) and dimension O(log k).
Practically the same embedding: every level is scaled down, higher levels more aggressively; uses the Lovász Local Lemma.
Nearest-Neighbor-Preserving Embeddings
Thm: Probabilistic embedding into a distribution of ultrametrics with distortion Õ(log k) for all k-nearest neighbors.
Thm: Embedding into an ultrametric with distortion k-1 for all k-nearest neighbors.
Applications: sparsest cut with "neighboring" demand pairs; approximate ranking / k-nearest-neighbor search.
Conclusions
A unified framework for embedding arbitrary metrics.
New measures of distortion.
Embeddings with improved properties:
- Optimal scaling distortion.
- Constant average distortion.
- Tight distortion-dimension tradeoff.
- Embedding metrics in their doubling dimension.
- Nearest-neighbor-preserving embeddings.
- Constant average distortion spanning trees.