voronoi diagrams in information geometry: statistical voronoi diagrams and their applications

60

Upload: frank-nielsen

Post on 05-Dec-2014

240 views

Category:

Travel


2 download

DESCRIPTION

Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

TRANSCRIPT

Page 1: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Voronoi diagrams in information geometry

Statisti al Voronoi diagrams and their appli ations

Frank NIELSEN

É ole Polyte hnique

Sony CSL

e-mail: Frank.Nielsena m.org

MaxEnt 2014

© 2014 Frank NIELSEN 5793b870 1/57

Page 2: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Outline of the tutorial

1. Eu lidean Voronoi diagrams

2. Dis riminant analysis, Mahalanobis Voronoi diagrams, and

anisotropi Voronoi diagrams

3. Fisher-Hotelling-Rao Voronoi diagrams (Riemannian

urved geometries)

4. Kullba k-Leibler Voronoi diagrams (Bregman Voronoi

diagrams, dually at geometries)

5. Bayes' error, Cherno information and statisti al Voronoi

diagrams

© 2014 Frank NIELSEN 5793b870 2/57

Page 3: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Eu lidean (ordinary) Voronoi diagrams

P = P1

, ...,Pn : n distin t point generators in Eu lidean spa e Ed

V (Pi ) = X : DE (Pi ,X ) ≤ DE (Pj ,X ), ∀j 6= i = ∩ni=1

Bi

+(Pi ,Pj )

DE (P ,Q) = ‖θ(P)− θ(Q)‖2

=√∑d

i=1

(θi (P)− θi (Q))2

θ(P) = p : Cartesian oordinate system with θj(Pi ) = p(j)i .

Bise tors Bi(P ,Q) = X : DE (P ,X ) ≤ DE (Q,X ) : hyperplanes

Voronoi diagram = ell omplex V (Pi )'s with their fa es

⇒ Many appli ations : rystal growth, odebook/quantization,

mole ule interfa es/do king, motion planning, et .

© 2014 Frank NIELSEN 5793b870 3/57

Page 4: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Voronoi diagrams and dual Delaunay simpli ial omplex

⇒ Empty sphere property, max min angle triangulation, et

⇒ General position : no (d + 2) points ospheri al⇒ Voronoi & dual Delaunay triangulation

⇒ Bise tor Bi(P ,Q) perpendi ular ⊥ to segment [PQ]

© 2014 Frank NIELSEN 5793b870 4/57

Page 5: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Minimum spanning tree ⊂ Delaunay triangulation

All edges of the Eu lidean MST are Delaunay edges

→ Prim's greedy algorithm in Delaunay graph in O(n log n)

Convex hull : boundary ∂ of the Delaunay triangulation

© 2014 Frank NIELSEN 5793b870 5/57

Page 6: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Order k Voronoi diagrams

All subsets of size k , Pk =(Pk

)= K

1

, ...,KN with N =(nk

).

partition the spa e into non-empty k-order Voronoi ells :

Vork(Ki ) = x : ∀q ∈ Ki ,∀r ∈ P\Ki , D(x , q) ≤ D(x , r)

Combinatorial omplexity not yet settled ! ! ! (k-sets/levels)

≡ proje tion of k-levels of arrangements of hyperplanes in Rd+1

.

© 2014 Frank NIELSEN 5793b870 6/57

Page 7: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Farthest order Voronoi and Minimum En losing Ball

Cir um enter/minmax enter on the farthest Voronoi diagram.

Non-dierentiable optim. at the farthest Voronoi [16

© 2014 Frank NIELSEN 5793b870 7/57

Page 8: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Voronoi & Delaunay : Complexity and algorithms

Complexity : Θ(n⌈

d2

⌉) (quadrati in 3D)

Moment urve : t 7→ (t, t2, .., td ), et .

Constru tion : Θ(n log n+ n⌈

d2

⌉), output-sensitive algorithms

Ω(n log n + f ), not yet optimal output-sensitive

algorithms.

Voronoi diagram : VorD(P) = Vorf (D(·,·))(P) for a stri tly

monotoni ally in reasing fun tion f (·)Vor. Eu lidean ≡ Vor. squared Eu lidean ⊂ Vor. Bregman

Geometri toolbox : spa e of spheres (polarity, orthogonality

between spheres, et .)

© 2014 Frank NIELSEN 5793b870 8/57

Page 9: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Multiple Hypothesis Testing [9, 10

Given a random variable X with n hypothesis

H1

: X ∼ P1

: ...

Hn : X ∼ Pn

de ide for a IID sample x1

, ..., xm ∼ X whi h hypothesis holds true ?

Pm orre t

= 1− Pmerror

Asymptoti regime : Error exponent α

lim

m→∞−

1

mlogPm

e = α

© 2014 Frank NIELSEN 5793b870 9/57

Page 10: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bayesian hypothesis testing

Prior probabilities : wi = P(X ∼ Pi) > 0 (with

∑ni=1

wi = 1)

Conditional probabilities : P(X = x |X ∼ Pi).

P(X = x) =n∑

i=1

P(X ∼ Pi )P(X = x |X ∼ Pi ) =n∑

i=1

wiP(X |Pi)

Cost design matrix [cij ]= with ci ,j = ost of de iding Hi when

in fa t Hj is true

pi ,j(u(x)) = probability of making this de ision using rule u.

© 2014 Frank NIELSEN 5793b870 10/57

Page 11: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bayesian dete tor : MAP rule

Minimize the expe ted ost :

EX [c(r(x))], c(r(x)) =∑

i

wi

j 6=i

ci ,jpi ,j(r(x))

Spe ial ase : Probability of error Pe : ci ,i = 0 and ci ,j = 1 for

i 6= j :

Pe = EX

i

wi

j 6=i

pi ,j(r(x))

Maximum a posteriori probability (MAP) rule :

map(x) = argmaxi∈1,...,n wipi (x)

where pi (x) = P(X = x |X ∼ Pi ) are the onditional

probabilities.

→ MAP Bayesian dete tor minimizes Pe over all rules [7

© 2014 Frank NIELSEN 5793b870 11/57

Page 12: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

MVNs, MAP rule, and Mahalanobis distan e (1930)

MultiVariate Normals

(MVNs)

Say, X

1

∼ N(µ1

,Σ) and X2

∼ N(µ2

,Σ)

Probability of error (mis lassi ation) for w

1

= w2

= 1

2

:

Pe =1

2

(1− TV(P1

,P2

)) = Φ

(

−1

2

∆(P1

,P2

)

)

Φ(·) : standard normal distribution fun tion

Mahalanobis metri distan e ∆(·, ·) :

∆2(X1

,X2

) = (µ1

− µ2

)⊤Σ−1(µ1

− µ2

) = ∆µ⊤Σ−1∆µ

Measure the density overlapping (the greater, the lesser)

generalize Eu lidean distan e (Σ = I )

∆2

: only symmetri Bregman divergen e.

⇒ easy to approximate Pe experimentally

© 2014 Frank NIELSEN 5793b870 12/57

Page 13: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Mahalanobis Voronoi diagrams

MHT for isotropi MVNs with uniform weight wi =1

n

Cholesky de omposition Σ = LL⊤

∆2(X1

,X2

) = (µ1

− µ2

)⊤Σ−1(µ1

− µ2

),

= (µ1

− µ2

)⊤(L−1)⊤L−1(µ1

− µ1

),

= (L−1µ1

− L−1µ2

)⊤(L−1µ1

− L−1µ2

)

∆2(X1

,X2

) = DE (L−1µ

1

, L−1µ2

)

Mahalanobis Voronoi diagram :

Cholesky de omposition : Σ = LL⊤

Map P = p

1

, ..., pn to P ′ = p′1

, ..., p′n with p′i = L−1x .

Ordinary Voronoi diagram VorE (P

′)

Map VorE (P

′) to VorΣ(P) = LVorE (P′).

© 2014 Frank NIELSEN 5793b870 13/57

Page 14: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Mahalanobis Voronoi diagrams

Σ a ount for both orrelation and dimension (feature) s aling

Dual stru ture ≡ anisotropi Delaunay triangulation

⇒ empty ir umellipse property

Re all : MAP rule from Voronoi diagram

© 2014 Frank NIELSEN 5793b870 14/57

Page 15: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Anisotropi Voronoi diagrams [6 (MHT for MVNs)

Classi ation : Generator Xi ∼ N(µi ,Σi), wi =1

n.

Dis riminant fun tions are quadrati bise tors

Orphans, islands, dual not a triangulation

(need wedged/visibility onditions)

Appli ations : Crystal growth in elds (not Riemannian smooth

metri )

© 2014 Frank NIELSEN 5793b870 15/57

Page 16: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Statisti s : Estimators

Given x1

, ..., xn IID observations (from a population), estimate the

underlying distribution X ∼ p.

empiri al CDF : Fn(x) =

1

n

i I(−∞,x)(xi ).

Glivenko-Cantelli theorem : supx |Fn(x) − F (x)| → 0 a.s.

Fisher approa h : Density p belongs to a parametri family

p(x |θ), D = #parameters (order)

Method of moments : Mat h distribution moments with

sample moments :

EX [Xl ] =

1

n

n∑

i=1

x li ,

⇒ any D independent equations yields an estimator

Fisher Maximum (log-)Likelihood Estimator MLE :

max

θ

l(θ; x1

, ..., xn) = max

θ

n∏

i=1

p(xi ; θ)

© 2014 Frank NIELSEN 5793b870 16/57

Page 17: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Estimation : Varian e Lower Bound

Whi h estimator θn shall we hoose ?

Estimator is also a random variable on a random ve tor

Consistent : limn→∞ θn → θ

Unbiased : Eθ[θn]− θ = 0

Minimum square error (MSE) : E[(θ − θ)2] = B[θ]2 + V[θ]

Fisher information matrix :

I (θ) =

[

Ii ,j(θ) = Eθ

[∂

∂θilog p(x |θ)

∂θjlog p(x |θ)

]]

Fré het Darmois Cramér-Rao lower bound for unbiased

estimators :

V[θn] 1

nI−1(θ)

E ient : estimator rea hes the Cramèr-Rao lower bound

© 2014 Frank NIELSEN 5793b870 17/57

Page 18: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Su ient statisti s and exponential families

Su ient statisti t(x) :

P(x |t(x) = t, θ) = P(x |t(x) = t)

All information for inferen e ontained in t(x). ( 6= an illary)

For univariate Gaussians, D = 2 : t

1

(x) = x and t2

(x) = x2.

µ = x =1

n

i

xi , σ2 =1

n

i

(xi − x)2 =

(

1

n

i

x2i

)

− (x)2

Exponential families have nite dimensional su ient

statisti s (Koopman) : Redu e n data to D statisti s.

∀x ∈ X , P(x |θ) = exp(θ⊤t(x)− F (θ) + k(x))

F (·) : log-normalizer/ umulant/partition fun tion

k(x) : auxiliary term for arrier measure

© 2014 Frank NIELSEN 5793b870 18/57

Page 19: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Exponential families : Fisher information and MLE

Common distributions are exponential families :

Poisson, Gaussians, Gamma, Beta, Diri hlet, et .

Fisher information :

I (θ) = ∇2F (θ)

MLE for an exponential family :

∇F (θ) =1

n

i

t(Xi) = η

MLE exists i.

η = ∇F (θ) ∈ int(CH(X ))

© 2014 Frank NIELSEN 5793b870 19/57

Page 20: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Population spa e : Hotelling (1930) [5 & Rao (1945) [25

Birth of dierential-geometri methods in statisti s.

Fisher information matrix (positive denite) an be used as a

(smooth) Riemannian metri tensor.

Distan e between two populations indexed by θ

1

and θ2

:

Riemannian distan e (metri length)

Fisher-Hotelling-Rao (FHR) geodesi distan e used in

lassi ation : Find the losest population to a given

population

Used in tests of signi an e (null versus alternative

hypothesis), power of a test : P(reje t H0

|H0

is false )(surfa es in population spa es)

© 2014 Frank NIELSEN 5793b870 20/57

Page 21: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Rao's distan e (1945, introdu ed by Hotelling 1930 [5)

Innitesimal length element :

ds2 =∑

ij

gij (θ)dθidθj = dθT I (θ)dθ

Geodesi and distan e are hard to expli itly al ulate :

ρ(p(x ; θ1

), p(x ; θ2

)) = min

θ(s)θ(0)=θ

1

θ(1)=θ2

∫1

0

√(dθ

ds

)T

I (θ)dθ

dsds

Metri property of ρ, many tools [1 : Riemannian Log/Exp

tangent/manifold mapping

© 2014 Frank NIELSEN 5793b870 21/57

Page 22: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Fisher Rao Hotelling Voronoi : Riemannian Voronoi diagrams

Lo ation-s ale 2D families have non-positive urvature

(Hotelling, 1930) : FHR Voronoi diagrams amount to

hyperboli Voronoi diagrams or Eu lidean diagrams

(lo ation families only like isotropi Gaussians)

Multinomial family has spheri al geometry on the positive

orthant : Spheri al Voronoi diagram (stereographi

proje tion ∝ Eu lidean Voronoi diagrams)

Arbitrary families p(x |θ) : Geodesi s not in losed forms →limited omputational framework in pra ti e...

© 2014 Frank NIELSEN 5793b870 22/57

Page 23: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Hyperboli Voronoi diagrams [19, 22

In Klein disk, the hyperboli Voronoi diagram amounts to a

lipped ane Voronoi diagram, or a lipped power

diagram. E ient lipping algorithm [2.

Convert to other models of hyperboli geometry : Poin aré

disk, upper half spa e, hyperboloid, Beltrami hemisphere, et .

Conformal (good for vizualizing) versus non- onformal

(good for omputing) models. ( onformal metri

G (x) = λ(x)I is s aled identity metri )

© 2014 Frank NIELSEN 5793b870 23/57

Page 24: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Hyperboli Voronoi diagrams [19, 22

Hyperboli Voronoi diagram in Klein disk :

© 2014 Frank NIELSEN 5793b870 24/57

Page 25: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Hyperboli Voronoi diagrams [19, 22

Ane Voronoi diagram equivalent to power diagram.

Power distan e : ‖x − p‖2 − wp (additively weighted ordinary

Voronoi)

© 2014 Frank NIELSEN 5793b870 25/57

Page 26: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Hyperboli Voronoi diagrams [19, 22

5 ommon models of the abstra t hyperboli geometry

https://www.youtube. om/wat h?v=i9IUzNxeH4o

ACM Symposium on Computational Geometry (SoCG'14)

© 2014 Frank NIELSEN 5793b870 26/57

Page 27: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

k-order hyperboli Voronoi diagram

k = 1 k = 2

Ane diagram ⇒ equivalent to a power diagram [21

Wat h k-order Voronoi :

https://www.youtube. om/wat h?v=i9IUzNxeH4o

© 2014 Frank NIELSEN 5793b870 27/57

Page 28: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Constrast fun tion and dually at spa e

Convex and stri tly dierentiable fun tion F (θ) admits a

Legendre-Fen hel onvex onjugate F ∗(η) :

F ∗(η) = sup

θ

(θ⊤η − F (θ)), ∇F (θ) = η = (∇F ∗)−1(θ)

Young's inequality gives rise to anoni al divergen e [8 :

F (θ)+F ∗(η′) ≥ θ⊤η′ ⇒ AF ,F∗(θ, η′) = F (θ) + F ∗(η′)− θ⊤η′

Writing using single oordinate system, get dual Bregman

divergen es :

BF (θp : θq) = F (θp)− F (θq)− (θp − θq)⊤∇F (θq)

= BF∗(ηq : ηp)

= AF ,F∗(θp, ηq) = AF∗,F (ηq : θp)

© 2014 Frank NIELSEN 5793b870 28/57

Page 29: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Dis riminant analysis exponential families

Consider X

1

∼ EF (θ1), X2

∼ EF (θ2) (or moment

parameterization ηi = ∇F (θi))

Bayes' rule : Classify x from X

1

i. P(x |θ1

) > P(x |θ2

)

Use bije tion between exponential families and Bregman

divergen es :

log p(x |θ) = −BF∗(t(x) : η) + F ∗(t(x)) + k(x)

MAP rule (w

1

= w2

= 1

2

) :

BF∗(t(x) : η1

) < BF∗(t(x) : η2

)

Bregman bise tor :

BiF∗(η1

, η2

) = η |BF∗(η : η1

) = BF∗(η : η1

)

© 2014 Frank NIELSEN 5793b870 29/57

Page 30: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bregman dual bise tors [3, 17, 20

Right-sided bise tor : → Hyperplane (θ-hyperplane)

HF (p, q) = x ∈ X | BF (x : p) = BF (x : q).

HF :

〈∇F (p)−∇F (q), x〉 + (F (p)− F (q) + 〈q,∇F (q)〉 − 〈p,∇F (p)〉) = 0

Left-sided bise tor : → θ-Hypersurfa e (η-hyperplane)

H ′F (p, q) = x ∈ X | BF (p : x) = BF (q : x)

H ′F : 〈∇F (x), q − p〉+ F (p)− F (q) = 0

© 2014 Frank NIELSEN 5793b870 30/57

Page 31: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Visualizing Bregman bise tors

Primal oordinates θ Dual oordinates η

natural parameters expe tation parameters

p

qSource Space: Itakura-Saito

p(0.52977081,0.72041688) q(0.85824458,0.29083834)

D(p,q)=0.66969016 D(q,p)=0.44835617

p’

q’

Gradient Space: Itakura-Saito dual

p’(-1.88760873,-1.38808518) q’(-1.16516903,-3.43833618)

D*(p’,q’)=0.44835617 D*(q’,p’)=0.66969016

© 2014 Frank NIELSEN 5793b870 31/57

Page 32: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Spa e of Bregman spheres and Bregman balls [3

Dual Bregman balls (bounding Bregman spheres) :

Ball

rF (c , r) = x ∈ X | BF (x : c) ≤ r

and Ball

lF (c , r) = x ∈ X | BF (c : x) ≤ r

Legendre duality :

Ball

lF (c , r) = (∇F )−1(BallrF∗(∇F (c), r))

Illustration for Itakura-Saito divergen e, F (x) = − log x

© 2014 Frank NIELSEN 5793b870 32/57

Page 33: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Lifting/Polarity : Potential fun tion graph F

© 2014 Frank NIELSEN 5793b870 33/57

Page 34: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Spa e of Bregman spheres : Lifting map [3

F : x 7→ x = (x ,F (x)), hypersurfa e in Rd+1

.

Hp : Tangent hyperplane at p, z = Hp(x) = 〈x − p,∇F (p)〉+ F (p)

Bregman sphere σ −→ σ with supporting hyperplane

Hσ : z = 〈x − c ,∇F (c)〉 + F (c) + r .

(// to Hc and shifted verti ally by r)

σ = F ∩ Hσ.

interse tion of any hyperplane H with F proje ts onto X as a

Bregman sphere :

H : z = 〈x , a〉+b → σ : BallF (c = (∇F )−1(a), r = 〈a, c〉−F (c)+b)

© 2014 Frank NIELSEN 5793b870 34/57

Page 35: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Lifting with Itakura-Saito potential fun tion

© 2014 Frank NIELSEN 5793b870 35/57

Page 36: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Spa e of Bregman spheres [3

Union/interse tion of Bregman spheres from representational

polytope [3

Radi al axis of two Bregman balls is an hyperplane :

Appli ations to Nearest Neighbor sear h trees like Bregman

ball trees or Bregman vantage point trees [23.

© 2014 Frank NIELSEN 5793b870 36/57

Page 37: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Spa e of spheres : Minimum en losing ball [14, 24

To a hyperplane Hσ = H(a, b) : z = 〈a, x〉+b in Rd+1

, orresponds

a ball σ = Ball(c , r) in Rdwith enter c = ∇F ∗(a) and radius :

r = 〈a, c〉 − F (c)+ b = 〈a,∇F ∗(a)〉 − F (∇F ∗(a)) + b = F ∗(a)+ b

sin e F (∇F ∗(a)) = 〈∇F ∗(a), a〉 − F ∗(a) (Young equality)

SEB : Find halfspa e H(a, b)− : z ≤ 〈a, x〉+ b that ontains all

lifted points :

min

a,br = F ∗(a) + b,

∀i ∈ 1, ..., n, 〈a, xi 〉+ b − F (xi ) ≥ 0.

⇒ Convex Program (CP) with linear inequality onstraints

⇒ F (θ) = F ∗(η) = 1

2

x⊤x : CP → Quadrati Programming

(QP) [4.

© 2014 Frank NIELSEN 5793b870 37/57

Page 38: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

InSphere predi ates wrt Bregman divergen es [3

Impli it representation of Bregman spheres/balls :

Is x inside the Bregman ball dened by d + 1 support

points ?

InSphere(x ; p0

, ..., pd ) =

∣∣∣∣∣∣

1 ... 1 1

p0

... pd x

F (p0

) ... F (pd ) F (x)

∣∣∣∣∣∣

InSphere(x ; p

0

, ..., pd ) is negative, null or positive dependingon whether x lies inside, on, or outside σ.

© 2014 Frank NIELSEN 5793b870 38/57

Page 39: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bregman Voronoi diagrams as minimization diagrams [3

A sub lass of ane diagrams whi h have all non-empty ells .

Minimization diagram of the n fun tions

Di(x) = BF (x : pi) = F (x)− F (pi )− 〈x − pi ,∇F (pi )〉.≡ minimization of n linear fun tions :

Hi (x) = (pi − x)T∇F (qi )− F (pi )

⇐⇒Wat h https://www.youtube. om/wat h?v=L7v1wuN_9Wg

© 2014 Frank NIELSEN 5793b870 39/57

Page 40: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bregman Voronoi from Power diagrams [15, 20

Any ane diagram an be built from a power diagram.

(power diagrams dened in full spa e Rd)

Power distan e of x to Ball(p, r) : ||p − x ||2 − r2.

Laguerre diagram : minimization diagram of

Di(x) = ||pi − x ||2 − r2i

Power bise tor of Ball(pi , ri ) and Ball(pj , rj )= radi al

hyperplane :

2〈x , pj − pi〉+ ||pi ||2 − ||pj ||

2 + r2j − r2i = 0.

Universality : Ane Bregman Voronoi diagram ⇔ Power diagram

© 2014 Frank NIELSEN 5793b870 40/57

Page 41: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Ane Bregman Voronoi diagrams as power diagrams

Equivalen e : B(∇F (pi ), ri ) withr2i = 〈∇F (pi ),∇F (pi )〉+ 2(F (pi )− 〈pi ,∇F (pi)〉(imaginary radii shown in red, WLOG. ri ≥ 0 by shifting)

Some ells may be empty in the Laguerre diagram (power diagram)

but not in the Bregman diagram

http://www. sl.sony. o.jp/person/nielsen/BVDapplet/

© 2014 Frank NIELSEN 5793b870 41/57

Page 42: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Bregman dual Delaunay triangulations

Delaunay Exponential Hellinger-like

empty Bregman sphere property,

geodesi triangles.

BVDs extends Eu lidean Voronoi diagrams with similar omplexity/algorithms.

© 2014 Frank NIELSEN 5793b870 42/57

Page 43: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Riemannian tensor and orthogonality

Dually at spa e.

F (θ) onvex on Θ yields onvex onjugate F ∗(η) on H

Same Riemannian tensor an be expressed in both oordinate

systems :

g(θ) = ∇2F (θ), g∗(η) = ∇2F (η), g(θ(X ))g∗(η(X )) = I ,∀X

Same innitesimal length element ds2 = (ds∗)2.

Two urves γ

1

(t) and γ2

(t) are orthogonal atQ = γ

1

(t0

) = γ2

(t0

) i.

〈γ1

(t0

), γ2

(t0

)〉g(θ(Q)) = 〈γ1

(t0

), γ2

(t0

)〉∗g∗(η(Q)) = 0

Geodesi γ(P ,Q) and dual geodesi γ∗(P ,Q) (and geodesi

segments γ[P ,Q] and γ∗[P ,Q])

© 2014 Frank NIELSEN 5793b870 43/57

Page 44: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Orthogonality

3-point property (generalized law of osines) :

BF (p : r) = BF (p : q) + BF (q : r)− (p − q)T (∇F (r)−∇F (q)))

P

Q

R

γ[P,Q]

γ∗[Q,R]

γ(P ,Q) orthogonal to γ∗(Q,R) i.

BF (p : r) = BF (p : q) + BF (q : r)

Equivalent to 〈θp − θq, ηr − ηq〉 = 0

Extend Pythagoras' theorem

γ(P ,Q) ⊥ γ∗(Q,R)

© 2014 Frank NIELSEN 5793b870 44/57

Page 45: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Dually orthogonal Bregman Voronoi & Triangulations

Ordinary Voronoi diagram is perpendi ular to Delaunay

triangulation. (Vor k-fa e ⊥ Del d − k-fa e)

Dual line segment geodesi s :

γ(P ,Q) = θ = θp + (1− λ)θq |λ ∈ [0, 1]

γ∗(P ,Q) = η = ηp + (1− λ)ηq |λ ∈ [0, 1]

Bise tors :

Biθ(p, q) : 〈x , θq − θp〉+ F (θp)− F (θq) = 0

Biη(p, q) : 〈x , ηq − ηp〉+ F ∗(ηp)− F ∗(ηq) = 0

Dual orthogonality :

Biη(p, q) ⊥ γ∗(P ,Q)

γ(P ,Q) ⊥ Biθ(p, q)

© 2014 Frank NIELSEN 5793b870 45/57

Page 46: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Dually orthogonal Bregman Voronoi & Triangulations

Biη(p, q) ⊥ γ∗(P ,Q)

γ(P ,Q) ⊥ Biθ(p, q)

© 2014 Frank NIELSEN 5793b870 46/57

Page 47: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

MAP rule and additive Bregman Voronoi diagrams

Optimal MAP de ision rule :

map(x) = argmaxi∈1,...,nwipi(x)

= argmaxi∈1,...,n − B∗(t(x) : ηi ) + logwi ,

= argmini∈1,...,nB∗(t(x) : ηi)− logwi

⇒dual Bregman Voronoi with additive weights (ane

diagram).

Verti al proje tion of the interse tions of the lifted half-spa es on

the potential F∗shifted by the weights − logwi .

© 2014 Frank NIELSEN 5793b870 47/57

Page 48: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Relative entropy for exponential families [18

Kullba k-Leibler divergen e ( ross-entropy minus entropy) :

KL(P : Q) =

p(x) logp(x)

q(x)dx ≥ 0

=

p(x) log1

q(x)dx

︸ ︷︷ ︸

H×(P:Q)+c

p(x) log1

p(x)dx

︸ ︷︷ ︸

H(p)=H×(P:P)−c

= F (θQ)− F (θP)− 〈θQ − θP ,∇F (θP)〉

= BF (θQ : θP) = BF∗(ηP : ηQ)

Bregman divergen e BF dened for a stri tly onvex and

dierentiable fun tion up to some ane terms.

Proof KL(P : Q) = BF (θQ : θP) follows from

X ∼ EF (θ) =⇒ E [t(X )] = ∇F (θ) = η

© 2014 Frank NIELSEN 5793b870 48/57

Page 49: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Upper bounds Pe using Cherno Information [11

Tri k : min(a, b) ≤ aαb1−α,∀α ∈ (0, 1) (for a, b > 0)

E ∗ =

min(P(C1

|x),P(C2

|x))p(x)dx ≤ wα

1

w1−α

2

pα1

(x)p1−α

2

(x)dx

Upper bound the minimum error E ∗

E ∗ ≤ wα1

w1−α2

cα(p1 : p2),

cα(p1 : p2) =∫pα1

(x)p1−α2

(x)dx : Cherno α- oe ient.

c∗(p1

: p2

) = cα∗(p1

: p2

) = min

α∈(0,1)

pα1

(x)p1−α2

(x)dx .

Cherno information (or Cherno divergen e) :

C ∗(p1

: p2

) = Cα∗(p1

: p2

) = max

α∈(0,1)− log

pα1

(x)p1−α2

(x)dx

extend methodoloy with quasi-arithmeti means [11

© 2014 Frank NIELSEN 5793b870 49/57

Page 50: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Cherno oe ient/information : exponential families

Cα(p : q) = − log cα(p, q) = J(α)F (θp : θq),

cα(p : q) = e−Cα(p:q) = e−J(α)F

(θp :θq).

Skewed Jensen divergen e (on natural parameters) :

J(α)F (θp : θq) = (αF (θp) + (1− α)F (θq))− F (αθp + (1− α)θq)

© 2014 Frank NIELSEN 5793b870 50/57

Page 51: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Maximizing skew Jensen divergen e

α∗ = arg max

0<α<1

J(α)F (p : q)

J(α∗)F (p : q) = BF (p : mα∗) = BF (q : mα∗)

mα = αp + (1− α)q : α-mixing of p and q.

J(α∗)F

(p : q)BF (p : mα∗)

BF (q : mα∗)

p q

F (p)

F (q)

mα∗ = α∗p+ (1− α∗)q

F

Maximum skew Jensen divergen e amounts to Bregman

divergen es.

© 2014 Frank NIELSEN 5793b870 51/57

Page 52: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Geometry of the best error exponent : binary hypothesis [11

Cherno distribution P∗:

P∗ = Pθ∗12

= Ge(P1

,P2

) ∩ Bim(P1

,P2

)

e-geodesi :

Ge(P1

,P2

) = E(λ)12

| θ(E(λ)12

) = (1− λ)θ1

+ λθ2

, λ ∈ [0, 1],

m-bise tor :

Bim(P1

,P2

) : P | F (θ1

)− F (θ2

) + η(P)⊤∆θ = 0,

Optimal natural parameter of P∗:

θ∗ = θ(α∗)12

= argminθ∈ΘB(θ1

: θ) = argminθ∈ΘB(θ2

: θ).

→ losed-form for order-1 family, or e ient bise tion sear h.

© 2014 Frank NIELSEN 5793b870 52/57

Page 53: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Geometry of the best error exponent : binary hypothesis

P∗ = Pθ∗12

= Ge(P1

,P2

) ∩ Bim(P1

,P2

)

pθ1

pθ2

pθ∗

12

m-bisector

e-geodesic Ge(Pθ1, Pθ2

)

η-coordinate system

Pθ∗

12

C(θ1 : θ2) = B(θ1 : θ∗12)

Bim(Pθ1, Pθ2

)

BHT : Pe bounded using Bregman divergen e between Cherno

distribution and lass- onditional distributions.

© 2014 Frank NIELSEN 5793b870 53/57

Page 54: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Geometry of the best error exponent : multiple hypothesis

n-ary MHT [7 from minimum pairwise Cherno distan e :

C (P1

, ...,Pn) = min

i ,j 6=iC (Pi ,Pj )

Pme ≤ e−mC(Pi∗ ,Pj∗), (i∗, j∗) = argmini ,j 6=iC (Pi ,Pj )

Compute for ea h pair of natural neighbors Pθi and Pθj , the

Cherno distan e C (Pθi ,Pθj ), and hoose the pair with minimal

distan e. (Proof by ontradi tion using Bregman Pythagoras

theorem.)

© 2014 Frank NIELSEN 5793b870 54/57

Page 55: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Multiple Hypothesis testing & Cherno information

η-coordinate system

Chernoff distribution between

natural neighbours

© 2014 Frank NIELSEN 5793b870 55/57

Page 56: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Summary : Statisti al Voronoi diagrams

Mahalanobis Voronoi diagrams, anisotropi Voronoi diagrams

(additively-weighted)

Fisher-Hotelling-Rao Riemannian Voronoi diagrams

Kullba k-Leibler Voronoi diagrams for exponential families =

Bregman Voronoi diagrams

Extend ordinary Voronoi diagrams (≡ isotropi Gaussians)

Spa e of spheres : Lifting/polarity

bise tors ⊥ geodesi s

dual regular triangulation with empty spheres

Information geometry : ane dierential geometry of parameter

spa es, invarian e prin iples

Many other kinds of Voronoi : Jensen-Bregman, (u, v)-stru tures , onformal divergen es, total Bregman & total Jensen divergen es,

et .

© 2014 Frank NIELSEN 5793b870 56/57

Page 57: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Computational Information Geometry : Edited books

[13 [12

http://www.springer. om/engineering/signals/book/978-3-642-30231-2

http://www.sony sl. o.jp/person/nielsen/infogeo/MIG/MIGBOOKWEB/

http://www.springer. om/engineering/signals/book/978-3-319-05316-5

http://www.sony sl. o.jp/person/nielsen/infogeo/GTI/Geometri TheoryOfInformation.html

© 2014 Frank NIELSEN 5793b870 57/57

Page 58: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Mar Arnaudon and Frank Nielsen.

On approximating the Riemannian 1- enter.

Comput. Geom. Theory Appl., 46(1) :93104, January 2013.

Jean-Daniel Boissonnat and Christophe Delage.

Convex hull and Voronoi diagram of additively weighted points.

In Gerth St¸lting Brodal and Stefano Leonardi, editors, ESA, volume 3669 of Le ture Notes in

Computer S ien e, pages 367378. Springer, 2005.

Jean-Daniel Boissonnat, Frank Nielsen, and Ri hard No k.

Bregman Voronoi diagrams.

Dis rete and Computational Geometry, 44(2) :281307, April 2010.

Bernd Gärtner and Sven S hönherr.

An e ient, exa t, and generi quadrati programming solver for geometri optimization.

In Pro eedings of the sixteenth annual symposium on Computational geometry, pages 110118.

ACM, 2000.

Harold Hotelling.

Fran ois Labelle and Jonathan Ri hard Shew huk.

Anisotropi Voronoi diagrams and guaranteed-quality anisotropi mesh generation.

In Pro eedings of the nineteenth annual symposium on Computational geometry, pages 191200.

ACM, 2003.

C. C. Leang and D. H. Johnson.

On the asymptoti s of M-hypothesis Bayesian dete tion.

IEEE Transa tions on Information Theory, 43(1) :280282, January 1997.

Frank Nielsen.

Legendre transformation and information geometry.

Te hni al Report CIG-MEMO2, September 2010.

Frank Nielsen.

Hypothesis testing, information divergen e and omputational geometry.

© 2014 Frank NIELSEN 5793b870 57/57

Page 59: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

In Frank Nielsen and Fr©d©ri Barbares o, editors, Geometri S ien e of Information, volume

8085 of Le ture Notes in Computer S ien e, pages 241248. Springer Berlin Heidelberg, 2013.

Frank Nielsen.

Generalized bhatta haryya and herno upper bounds on bayes error using quasi-arithmeti

means.

Pattern Re ognition Letters, 42 :2534, 2014.

Frank Nielsen.

Generalized bhatta haryya and herno upper bounds on bayes error using quasi-arithmeti

means.

Pattern Re ognition Letters, 42 :2534, 2014.

Frank Nielsen.

Geometri Theory of Information.

Springer, 2014.

Frank Nielsen and Rajendra Bhatia, editors.

Matrix Information Geometry (Revised Invited Papers). Springer, 2012.

Frank Nielsen and Ri hard No k.

On the smallest en losing information disk.

Information Pro essing Letters (IPL), 105(3) :9397, 2008.

Frank Nielsen and Ri hard No k.

Quantum Voronoi diagrams and Holevo hannel apa ity for 1-qubit quantum states.

In IEEE International Symposium on Information Theory (ISIT), pages 96100, Toronto, Canada,

July 2008. IEEE.

Frank Nielsen and Ri hard No k.

Approximating smallest en losing balls with appli ations to ma hine learning.

Int. J. Comput. Geometry Appl., 19(5) :389414, 2009.

Frank Nielsen and Ri hard No k.

The dual Voronoi diagrams with respe t to representational Bregman divergen es.

In International Symposium on Voronoi Diagrams (ISVD), pages 7178, 2009.

© 2014 Frank NIELSEN 5793b870 57/57

Page 60: Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and their applications

Frank Nielsen and Ri hard No k.

Entropies and ross-entropies of exponential families.

In Pro eedings of the International Conferen e on Image Pro essing, ICIP 2010, September

26-29, Hong Kong, China, pages 36213624, 2010.

Frank Nielsen and Ri hard No k.

Hyperboli Voronoi diagrams made easy.

In 2013 13th International Conferen e on Computational S ien e and Its Appli ations, pages

7480. IEEE, 2010.

Frank Nielsen and Ri hard No k.

Hyperboli Voronoi diagrams made easy.

In International Conferen e on Computational S ien e and its Appli ations (ICCSA), volume 1,

pages 7480, Los Alamitos, CA, USA, mar h 2010. IEEE Computer So iety.

Frank Nielsen and Ri hard No k.

Further results on the hyperboli Voronoi diagrams.

ISVD, 2014.

Frank Nielsen and Ri hard No k.

Visualizing hyperboli Voronoi diagrams.

In Symposium on Computational Geometry, page 90, 2014.

Frank Nielsen, Paolo Piro, and Mi hel Barlaud.

Bregman vantage point trees for e ient nearest neighbor queries.

In Pro eedings of the 2009 IEEE International Conferen e on Multimedia and Expo (ICME),

pages 878881, 2009.

Ri hard No k and Frank Nielsen.

Fitting the smallest en losing bregman balls.

In 16th European Conferen e on Ma hine Learning (ECML), pages 649656, O tober 2005.

Calyampudi Radhakrishna Rao.

Information and the a ura y attainable in the estimation of statisti al parameters.

Bulletin of the Cal utta Mathemati al So iety, 37 :8189, 1945.

© 2014 Frank NIELSEN 5793b870 57/57