
Consistency of statistics in infinite dimensional quotient spaces

PhD defence of Loïc Devilliers, November 20, 2017

Prepared at Inria Université Côte d’Azur, CMAP École Polytechnique & ENS Paris-Saclay

Jury:

• Stéphanie Allassonnière, Professor, Université Paris Descartes (Co-advisor)
• Marc Arnaudon, Professor, Université de Bordeaux (Reviewer)
• Charles Bouveyron, Professor, Université Côte d’Azur (President)
• Stephan Huckemann, Professor, University of Göttingen (Reviewer)
• Xavier Pennec, Senior Researcher, Université Côte d’Azur, Inria (Advisor)
• Stefan Sommer, Associate Professor, University of Copenhagen (Reviewer)
• Alain Trouvé, Professor, ENS Paris-Saclay (Examiner)


Computational Anatomy: Heart Template Estimation

t0: template, one heart, modeling the others through a diffeomorphism φi.

Diffeomorphisms = change the shape but not the topology. [Mansi 2009]

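$$(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \|t \circ \varphi_i - \mathrm{Patient}_i\|^2 + \mathrm{Regularization}(\varphi_i) \right)$$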

Computational Anatomy: Brain Template Estimation

[Guimond 1999, Joshi 2004 etc.], Image from [Hamou 2016]

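$$(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \|t \circ \varphi_i - Y_i\|^2 + \mathrm{Regularization}(\varphi_i) \right)$$

Template estimation is a tool to statistically analyze diseases.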

Template Estimation with Surfaces

[courtesy of Pierre Roussillon]

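$$(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \|t \circ \varphi_i - S_i\|^2 + \mathrm{Regularization}(\varphi_i) \right)$$

Goal of this work: study the statistical properties of template estimation.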

Example: Periodic (discretized) signals

Simple example to introduce the generative model: in M = Per1(ℝ, ℝ), the 1-periodic functions from ℝ to ℝ.

[Figure: the template t0, plotted on [0, 1].]



[Figure: the template transformed by a translation, t0 ◦ ϕ, plotted on [0, 1].]

Note that for the L2 norm, we have ‖t0 ◦ ϕ‖ = ‖t0‖.



[Figure: the template and the deformed template with added noise, t0 ◦ ϕ + ε, plotted on [0, 1].]

For instance, Gaussian noise on each point of the discretization grid.

Goal: study the statistical properties of the estimator of t0.

Generative model

A group G acts on an ambient space M: for g ∈ G, m ∈ M, g · m = gm ∈ M.

Observable variable:

Y = Φ · t0 + σε (forward model)

or

Y = Φ · (t0 + σε) (backward model)

• Φ a random variable in G.

• t0 the template in M.

• σ > 0 the noise level.

• ε a standardized noise in M: E(ε) = 0, E(‖ε‖²) = 1.

• Φ and ε are independent.
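As a rough illustration of the two models, the sketch below draws one observation for discretized periodic signals, with G taken to be the group of cyclic shifts. The function names, the uniform law for Φ, and the i.i.d. Gaussian noise with variance 1/n per coordinate (so that E(‖ε‖²) = 1 for the plain Euclidean norm) are assumptions made for this example, not choices stated on the slides.

```python
import random


def shift(signal, k):
    """Cyclic shift by k positions: the action of a translation on a discretized periodic signal."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def standardized_noise(n, rng):
    """i.i.d. Gaussian coordinates with variance 1/n, so that E(||eps||^2) = 1 (assumed noise law)."""
    return [rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]


def sample_forward(t0, sigma, rng):
    """Draw Y = Phi . t0 + sigma * eps (forward model)."""
    phi = rng.randrange(len(t0))  # Phi uniform on the shift group (assumption)
    eps = standardized_noise(len(t0), rng)
    return [m + sigma * e for m, e in zip(shift(t0, phi), eps)]


def sample_backward(t0, sigma, rng):
    """Draw Y = Phi . (t0 + sigma * eps) (backward model)."""
    phi = rng.randrange(len(t0))
    eps = standardized_noise(len(t0), rng)
    return shift([t + sigma * e for t, e in zip(t0, eps)], phi)
```

For isometric actions such as these shifts, and for a noise law invariant under the group, the two models produce observations with the same distribution; they differ in general.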

Inverse problem

Given the observed variable Y, how can we estimate the template t0?


Minimization (max-max algorithm)

Estimation by minimizing the variance?

The variance at m ∈ M:

$$F(m) = \mathbb{E}\left( \inf_{g \in G} \left( \|m - g \cdot Y\|^2 + \mathrm{Regularization}(g) \right) \right)$$

The empirical variance at m ∈ M for an n-sample Y1, . . . , Yn:

$$F_n(m) = \inf_{g_1, \dots, g_n \in G} \left( \frac{1}{n} \sum_{i=1}^{n} \|m - g_i \cdot Y_i\|^2 \right)$$

Max-max algorithm (also known as Coordinate Descent, GPA, etc.)

Alternating minimization (over these two steps):

• Step 1: gi ← registration of Yi to m, for all i.

• Step 2: $m \leftarrow \frac{1}{n} \sum_{i=1}^{n} g_i \cdot Y_i$
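The two steps above can be sketched for discretized periodic signals acted on by cyclic shifts. This is only a minimal illustration: the exhaustive search over shifts in `register`, the stopping rule, and all function names are assumptions of the example, and no regularization term is included.

```python
def shift(signal, k):
    """Cyclic shift by k positions (action of a translation on a discretized periodic signal)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def register(y, m):
    """Step 1: return the shifted copy of y closest to m, by exhaustive search over all shifts."""
    best, best_cost = None, float("inf")
    for k in range(len(y)):
        candidate = shift(y, k)
        cost = sum((a - b) ** 2 for a, b in zip(m, candidate))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def max_max(samples, m_init, n_iter=20):
    """Alternate registration (Step 1) and averaging (Step 2) until the estimate stops moving."""
    m = list(m_init)
    for _ in range(n_iter):
        registered = [register(y, m) for y in samples]                    # Step 1
        new_m = [sum(col) / len(samples) for col in zip(*registered)]    # Step 2
        if new_m == m:
            break
        m = new_m
    return m
```

Each step can only decrease the empirical variance, so the iteration converges to a local minimum that depends on the starting point `m_init`.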


Example of a failure of max-max algorithm

On the previous example of translated functions: sample of size 10^5 of discretized functions with 64 points, σ = 10 [Allassonnière 2007]. Starting point: the template itself.

[Figure: the template and the current point of the max-max algorithm at successive iterations, starting with the 1st iteration, plotted on [0, 1].]

Convergence to a local minimum without approximation.

Template estimation with different sample sizes

Starting point: random point

[Figure: the template and the max-max output, plotted on [0, 1]; sample size: 2e+05.]

Inconsistency of the estimator?

Previous works and contributions

Previous works on consistency:

• [Kent & Mardia 1995], [Le 1998] and others restricted to simple transformations such as rotation, translation, sometimes scaling:

• Consistency with scaling (modification of the algorithm: Y ← Y/‖Y‖).

• Inconsistency without scaling.

• [Huckemann 2012] Template and estimated template lie on different strata for general action in finite dimensional manifold.

• [Miolane 2017] Consistency Bias = σ2 C2 + o(σ) as σ → 0 in finite

dimensional manifold for Gaussian noise.

Goal of this PhD work: proving and quantifying this inconsistency in infinite dimensional spaces.

Different hypotheses for the action

Isometric Action:

‖g · x‖ = ‖x‖

Invariant Distance: dM(g · x, g · y) = dM(x, y)

General Action

General Action + Regularization Term

What we Want for Application

Part I Part II

The most restrictive hypothesis = the smallest rectangle

Table of contents

Introduction

Part I: Inconsistency for Isometric Action

Part II: Inconsistency for Non isometric Action

Conclusion


Table of Contents

Introduction

Part I: Inconsistency for Isometric Action

a) Interpretation of the Max-Max Algorithm with the Fréchet Mean in Quotient Spaces

b) Proving the Inconsistency for Isometric Action

c) Quantification of Consistency Bias for Isometric Action

Part II: Inconsistency for Non isometric Action

Conclusion

Definitions

Definition of Quotient Space

Orbit of m ∈ M = set of all the points reachable from m:

[m] = {g · m, g ∈ G}.

Quotient space = set of all orbits: Q = M/G = {[m],m ∈ M}.

Definition of Invariant Distance

dM(m,m′) = dM(g · m, g · m′).

Particular case of Invariant Distance: Isometric Action in Hilbert Space

Isometric Action: M a Hilbert space, m ↦ g · m linear, ‖g · m‖ = ‖m‖.

Proof: ‖g · m − g · m′‖ = ‖g · (m − m′)‖ = ‖m − m′‖.

Classical Proposition: Quotient space = Metric Space

dM invariant ⟹ quotient distance:

$$d_Q([m], [n]) = \inf_{g \in G} d_M(m, g \cdot n).$$

In fact, dQ = pseudo-distance.
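For a finite group the infimum is a minimum and can be evaluated directly. The sketch below does so for cyclic shifts acting on discretized periodic signals with the Euclidean norm; the choice of group and the function names are assumptions of this example.

```python
def shift(signal, k):
    """Cyclic shift by k positions (an isometric action for the Euclidean norm)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def quotient_distance(m, n):
    """d_Q([m], [n]) = min over cyclic shifts g of ||m - g . n||."""
    return min(
        sum((a - b) ** 2 for a, b in zip(m, shift(n, k))) ** 0.5
        for k in range(len(n))
    )
```

Two signals in the same orbit are at quotient distance zero, for instance `quotient_distance([0.0, 1.0, 2.0], [2.0, 0.0, 1.0])`.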


Fréchet Mean in Metric Spaces

Definition of Fréchet mean in metric spaces

Fréchet Mean of Z a random variable in a metric space (X , dX ):

$$\mathrm{FM}(Z) = \operatorname*{argmin}_{m \in X} \mathbb{E}\left(d_X^2(m, Z)\right)$$

Empirical Fréchet Mean of an n-sample Z1, . . . , Zn:

$$\mathrm{EFM}(Z_1, \dots, Z_n) = \operatorname*{argmin}_{m \in X} \frac{1}{n} \sum_{i=1}^{n} d_X^2(m, Z_i)$$

Example of Hilbert spaces:

For a Hilbert (M, ‖ ‖): FM(Z) = E(Z).
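The Hilbert case follows from a bias-variance decomposition. A short derivation, assuming E(‖Z‖²) < +∞ so that every term is finite:

```latex
\begin{align*}
\mathbb{E}\left(\|m - Z\|^2\right)
  &= \|m - \mathbb{E}(Z)\|^2
   + 2\,\big\langle m - \mathbb{E}(Z),\ \mathbb{E}\big(\mathbb{E}(Z) - Z\big) \big\rangle
   + \mathbb{E}\left(\|\mathbb{E}(Z) - Z\|^2\right) \\
  &= \|m - \mathbb{E}(Z)\|^2 + \mathbb{E}\left(\|Z - \mathbb{E}(Z)\|^2\right).
\end{align*}
```

The cross term vanishes because E(E(Z) − Z) = 0, and the last term does not depend on m, so the unique minimizer is m = E(Z).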


Consistency of Estimation

$$F_n(m) = \frac{1}{n} \sum_{i=1}^{n} \inf_{g_i \in G} \|m - g_i \cdot Y_i\|^2 = \frac{1}{n} \sum_{i=1}^{n} d_Q^2([m], [Y_i])$$

Minimizing Empirical Variance = Empirical Fréchet Mean (EFM) in Q

Law of large numbers for the sets of (empirical) Fréchet means

Y, (Yn)n i.i.d. variables. Thanks to [Ziezold 1977] (if Q is separable):

$$\lim_{n \to +\infty} \mathrm{EFM}([Y_1], \dots, [Y_n]) \subset \mathrm{FM}([Y]) \quad \text{a.s.}$$

[t0] not a Fréchet mean of [Y] ⟹ Inconsistency.

Definition of consistency bias

Consistency bias (CB): distance between [t0] and FM([Y]).


Simple example: the action of rotation

Considering SO(n) acting on Rn by rotation.

[Figure: two orbits (circles), the quotient space Q ≃ ℝ₊, and the distance dQ([m], [Y]) between orbits.]

F(m) = E((‖Y‖ − ‖m‖)²), Fréchet mean: ‖m⋆‖ = E(‖Y‖).

Y = Φ · (t0 + σε) ⟹ ‖m⋆‖ = E(‖t0 + σε‖) > ‖t0‖ (in general) ⟹ inconsistency, + consistency bias computed [Miolane 2017].

Example too simple: infima are removed, not always possible.
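The inequality E(‖t0 + σε‖) > ‖t0‖ can be checked by simulation. The sketch below is a Monte Carlo estimate under assumptions that are ours: ε is Gaussian with i.i.d. coordinates of variance 1/d (so that E(‖ε‖²) = 1), and the function name and sample size are illustrative.

```python
import math
import random


def mean_norm(t0, sigma, n_samples, rng):
    """Monte Carlo estimate of E(||t0 + sigma * eps||) for standardized Gaussian noise eps."""
    d = len(t0)
    total = 0.0
    for _ in range(n_samples):
        # Each coordinate of eps has variance 1/d, so that E(||eps||^2) = 1.
        x = [t + sigma * rng.gauss(0.0, 1.0) / math.sqrt(d) for t in t0]
        total += math.sqrt(sum(c * c for c in x))
    return total / n_samples


rng = random.Random(0)
estimate = mean_norm([1.0, 0.0], 1.0, 20000, rng)
print(estimate, "compared with ||t0|| = 1")
```

Since the norm is convex, Jensen's inequality gives E(‖t0 + σε‖) ≥ ‖E(t0 + σε)‖ = ‖t0‖, which is the direction of the bias seen in the simulation.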



Why is isometric action simple?

Our first result on consistency holds only for isometric action.

An isometric action simplifies the squared quotient distance:

$$d_Q([a], [b])^2 = \inf_{g \in G} \|a - g \cdot b\|^2 = \|a\|^2 + \inf_{g \in G} \left( -2 \langle a, g \cdot b \rangle + \|g \cdot b\|^2 \right) = \|a\|^2 + \|b\|^2 + \inf_{g \in G} \left( -2 \langle a, g \cdot b \rangle \right)$$

Useful equality for the proof and the quantification of the consistency bias.


Inconsistency for isometric action

[Figure: cone of the template (in gray), and support of t0 + σε (dotted disk).]

Theorem: Inconsistency for isometric action in Hilbert space

Observable variable: Y = Φ · (t0 + σε). If:

P(t0 + σε ∉ Cone(t0)) > 0

Then [t0] is not a Fréchet mean of [Y] ⟹ Inconsistency.

Sketch of the proof (finite group = more visual proof)

For G finite, R(X) registration of X = t0 + σε to t0.

Gradient of the variance: ∇F(t0) = 2 (E(X)− E(R(X)))

[Figure: Cone(t0) with the orbit points t0, g t0 and g′ t0; E(X) = t0.]

Points in green = orbit of X.

R(X): point in the orbit of X in Cone(t0).

X ∈ Cone(t0), then R(X) = X.

Graphic representation of Z = E(R(X)). The part in grid-line = folded points.

∇F(t0) ≠ 0 ⟹ Inconsistency
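The smallest finite-group instance makes the folding concrete: G = {+1, −1} acting on ℝ by multiplication, with t0 > 0, so that Cone(t0) = [0, +∞) and R(X) = |X|. The sketch below estimates ∇F(t0) = 2(E(X) − E(R(X))) by Monte Carlo; the choice of group, the Gaussian noise, and the function name are assumptions of this example.

```python
import random


def gradient_at_template(t0, sigma, n_samples, rng):
    """Monte Carlo estimate of 2 * (E(X) - E(R(X))) for G = {+1, -1} acting on R."""
    sum_x = 0.0
    sum_rx = 0.0
    for _ in range(n_samples):
        x = t0 + sigma * rng.gauss(0.0, 1.0)
        # Registration to t0: keep the point of the orbit {x, -x} closest to t0.
        rx = x if abs(x - t0) <= abs(-x - t0) else -x
        sum_x += x
        sum_rx += rx
    return 2.0 * (sum_x - sum_rx) / n_samples


rng = random.Random(0)
print(gradient_at_template(1.0, 1.0, 20000, rng))  # negative: F decreases when moving away from 0
```

Every sample falling outside the cone is folded back into it, which pushes E(R(X)) beyond t0 and makes the gradient non-zero.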

Sketch of the proof (finite or infinite group)

When the group is not finite, differentiate the variance.

Two possible methods to show inconsistency:

• Find argmin F, and see if t0 ∈ argmin F: difficult issue.

• Find a point x such that F(x) < F(t0):

We found a point λt0 with F(λt0) < F(t0) ⟹ Inconsistent.

Be careful, a priori [λt0] is not a Fréchet mean of [Y].

How often is this condition with the cone fulfilled?

A group G acts isometrically on a Hilbert space. [t0] a manifold, Tt0[t0] the affine tangent space of [t0] at t0. Tt0[t0]⊥ the normal space of [t0] at t0.

Proposition: being inconsistent for smooth orbits.

P(ε ∉ Tt0[t0]⊥) > 0 ⟹ inconsistency

[Figure: the orbit [t0], its tangent space Tt0[t0] and normal space Tt0[t0]⊥ at t0, a point g · t0 of the orbit, and a point y.]

y ∉ Tt0[t0]⊥, therefore y is closer to g · t0 for some g ∈ G than to t0 itself. In conclusion, y in the support of X = t0 + σε ⟹ inconsistency.


Consistency bias when the noise level tends to infinity

Definition of consistency bias

Consistency bias (CB): distance between the template t0 and argmin F.

Definition of fixed points

A fixed point m ∈ M: for all g ∈ G, g · m = m.

Proposition: consistency bias is asymptotically linear when σ → +∞

G acts isometrically on a Hilbert space. We take Y = Φ · t0 + σε.

If the support of the noise ε is not included in the set of fixed points then:

$$\mathrm{CB} = \sigma K + o(\sigma) \text{ as } \sigma \to +\infty, \quad \text{where } K = \sup_{\|v\|=1} \mathbb{E}\left( \sup_{g \in G} \langle v, g \cdot \varepsilon \rangle \right) > 0.$$

Moreover, $\lim_{t_0 \to 0} \mathrm{CB} = \sigma K$.

Sketch of the proof

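$$F(m) = \mathbb{E}\left( \inf_{g \in G} \|m - g \cdot Y\|^2 \right) \quad \text{where } Y = \Phi \cdot t_0 + \sigma \varepsilon.$$

• Minimization of F(λv) w.r.t. λ ≥ 0, ‖v‖ = 1. Then, for m⋆ ∈ argmin F:

$$\|m_\star\| = \sup_{\|v\|=1} \mathbb{E}\left( \sup_{g \in G} \langle v, g \cdot Y \rangle \right) = \sup_{\|v\|=1} \mathbb{E}\left( \sup_{g \in G} \left( \langle v, g \Phi t_0 \rangle + \langle v, \sigma g \varepsilon \rangle \right) \right)$$

Difficult (impossible?) to compute.

• Cauchy-Schwarz inequality:

$$-\|t_0\| + \sigma K \le \|m_\star\| \le \|t_0\| + \sigma K$$

• By the triangle inequality:

$$-2\|t_0\| + \sigma K \le \|m_\star - t_0\| \le \sigma K + 2\|t_0\|$$

K > 0 (because the support of ε is not included in the set of fixed points).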
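The expression for ‖m⋆‖ used in this sketch can be recovered in a few lines. The derivation below assumes an isometric action (so that ‖g · Y‖ = ‖Y‖), E(‖Y‖²) < +∞, and that the suprema are positive and attained; the notation λ(v) is introduced here for illustration.

```latex
\begin{align*}
F(\lambda v)
  &= \mathbb{E}\Big( \inf_{g \in G} \big( \lambda^2 - 2\lambda \langle v, g \cdot Y \rangle + \|g \cdot Y\|^2 \big) \Big)
   = \lambda^2 - 2\lambda\, \mathbb{E}\Big( \sup_{g \in G} \langle v, g \cdot Y \rangle \Big) + \mathbb{E}\left(\|Y\|^2\right), \\
\lambda(v)
  &= \operatorname*{argmin}_{\lambda \ge 0} F(\lambda v)
   = \mathbb{E}\Big( \sup_{g \in G} \langle v, g \cdot Y \rangle \Big),
  \qquad F(\lambda(v)\, v) = \mathbb{E}\left(\|Y\|^2\right) - \lambda(v)^2 .
\end{align*}
```

Minimizing over the direction v therefore amounts to maximizing λ(v), which gives ‖m⋆‖ = sup over ‖v‖ = 1 of λ(v). The two-sided bound then comes from |⟨v, gΦt0⟩| ≤ ‖t0‖ for every g, again because the action is isometric.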



Table of Contents

Introduction

Part I: Inconsistency for Isometric Action

Part II: Inconsistency for Non isometric Action

a) Inconsistency for Invariant Distance

b) Inconsistency for Non Invariant Distance

Conclusion

Variation of the Isotropy Group Due to the Noise

Definition: Isotropy Group (or Stabilizer)

Iso(m) = {g ∈ G, s.t. g · m = m}

Example: Reparametrization of functions

ϕ : [0, 1] → [0, 1] homeomorphism, f : [0, 1] → ℝ, (ϕ, f) ↦ ϕ · f = f ◦ ϕ

t0 constant map on D = [0.2, 0.8]

[Figure: t0 plotted on [0, 1].]

Iso(t0) = {ϕ | ϕ|Dc = Id} ⊋ {Id}

t0 + noise

[Figure: t0 + noise plotted on [0, 1].]

Iso(t0 + noise) = {Id}
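The same collapse of the isotropy group can be observed on a finite stand-in for the reparametrization group. The sketch below uses cyclic shifts acting on a discretized signal, which is an assumption of this example (the slide deals with homeomorphisms of [0, 1]); the function names are illustrative.

```python
import random


def shift(signal, k):
    """Cyclic shift by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def isotropy(signal):
    """Iso(signal): the shifts k that leave the signal unchanged."""
    return [k for k in range(len(signal)) if shift(signal, k) == list(signal)]


rng = random.Random(0)
t0 = [0.5] * 8                                         # constant signal: every shift fixes it
noisy = [t + 0.01 * rng.gauss(0.0, 1.0) for t in t0]  # generic perturbation of t0

print(isotropy(t0))     # all 8 shifts
print(isotropy(noisy))  # only the identity shift, 0
```

An arbitrarily small generic perturbation is enough to reduce the isotropy group to the identity.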



Stability Theorem Implies Inconsistency

Stability Theorem in Hilbert spaces

G a compact group acting continuously on M a Hilbert space, dM is invariant. Observable variable Y in M. If

P(Iso(Y) = {eG}) > 0 (eG: neutral element in G),

$$m_\star \in \operatorname*{argmin}_{m \in M} F(m) = \operatorname*{argmin}_{m \in M} \mathbb{E}\left( \inf_{g \in G} d_M(m, g \cdot Y)^2 \right).$$

If R(Y) is a measurable variable registering Y to m⋆, then:

Iso(m⋆) = {eG}.

Implies Inconsistency if Iso(t0) ≠ {eG}.

The Stability Theorem is also true in complete finite dimensional Riemannian manifolds, together with a proof of existence of the measurable variable R(Y) [Huckemann 2012].


Non Invariant Distance

Non invariant distance used in applications:

Reparametrization by a diffeomorphism ϕ

fi : ℝ^d → ℝ (images d = 2 or signals d = 1): ‖f1 ◦ ϕ − f2 ◦ ϕ‖² ≠ ‖f1 − f2‖².

G acting on a Hilbert space:

A priori, possibility to define a distance in the quotient space.

For Y = Φ · t0 + σε, minimizing $F(m) = \mathbb{E}\left( \inf_{g \in G} \|Y - g \cdot m\|^2 \right)$: still possible.
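The failure of invariance is easy to observe numerically for signals (d = 1). In the sketch below, the diffeomorphism ϕ(x) = (x + x²)/2 of [0, 1], the test functions f1(x) = x and f2 = 0, and the midpoint quadrature are all assumptions chosen for illustration.

```python
def l2_sq(f, n=1000):
    """Midpoint-rule approximation of the squared L2 norm of f on [0, 1]."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) ** 2 for i in range(n)) * h


def phi(x):
    """A hypothetical diffeomorphism of [0, 1]: phi(0) = 0, phi(1) = 1, phi' > 0."""
    return 0.5 * (x + x * x)


def f1(x):
    return x


def f2(x):
    return 0.0


before = l2_sq(lambda x: f1(x) - f2(x))            # ||f1 - f2||^2, exact value 1/3
after = l2_sq(lambda x: f1(phi(x)) - f2(phi(x)))   # ||f1 o phi - f2 o phi||^2, exact value 31/120
print(before, after)
```

The two values differ, so the L2 distance is not invariant under this reparametrization.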



How to deal with non isometric action?

Isometric

[Figure: orbit of the template around 0; in gray, the noise of level σ.]

We can find a point λt0 such that F(λt0) < F(t0).

General Action with Bounded Orbit

[Figure: bounded orbit of the template around 0; in gray, the noise of level σ.]

So why not in this case?

Inconsistency for non invariant distance

Inconsistency: a subgroup of G acts isometrically

A group G acting on a Hilbert space, [t0] is bounded. We note:

$$\theta(G) = \frac{1}{\|t_0\|} \mathbb{E}\left( \sup_{g \in G} \langle g \cdot t_0, \varepsilon \rangle \right)$$

If H is a subgroup of G, H acts isometrically and θ(H) > 0, then inconsistency for σ > σc = f([t0], θ(G), θ(H), t0) for a certain positive function f.

Example

G = group of diffeomorphisms, H = rotations.

Inconsistency for non invariant distance

Inconsistency for G acting linearly + Regularization

A group G acting linearly on a Hilbert space, [t0] is bounded. We note:

$$\theta(G) = \frac{1}{\|t_0\|} \mathbb{E}\left( \sup_{g \in G} \langle g \cdot t_0, \varepsilon \rangle \right).$$

The template estimation is performed by minimizing

$$F(m) = \mathbb{E}\left( \inf_{g \in G} \left( \|g \cdot m - Y\|^2 + \mathrm{Regularization}(g) \right) \right),$$

where Regularization is bounded. If θ(G) > 0 then Inconsistency for σ > σc = f([t0], θ(G), t0) for a certain positive function f.

Action of reparametrization of functions

ϕ a diffeo, (ϕ, f) ↦ f ◦ ϕ linear action.

Proof: (af1 + f2) ◦ ϕ = af1 ◦ ϕ + f2 ◦ ϕ.


Summary of contributions

• It is proved that the template estimation with the Fréchet mean in quotient space is not consistent for isometric action.

• It is possible to quantify the consistency bias for σ → +∞.

• We proved a stability theorem which implies the inconsistency in Hilbert space for invariant distance.

• The inconsistency can also be proved for non isometric action, but only for σ high enough.

This work has been presented in a workshop (MFCA 2015), published in a conference (IPMI 2017) and in two journal papers (SIIMS 2017 and Entropy 2017).

What are the possible extensions?

• Extending the existence of the measurable variable which registers data to a certain point.

• Proving the inconsistency for non invariant distance for all σ.

• Providing an asymptotic behaviour of the consistency bias when σ → 0.

Thank you for your attention! Any questions?

Example 2: action of diffeomorphisms on functions

[Figure: the template t0, plotted on [0, 1].]

[Figure: the deformed template t0 ◦ ϕ, plotted on [0, 1].]

SRVF: the norm of ḟ/√|ḟ| is invariant under the action of ϕ; commonly used.
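A quick numerical check of this invariance, assuming the usual square root velocity function q = ḟ/√|ḟ| = sign(ḟ)√|ḟ| and a finite-difference derivative. The test signal sin(2πx), the diffeomorphism ϕ(x) = (x + x²)/2, and the function names are assumptions of this sketch.

```python
import math


def srvf_norm_sq(f, n=2000):
    """Squared L2 norm on [0, 1] of the discretized SRVF q = f' / sqrt(|f'|)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        slope = (f((i + 1) * h) - f(i * h)) / h           # finite-difference derivative
        q = math.copysign(math.sqrt(abs(slope)), slope)   # q = sign(f') * sqrt(|f'|)
        total += q * q * h
    return total


def f(x):
    return math.sin(2.0 * math.pi * x)


def phi(x):
    """A hypothetical diffeomorphism of [0, 1]."""
    return 0.5 * (x + x * x)


print(srvf_norm_sq(f), srvf_norm_sq(lambda x: f(phi(x))))  # both close to 4
```

Both values approximate the total variation of f, since ‖q‖² is the integral of |ḟ|, which a change of variable leaves unchanged.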


[Figure: the template and the deformed template with added noise, t0 ◦ ϕ + ε, plotted on [0, 1].]

Example 3: Consistency and smoothness

Example of translated functions: sample size 10^6 of discretized functions with 64 points, σ = 10.

[Figure: the template and the max-max output, plotted on [0, 1].]

Example 4: Local minima

[Figure: zoomed plot over [0.4, 0.6].]
