Consistency of statistics in infinite dimensional quotient spaces
PhD defence of Loïc Devilliers, November 20, 2017
Prepared at Inria, Université Côte d’Azur, CMAP École Polytechnique & ENS Paris-Saclay
Jury:
Stéphanie Allassonnière, Professor, Université Paris Descartes, Co-advisor
Marc Arnaudon, Professor, Université de Bordeaux, Reviewer
Charles Bouveyron, Professor, Université Côte d’Azur, President
Stephan Huckemann, Professor, University of Göttingen, Reviewer
Xavier Pennec, Senior Researcher, Université Côte d’Azur, Inria, Advisor
Stefan Sommer, Associate Professor, University of Copenhagen, Reviewer
Alain Trouvé, Professor, ENS Paris-Saclay, Examiner
Computational Anatomy: Heart Template Estimation
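t0: template, one heart, modeling the others through a diffeomorphism φi.
Diffeomorphisms = change the shape but not topology. [Mansi 2009]
\[
(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \| t \circ \varphi_i - \mathrm{Patient}_i \|^2 + \mathrm{Regularization}(\varphi_i) \right)
\]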
Computational Anatomy: Brain Template Estimation
[Guimond 1999, Joshi 2004 etc.], Image from [Hamou 2016]
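\[
(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \| t \circ \varphi_i - Y_i \|^2 + \mathrm{Regularization}(\varphi_i) \right)
\]
Template estimation is a tool to statistically analyze diseases.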
Template Estimation with Surfaces
[courtesy of Pierre Roussillon]
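\[
(t_0, \varphi_1, \dots, \varphi_n) = \operatorname*{argmin}_{t, \varphi_1, \dots, \varphi_n} \frac{1}{n} \sum_{i=1}^{n} \left( \| t \circ \varphi_i - S_i \|^2 + \mathrm{Regularization}(\varphi_i) \right)
\]
Goal of this work: study the statistical properties of template estimation.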
Example: Periodic (discretized) signals
Simple example to introduce the Generative Model: In M = Per₁(ℝ, ℝ).
[Figure: three plots on [0, 1]: the template t0; the template transformed by a translation, t0 ◦ ϕ; the template and the deformed template with added noise, t0 ◦ ϕ + ε.]
Note that for the L2 norm, we have ‖t0 ◦ ϕ‖ = ‖t0‖.
The noise is, for instance, Gaussian noise on each point of the discretization grid.
Goal: study the statistical properties of the estimator of t0.
Generative model
A group G acts on an ambient space M: for g ∈ G, m ∈ M, g · m = gm ∈ M.
Observable variable:
Y = Φ · t0 + σε (forward model)
or
Y = Φ · (t0 + σε) (backward model)
• Φ a random variable in G.
• t0 the template in M.
• σ > 0 the noise level.
• ε a standardized noise in M: E(ε) = 0, E(‖ε‖²) = 1.
• Φ and ε are independent.
Inverse problem
Given the observed variable Y, how can we estimate the template t0?
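As an illustration of the forward model on the earlier example of discretized periodic signals, here is a minimal sketch. It assumes that G is the group of cyclic shifts of the grid, that Φ is uniform on G, and that ε is Gaussian and scaled so that E(‖ε‖²) = 1. The names `shift`, `norm` and `sample_forward`, as well as the bump-shaped template, are hypothetical and not taken from the talk.

```python
import math
import random

def shift(signal, k):
    """Cyclic shift by k grid points: the action of g = k on a discretized periodic signal."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def norm(signal):
    """Euclidean norm of a discretized signal."""
    return math.sqrt(sum(x * x for x in signal))

def sample_forward(template, sigma, rng):
    """Draw Y = Phi . t0 + sigma * eps, with Phi uniform over the cyclic shifts."""
    n = len(template)
    phi = rng.randrange(n)
    # Each coordinate has variance 1/n, so that E(||eps||^2) = 1 (standardized noise).
    eps = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
    return [t + sigma * e for t, e in zip(shift(template, phi), eps)]

rng = random.Random(0)
t0 = [math.exp(-50 * (i / 64 - 0.5) ** 2) for i in range(64)]  # hypothetical template
y = sample_forward(t0, sigma=0.5, rng=rng)
```

Because shifting only permutes the coordinates, `norm(shift(t0, k))` equals `norm(t0)` for every `k`, which is the isometry property used later in the talk.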
Minimization (max-max algorithm)
Estimation by minimizing the variance?
The variance at m ∈ M:
\[
F(m) = \mathbb{E}\left( \inf_{g \in G} \left( \| m - g \cdot Y \|^2 + \mathrm{Regularization}(g) \right) \right)
\]
The empirical variance at m ∈ M for an n-sample Y1, . . . , Yn:
\[
F_n(m) = \inf_{g_1, \dots, g_n \in G} \left( \frac{1}{n} \sum_{i=1}^{n} \| m - g_i \cdot Y_i \|^2 \right)
\]
Max-max algorithm (also known as Coordinate Descent, GPA, etc.)
Alternating minimization (over these two steps):
• Step 1: gi ← registration of Yi to m, for all i.
• Step 2: m ← (1/n) ∑_{i=1}^{n} gi · Yi
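The two steps can be sketched as follows. This is only an illustration under stated assumptions: the group is the finite group of cyclic shifts of a grid, registration is done by exhaustive search, and there is no regularization term. The function names and the toy data are hypothetical.

```python
import math
import random

def shift(signal, k):
    """Cyclic shift by k grid points: the (isometric) action of Z/nZ."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def register(y, m):
    """Step 1: element of the orbit of y closest to m (exhaustive search)."""
    return min((shift(y, k) for k in range(len(y))), key=lambda z: sq_dist(m, z))

def empirical_variance(m, sample):
    """F_n(m): mean squared distance from m to the orbits of the sample."""
    return sum(sq_dist(m, register(y, m)) for y in sample) / len(sample)

def max_max(sample, start, n_iter=20):
    """Alternate registration (Step 1) and averaging (Step 2)."""
    m = list(start)
    for _ in range(n_iter):
        registered = [register(y, m) for y in sample]
        new_m = [sum(col) / len(sample) for col in zip(*registered)]
        if new_m == m:  # fixed point reached
            break
        m = new_m
    return m

# Hypothetical toy data: shifted, noisy copies of a bump on a 16-point grid.
rng = random.Random(0)
n = 16
t0 = [math.exp(-20 * (i / n - 0.5) ** 2) for i in range(n)]
sample = [
    [t + 0.3 * rng.gauss(0.0, 1.0 / math.sqrt(n)) for t in shift(t0, rng.randrange(n))]
    for _ in range(30)
]
start = sample[0]
estimate = max_max(sample, start)
```

Each iteration cannot increase the empirical variance, since Step 1 minimizes over the gi for fixed m and Step 2 minimizes over m for fixed gi. This guarantees descent, not convergence to a global minimum.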
Example of a failure of max-max algorithm
On the previous example of translated functions: sample of size 10^5 of discretized functions with 64 points, σ = 10 [Allassonnière 2007]. Starting point: the template itself.
[Figure: the template and the current point of the max-max algorithm at iterations 1, 2, 3, 4, 5, 10, 50 and 79, plotted on [0, 1].]
Convergence to a local minimum without approximation.
Template estimation with different sample sizes
Starting point: random point
[Figure: the template and the max-max output, plotted on [0, 1], for sample sizes 2e+05, 4e+05, 6e+05, 8e+05 and 1e+06.]
Inconsistency of the estimator?
Previous works and contributions
Previous works on consistency:
• [Kent & Mardia 1995], [Le 1998] and others restricted to simple transformations such as rotation, translation, sometimes scaling:
  • Consistency with scaling (modification of the algorithm: Y ← Y/‖Y‖).
  • Inconsistency without scaling.
• [Huckemann 2012] Template and estimated template lie on different strata for general action in finite dimensional manifold.
• [Miolane 2017] Consistency Bias = σ2 C2 + o(σ) as σ → 0 in finite
dimensional manifold for Gaussian noise.
Goal of this PhD work: proving and quantifying this inconsistency, in infinite dimensional spaces.
Different hypotheses for the action
[Diagram: nested rectangles, one per hypothesis; the most restrictive hypothesis = the smallest rectangle. Labels on the diagram: Part I, Part II.]
• Isometric Action: ‖g · x‖ = ‖x‖
• Invariant Distance: dM(g · x, g · y) = dM(x, y)
• General Action
• General Action + Regularization Term
• What we Want for Application
Table of contents
Introduction
Part I: Inconsistency for Isometric Action
Part II: Inconsistency for Non isometric Action
Conclusion
Table of Contents
Introduction
Part I: Inconsistency for Isometric Action
a) Interpretation of the Max-Max Algorithm with the Fréchet Mean in Quotient Spaces
b) Proving the Inconsistency for Isometric Action
c) Quantification of Consistency Bias for Isometric Action
Part II: Inconsistency for Non isometric Action
Conclusion
Definitions
Definition of Quotient Space
Orbit of m ∈ M = set of all the points reachable from m:
[m] = {g · m, g ∈ G}.
Quotient space = set of all orbits: Q = M/G = {[m],m ∈ M}.
Definition of Invariant Distance
dM(m,m′) = dM(g · m, g · m′).
Particular case of Invariant Distance: Isometric Action in Hilbert Space
Isometric Action: M a Hilbert space, m ↦ g · m linear, ‖g · m‖ = ‖m‖.
Proof: ‖g · m − g · m′‖ = ‖g · (m − m′)‖ = ‖m − m′‖.
Classical Proposition: Quotient space = Metric Space
dM invariant ⟹ quotient distance: dQ([m], [n]) = inf_{g∈G} dM(m, g · n).
In fact, dQ = pseudo-distance.
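To make the quotient distance concrete, here is a minimal sketch for the finite group of cyclic shifts acting on discretized signals. This group is an assumption chosen for illustration, so the infimum becomes a minimum over finitely many shifts. The names `shift`, `dist` and `quotient_dist` are hypothetical.

```python
import math

def shift(signal, k):
    """Cyclic shift by k grid points."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def dist(a, b):
    """Euclidean distance d_M, invariant under simultaneous shifts of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quotient_dist(a, b):
    """d_Q([a], [b]) = min over shifts g of d_M(a, g . b)."""
    return min(dist(a, shift(b, k)) for k in range(len(b)))

a = [0.0, 1.0, 3.0, 1.0]
b = shift(a, 2)             # another point of the orbit [a]
c = [0.0, 2.0, 3.0, 1.0]    # a point of a different orbit

d_ab = quotient_dist(a, b)  # distance between an orbit and itself
d_ac = quotient_dist(a, c)
d_ca = quotient_dist(c, a)
```

Here `d_ab` vanishes because `a` and `b` lie in the same orbit, and `d_ac` agrees with `d_ca` because the distance on M is invariant.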
Fréchet Mean in Metric Spaces
Definition of Fréchet mean in metric spaces
Fréchet Mean of Z, a random variable in a metric space (X, dX):
\[
\mathrm{FM}(Z) = \operatorname*{argmin}_{m \in X} \mathbb{E}\left( d_X^2(m, Z) \right)
\]
Empirical Fréchet Mean of an n-sample Z1, . . . , Zn:
\[
\mathrm{EFM}(Z_1, \dots, Z_n) = \operatorname*{argmin}_{m \in X} \frac{1}{n} \sum_{i=1}^{n} d_X^2(m, Z_i)
\]
Example of Hilbert spaces:
For a Hilbert space (M, ‖ ‖): FM(Z) = E(Z).
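The Hilbert space example follows from the usual bias-variance decomposition. A short derivation, assuming E(‖Z‖²) is finite so that every term is well defined:

```latex
\begin{align*}
\mathbb{E}\big(\|m - Z\|^2\big)
  &= \mathbb{E}\big(\|(m - \mathbb{E}(Z)) + (\mathbb{E}(Z) - Z)\|^2\big) \\
  &= \|m - \mathbb{E}(Z)\|^2
     + 2\,\big\langle m - \mathbb{E}(Z),\ \mathbb{E}\big(\mathbb{E}(Z) - Z\big) \big\rangle
     + \mathbb{E}\big(\|Z - \mathbb{E}(Z)\|^2\big) \\
  &= \|m - \mathbb{E}(Z)\|^2 + \mathbb{E}\big(\|Z - \mathbb{E}(Z)\|^2\big).
\end{align*}
```

The last term does not depend on m, so the unique minimizer is m = E(Z).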
Consistency of Estimation
\[
F_n(m) = \frac{1}{n} \sum_{i=1}^{n} \inf_{g_i \in G} \| m - g_i \cdot Y_i \|^2 = \frac{1}{n} \sum_{i=1}^{n} d_Q^2([m], [Y_i])
\]
Minimizing Empirical Variance = Empirical Fréchet Mean (EFM) in Q
Law of large numbers for the sets of (empirical) Fréchet means
Y, (Yn)n i.i.d variables. Thanks to [Ziezold 1977] (if Q is separable):
\[
\lim_{n \to +\infty} \mathrm{EFM}([Y_1], \dots, [Y_n]) \subset \mathrm{FM}([Y]) \quad \text{a.s.}
\]
[t0] not a Fréchet mean of [Y] ⟹ inconsistency.
Definition of consistency bias
Consistency bias (CB): distance between [t0] and FM([Y]).
Simple example: the action of rotation
Considering SO(n) acting on ℝⁿ by rotation.
[Figure: two orbits (circles centered at 0) through m and Y, the quotient space Q ≃ ℝ₊, and the distance dQ([m], [Y]) between the orbits.]
F(m) = E((‖Y‖ − ‖m‖)²), Fréchet mean: ‖m⋆‖ = E(‖Y‖).
Y = Φ · (t0 + σε) ⟹ ‖m⋆‖ = E(‖t0 + σε‖) > ‖t0‖ (in general) ⟹ inconsistency; the consistency bias is computed in [Miolane 2017].
This example is too simple: here the infimum over the group can be removed, which is not always possible.
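The inequality E(‖t0 + σε‖) > ‖t0‖ can be checked numerically. The following Monte Carlo sketch assumes isotropic Gaussian noise in ℝ² scaled so that E(‖ε‖²) = 1. The template, noise level and sample size are arbitrary illustrative choices.

```python
import math
import random

def mean_norm(t0, sigma, n_samples, rng):
    """Monte Carlo estimate of E(||t0 + sigma * eps||) in R^d."""
    d = len(t0)
    scale = 1.0 / math.sqrt(d)  # per-coordinate std, so that E(||eps||^2) = 1
    total = 0.0
    for _ in range(n_samples):
        x = [t + sigma * rng.gauss(0.0, scale) for t in t0]
        total += math.sqrt(sum(v * v for v in x))
    return total / n_samples

rng = random.Random(0)
t0 = [1.0, 0.0]
estimate = mean_norm(t0, sigma=1.0, n_samples=20000, rng=rng)
# In the quotient R+, the consistency bias is the gap between the two norms.
bias = estimate - math.sqrt(sum(v * v for v in t0))
```

Since the quotient is ℝ₊, `bias` is an estimate of the consistency bias for this noise level. It is positive because the norm is convex and E(t0 + σε) = t0 (Jensen's inequality).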
Why is isometric action simple?
Our first result on consistency holds only for isometric action.
Isometric action ⟹ simplification of the squared quotient distance:
\[
d_Q([a], [b])^2 = \inf_{g \in G} \| a - g \cdot b \|^2
= \|a\|^2 + \inf_{g \in G} \left( -2 \langle a, g \cdot b \rangle + \| g \cdot b \|^2 \right)
= \|a\|^2 + \|b\|^2 + \inf_{g \in G} \left( -2 \langle a, g \cdot b \rangle \right)
\]
Useful equality for the proof and the quantification of the consistency bias.
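The equality can be verified numerically on a small case. This sketch again assumes the finite group of cyclic shifts, which acts isometrically, and compares the direct computation of the squared quotient distance with the expression based on the largest inner product. Function names and test vectors are hypothetical.

```python
def shift(signal, k):
    """Cyclic shift by k grid points (an isometric action)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_quotient_dist_direct(a, b):
    """inf over g of ||a - g . b||^2, by exhaustive search."""
    return min(
        sum((x - y) ** 2 for x, y in zip(a, shift(b, k))) for k in range(len(b))
    )

def sq_quotient_dist_via_inner_product(a, b):
    """||a||^2 + ||b||^2 - 2 sup over g of <a, g . b>."""
    best = max(dot(a, shift(b, k)) for k in range(len(b)))
    return dot(a, a) + dot(b, b) - 2.0 * best

a = [0.0, 1.0, 3.0, 1.0, 0.5]
b = [2.0, 0.0, 1.0, 4.0, 1.0]
direct = sq_quotient_dist_direct(a, b)
via_inner_product = sq_quotient_dist_via_inner_product(a, b)
```

With an isometric action, registering b to a therefore amounts to maximizing the inner product ⟨a, g · b⟩ over the group.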
Inconsistency for isometric action
[Figure: the template t0, the points g·t0 and g′·t0 of its orbit, and Cone(t0) (in gray); the support of t0 + σε is the dotted disk.]
Theorem: Inconsistency for isometric action in Hilbert space
Observable variable: Y = Φ · (t0 + σε). If:
P(t0 + σε ∉ Cone(t0)) > 0
then [t0] is not a Fréchet mean of [Y] ⟹ inconsistency.
Sketch of the proof (finite group = more visual proof)
For G finite, R(X) is the registration of X = t0 + σε to t0.
Gradient of the variance: ∇F(t0) = 2 (E(X) − E(R(X))), with E(X) = t0.
[Figure: Cone(t0) with the points t0, g·t0 and g′·t0. Successive panels show:]
• Points in green = orbit of X (X, g1X, g2X, g3X).
• R(X): point in the orbit of X in Cone(t0).
• If X ∈ Cone(t0), then R(X) = X.
• Graphic representation of Z = E(R(X)). The part in grid-line = folded points.
∇F(t0) ≠ 0 ⟹ inconsistency
Sketch of the proof (finite or infinite group)
When the group is not finite, differentiate the variance.
Two possible methods to show inconsistency:
• Find argmin F, and see if t0 ∈ argmin F: difficult issue.
• Find a point x such that F(x) < F(t0):
We found a point λt0 with F(λt0) < F(t0) ⟹ inconsistency.
Be careful: a priori, [λt0] is not a Fréchet mean of [Y].
How often is this condition with the cone fulfilled?
A group G acts isometrically on a Hilbert space. [t0] a manifold, T_{t0}[t0] the affine tangent space of [t0] at t0, T_{t0}[t0]^⊥ the normal space of [t0] at t0.
Proposition: being inconsistent for smooth orbits.
P(ε ∉ T_{t0}[t0]^⊥) > 0 ⟹ inconsistency
[Figure: the orbit [t0], its tangent space T_{t0}[t0] and normal space T_{t0}[t0]^⊥ at t0, a point g · t0 of the orbit, the origin 0 and a point y.]
y ∉ T_{t0}[t0]^⊥, therefore y is closer to g · t0 for some g ∈ G than to t0 itself. In conclusion, y in the support of X = t0 + σε ⟹ inconsistency.
Consistency bias when the noise level tends to infinity
Definition of consistency bias
Consistency bias (CB) : distance between the template t0 and argmin F.
Definition of fixed points
A fixed point m ∈ M : for all g ∈ G, g · m = m.
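Proposition: consistency bias is asymptotically linear when σ → +∞
G acts isometrically on a Hilbert space. We take Y = Φ · t0 + σε.
If the support of the noise ε is not included in the set of fixed points, then:
\[
\mathrm{CB} = \sigma K + o(\sigma) \text{ as } \sigma \to +\infty, \quad \text{where } K = \sup_{\|v\| = 1} \mathbb{E}\left( \sup_{g \in G} \langle v, g \cdot \varepsilon \rangle \right) > 0.
\]
Moreover, \( \lim_{t_0 \to 0} \mathrm{CB} = \sigma K \).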
Sketch of the proof
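\[
F(m) = \mathbb{E}\left( \inf_{g \in G} \| m - g \cdot Y \|^2 \right) \quad \text{where } Y = \Phi \cdot t_0 + \sigma \varepsilon.
\]
• Minimization of F(λv) w.r.t. λ ≥ 0, ‖v‖ = 1. Then, for m⋆ ∈ argmin F:
\[
\| m^\star \| = \sup_{\|v\| = 1} \mathbb{E}\left( \sup_{g \in G} \langle v, g \cdot Y \rangle \right)
= \sup_{\|v\| = 1} \mathbb{E}\left( \sup_{g \in G} \left( \langle v, g \Phi t_0 \rangle + \langle v, \sigma g \varepsilon \rangle \right) \right)
\]
Difficult (impossible?) to compute.
• Cauchy-Schwarz inequality:
−‖t0‖ + σK ≤ ‖m⋆‖ ≤ ‖t0‖ + σK
• By the triangle inequality:
−2‖t0‖ + σK ≤ ‖m⋆ − t0‖ ≤ σK + 2‖t0‖
K > 0 (because the support of ε is not included in the set of fixed points).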
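The first bullet can be made explicit with the identity d_Q([a], [b])² = ‖a‖² + ‖b‖² − 2 sup_g ⟨a, g · b⟩, which is valid for isometric actions. The following is a sketch of that step. It assumes that the expectations are finite and that E(sup_g ⟨v, g · Y⟩) ≥ 0 for the direction v considered. The notation λ⋆(v) is introduced here for convenience.

```latex
\begin{align*}
F(\lambda v)
  &= \mathbb{E}\Big( \inf_{g \in G} \| \lambda v - g \cdot Y \|^2 \Big)
   = \lambda^2 - 2 \lambda\, \mathbb{E}\Big( \sup_{g \in G} \langle v, g \cdot Y \rangle \Big)
     + \mathbb{E}\big( \|Y\|^2 \big), \\
\lambda^\star(v)
  &= \mathbb{E}\Big( \sup_{g \in G} \langle v, g \cdot Y \rangle \Big), \qquad
F\big( \lambda^\star(v)\, v \big) = \mathbb{E}\big( \|Y\|^2 \big) - \lambda^\star(v)^2 .
\end{align*}
```

Minimizing F over unit vectors v therefore amounts to maximizing λ⋆(v), which gives the expression of ‖m⋆‖ as a supremum over ‖v‖ = 1.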
Table of Contents
Introduction
Part I: Inconsistency for Isometric Action
Part II: Inconsistency for Non isometric Action
a) Inconsistency for Invariant Distance
b) Inconsistency for Non Invariant Distance
Conclusion
Variation of the Isotropy Group Due to the Noise
Definition: Isotropy Group (or Stabilizer)
Iso(m) = {g ∈ G, s.t. g · m = m}
Example: Reparametrization of functions
ϕ : [0, 1] → [0, 1] homeomorphism, f : [0, 1] → ℝ, (ϕ, f) ↦ ϕ · f = f ◦ ϕ
t0 constant map on D = [0.2, 0.8]
[Figure: plot of t0 on [0, 1].]
Iso(t0) = {ϕ | ϕ|_{D^c} = Id} ⊋ {Id}
t0 + noise
[Figure: plot of t0 + noise on [0, 1].]
Iso(t0 + noise) = {Id}
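The same phenomenon can be sketched on a simpler, discrete action than the reparametrization example: cyclic shifts of a signal on a grid, an assumption made here purely for illustration. A signal with a short period has a non-trivial isotropy group, and adding continuous noise almost surely reduces it to the identity. The name `isotropy` and the toy signals are hypothetical.

```python
import random

def shift(signal, k):
    """Cyclic shift by k grid points."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def isotropy(signal):
    """Iso(signal): the shifts k (mod n) that leave the signal unchanged."""
    return [k for k in range(len(signal)) if shift(signal, k) == signal]

rng = random.Random(0)
t0 = [0.0, 1.0] * 4  # period 2 on a grid of 8 points
noisy = [x + 0.1 * rng.gauss(0.0, 1.0) for x in t0]

iso_template = isotropy(t0)   # non-trivial: every even shift fixes t0
iso_noisy = isotropy(noisy)   # only the identity shift k = 0 remains
```

The shift `k = 0` plays the role of the neutral element, so `iso_noisy` containing only `0` is the discrete analogue of Iso(t0 + noise) = {Id}.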
Stability Theorem Implies Inconsistency
Stability Theorem in Hilbert spaces
G a compact group acting continuously on M, a Hilbert space; dM is invariant. Observable variable Y in M. If
P(Iso(Y) = {eG}) > 0 (eG: neutral element in G),
\[
m^\star \in \operatorname*{argmin}_{m \in M} F(m) = \operatorname*{argmin}_{m \in M} \mathbb{E}\left( \inf_{g \in G} d_M(m, g \cdot Y)^2 \right).
\]
If R(Y) is a measurable variable registering Y to m⋆, then:
Iso(m⋆) = {eG}.
Implies inconsistency if Iso(t0) ≠ {eG}.
Stability Theorem also true in complete finite dimensional Riemannian manifolds, and proof of the existence of the measurable variable R(Y) [Huckemann 2012].
Non Invariant Distance
Non invariant distance used in applications:
Reparametrization by a diffeomorphism ϕ
fi : ℝ^d → ℝ, images (d = 2) or signals (d = 1): ‖f1 ◦ ϕ − f2 ◦ ϕ‖² ≠ ‖f1 − f2‖².
G acting on a Hilbert space:
A priori, possibility to define a distance in the quotient space.
For Y = Φ · t0 + σε, minimizing F(m) = E(inf_{g∈G} ‖Y − g · m‖²): still possible.
How to deal with non isometric action?
Isometric
[Figure: orbit of the template t0 around 0; in gray, the noise of level σ.]
We can find a point λt0 such that F(λt0) < F(t0).
General Action with Bounded Orbit
[Figure: bounded orbit of the template t0 around 0; in gray, the noise of level σ.]
So why not in this case?
Inconsistency for non invariant distance
Inconsistency: a subgroup of G acts isometrically
A group G acting on a Hilbert space, [t0] is bounded. We note:
\[
\theta(G) = \frac{1}{\|t_0\|} \mathbb{E}\left( \sup_{g \in G} \langle g \cdot t_0, \varepsilon \rangle \right)
\]
If H is a subgroup of G, H acts isometrically and θ(H) > 0, then inconsistency for σ > σc = f([t0], θ(G), θ(H), t0) for a certain positive function f.
Example
G = group of diffeomorphisms, H = rotations.
Inconsistency for non invariant distance
Inconsistency for G acting linearly + Regularization
A group G acting linearly on a Hilbert space, [t0] is bounded. We note:
\[
\theta(G) = \frac{1}{\|t_0\|} \mathbb{E}\left( \sup_{g \in G} \langle g \cdot t_0, \varepsilon \rangle \right).
\]
The template estimation is performed by minimizing
\[
F(m) = \mathbb{E}\left( \inf_{g \in G} \left( \| g \cdot m - Y \|^2 + \mathrm{Regularization}(g) \right) \right),
\]
where Regularization is bounded. If θ(G) > 0, then inconsistency for σ > σc = f([t0], θ(G), t0) for a certain positive function f.
Action of reparametrization of functions
ϕ a diffeomorphism, (ϕ, f) ↦ f ◦ ϕ: linear action.
Proof: (af1 + f2) ◦ ϕ = af1 ◦ ϕ + f2 ◦ ϕ.
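A small numerical sketch of this action on [0, 1] illustrates both properties at once: reparametrization is linear in f, but it does not preserve the L2 norm. The diffeomorphism `phi`, the test functions and the midpoint-rule quadrature are illustrative assumptions, not objects from the talk.

```python
import math

def phi(x, a=0.5):
    """A diffeomorphism of [0, 1]: phi'(x) = 1 + a*cos(2*pi*x) > 0 for |a| < 1."""
    return x + a * math.sin(2 * math.pi * x) / (2 * math.pi)

def act(f):
    """Action of phi on f by reparametrization: f -> f o phi."""
    return lambda x: f(phi(x))

def l2_norm(f, n=2000):
    """Midpoint-rule approximation of the L2 norm of f on [0, 1]."""
    return math.sqrt(sum(f((i + 0.5) / n) ** 2 for i in range(n)) / n)

def f1(x):
    return math.exp(-40 * (x - 0.5) ** 2)

def f2(x):
    return math.sin(2 * math.pi * x)

def combo(x):
    return 3.0 * f1(x) + f2(x)

# Linearity: (3 f1 + f2) o phi and 3 (f1 o phi) + (f2 o phi) agree pointwise.
lhs = act(combo)(0.3)
rhs = 3.0 * act(f1)(0.3) + act(f2)(0.3)

# Non-isometry: the L2 norm changes under reparametrization.
norm_before = l2_norm(f1)
norm_after = l2_norm(act(f1))
```

Here `phi` stretches the bump `f1` around x = 0.5, so `norm_after` is larger than `norm_before`. This is why the invariant-distance results do not apply directly to this action.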
Summary of contributions
• It is proved that the template estimation with the Fréchet mean in quotient space is not consistent for isometric action.
• It is possible to quantify the consistency bias for σ → +∞.
• We proved a stability theorem which implies the inconsistency in Hilbert Space for invariant distance.
• The inconsistency can also be proved for non isometric action, but only for σ high enough.
This work has been presented in a workshop (MFCA 2015), published in a conference (IPMI 2017) and in two journal papers (SIIMS 2017 and Entropy 2017).
What are the possible extensions?
• Extending the existence of the measurable variable which registers data to a certain point.
• Proving the inconsistency for non invariant distance for all σ.
• Provide an asymptotic behaviour of the consistency bias when σ → 0.
Thank you for your attention!
Any questions?
Example 2: action of diffeomorphisms on functions
[Figure: three plots on [0, 1]: the template t0; the deformed template t0 ◦ ϕ; the template and the deformed template with added noise, t0 ◦ ϕ + ε.]
SRVF: the norm of ḟ/√|ḟ| is invariant under the action of ϕ; commonly used.
Example 3: Consistency and smoothness
Example of translated functions: sample of size 10^6 of discretized functions with 64 points, σ = 10.
[Figure: the template and the max-max output, plotted on [0, 1].]
Example 4: Local minima
[Figure: plot illustrating local minima, horizontal axis from 0.4 to 0.6.]