
From Riemannian Geometry to Computational Anatomy

X. PennecAsclepios project-team, INRIA Sophia-Antipolis Mediterranee

October 17, 2011


Contents

1 Introduction: Why do we need to compute on manifolds?
  1.1 Computational anatomy
  1.2 Shapes, forms and deformations
  1.3 Problems with the processing of manifold data
  1.4 Statistical analysis on manifolds

2 A short introduction on differential and Riemannian geometry
  2.1 Manifolds
    2.1.1 Submanifolds M of dimension n in Rn+k
    2.1.2 Chart = local parameterization
    2.1.3 Abstract manifold
    2.1.4 Tangent vectors and tangent space
    2.1.5 Geometric structures
  2.2 Riemannian manifolds
    2.2.1 Riemannian metric
    2.2.2 Length of a curve and Riemannian distance
    2.2.3 Geodesics
    2.2.4 Completeness
    2.2.5 Exponential map
    2.2.6 Cut locus
  2.3 Elements of analysis in Riemannian manifolds
    2.3.1 Partial derivatives and gradient of a function
    2.3.2 Hessian, covariant derivative and connection
    2.3.3 Riemannian measure or volume form
  2.4 Practical implementation of algorithms on manifolds
  2.5 Lie groups and homogeneous manifolds
    2.5.1 One-parameter subgroups and group exponential
    2.5.2 Left and right invariant metrics on Lie groups
    2.5.3 Homogeneous manifolds
  2.6 Additional references
    2.6.1 References on Lie groups

3 3D rotation matrices
  3.1 Vector rotations of R3
    3.1.1 Definition
    3.1.2 From matrices to rotations
    3.1.3 Geometric parameters: axis and angle
    3.1.4 Uniform random rotation
    3.1.5 Differential properties
    3.1.6 A bi-invariant metric on SO3
    3.1.7 Group exponential and one-parameter subgroups
    3.1.8 Metric properties
  3.2 Additional literature

4 Simple statistics on Riemannian manifolds
  4.0.1 First statistical moment: the mean
  4.0.2 A Newton algorithm to compute the mean
  4.0.3 Covariance matrix and Principal Geodesic Analysis
  4.0.4 Robust statistics using ϕ-connectors
  4.0.5 Gaussian and χ2 law

5 The Example of Covariance matrices
  5.1 Exponential, logarithm and square root of tensors
    5.1.1 Differential of the exponential
    5.1.2 Differential of the logarithm
    5.1.3 Remarkable identities
  5.2 Affine invariant metrics on the SPD space
    5.2.1 Looking for an affine invariant distance
    5.2.2 An invariant Riemannian metric
    5.2.3 Simple statistical operations on tensors
  5.3 The one-parameter family of affine-invariant metrics
  5.4 Log-Euclidean metrics
  5.5 Other families of metrics
    5.5.1 Which metric for which problem?

6 Computing with Manifold-valued images
  6.1 Interpolation and filtering as weighted means
    6.1.1 Gaussian and kernel-based filtering
  6.2 Harmonic diffusion
  6.3 Anisotropic diffusion
  6.4 Diffusion-based interpolation and extrapolation

7 Applications in Computational Anatomy
  7.1 Diffusion tensor imaging
  7.2 Learning Brain Variability from Sulcal Lines
  7.3 Challenges

A Appendices
  A.1 Jacobians of usual scalar and vector functions
    A.1.1 Composition and multiplication of functions
    A.1.2 Function of two variables
    A.1.3 Jacobian of standard functions
    A.1.4 Operators and functions particular to R3

B Solution to exercises
  B.1 Exercises on 3D rotation matrices of chapter 3
  B.2 Exercises on tensors of chapter 5


Chapter 1

Introduction: Why do we need to compute on manifolds?

1.1 Computational anatomy

Anatomy is the science that studies the structure and the relationship in space of the different organs and tissues in living systems. Before the Renaissance, anatomical descriptions were mainly based on animal models and the physiology was more philosophical than scientific. Modern anatomy really began with the authorized dissection of human cadavers, giving birth to the "De humani corporis fabrica" published in 1543 by Vesale (1514-1564), and was strongly pushed by the progresses in surgery, as exemplified by the "Universal anatomy of the human body" (1561-62) of the great surgeon Ambroise Pare (1509-1590). During the following centuries, much progress was made in anatomy thanks to new observation tools like microscopy and histology, going down to the level of cells in the 19th and 20th centuries.

However, since the 1980s, an ever growing number of imaging modalities allows observing both the anatomy and the function in vivo and in situ at many spatial scales (from cells to the whole body) and at multiple time scales: milliseconds (e.g. the beating heart), years (growth or aging), or even ages (evolution of species). Moreover, the non-invasive aspect allows repeating the observations on multiple subjects. This has a strong impact on the goals of anatomy, which are changing from the description of a representative individual to the description of the structure and organization of organs at the population level. This led in the last 10 to 20 years to the gradual evolution of descriptive atlases into interactive and generative models, allowing the simulation of new observations. Typical examples are given for the brain by the MNI 305 [ECM+93] and ICBM 152 [MTE+01] templates that are the basis of the Brain Web MRI simulation engine [CZK+98]. In the orthopedic domain, one may cite the "bone morphing" method [FL98, RJS04] that allows simulating the shape of bones.

The combination of these new observation means and of computerized methods is at the heart of computational anatomy, an emerging discipline at the interface of geometry, statistics and image analysis which aims at developing algorithms to model and analyze the biological shape of tissues and organs. The goal is to estimate representative organ anatomies across diseases, populations, species or ages, to model the organ development across time (growth or aging), to establish their variability, and to correlate this variability information with other functional, genetic or structural information (e.g. fiber bundles extracted from diffusion tensor images). From an applicative point of view, a first objective is to understand and to model how life is functioning at the population level, for instance by classifying pathologies from structural deviations (taxonomy) and by integrating individual measures at the population level (spatial normalization) to relate anatomy and function. A second application objective is to provide better quantitative and objective measures to detect, understand and correct dysfunctions at the individual level in order to help therapy planning (before), control (during) and follow-up (after).

Figure 1.1: Top: Evolution of the representation of the anatomy of the human skeleton. From left to right: Brunschwig (1497), Despepars (1500), Albucassis (1532), Vesale (1546), Pare (1562). Bottom: atlas of functional regions of the head by Hundt (1501) on the left; atlas of the brain by Vesale (1543) in the middle, and description of the brain by Pare (1562).

The method is generally to map some generic (atlas-based) knowledge to patient-specific data through atlas-patient registration. In the case of observations of the same subject, many geometrical and physically based registration methods were proposed to faithfully model and recover the deformations. However, in the case of different subjects, the absence of physical models relating the anatomies leads to a reliance on statistics to learn the geometrical relationship from many observations. This is usually done by identifying anatomically representative geometric features (points, tensors, curves, surfaces, volume transformations), and then modeling their statistical distribution across the population, for instance via a mean shape and covariance structure analysis after a group-wise matching. In the case of the brain for instance, one can rely on a hierarchy of structural models ranging from anatomical or functional landmarks like the AC and PC points [TT88, Boo78], curves like crest lines [STA98] or sulcal lines [MRC+04a, LGPC+99, FAP+07], surfaces like the cortical surface or sulcal ribbons [TMM+77, AKM+01, VQGM07], images seen as 3D functions, which lead to voxel-based morphometry (VBM) [AF00], diffusion imaging, or rigid, multi-affine or diffeomorphic transformations [Tro98, MTY03, ACPA06a], leading to tensor-based morphometry (TBM).

1.2 Shapes, forms and deformations

Identifying and extracting the anatomical features that we want to analyse is the first step, which defines the object space. However, one usually considers the global pose of our objects as a nuisance factor related to the arbitrary coordinate system of the acquisition device: generally speaking, the shape of an object is understood as the geometric information that is invariant under translation, rotation and rescaling. Thus, one is often interested only in the shape, i.e. what remains if we quotient out the object space by a group action, usually rigid body, similarity or affine transformations. For instance, when we consider the equivalence classes of sets of k (labeled or unlabeled) points under the action of a global similarity (resp. rigid body) transformation, we obtain the famous shape (resp. size and shape) spaces of Kendall et al. [Ken89, DM91, LK93, Sma96]. One can define similarly the shape spaces of curves by removing the effect of reparameterizations and the global pose of the curve [JKSM06, MSJ07, JKSJ07b, JKSJ07a]. As a result, it is important to notice that the resulting quotient space is most often non-linear, which precludes the use of classical linear statistics (a simple example is given by the sphere Sn, which can be viewed as the quotient of the punctured space Rn+1 by a global scalar scaling factor).

Assuming that the global pose of our objects is normalized in some way, an alternative modeling of shapes was proposed by D'Arcy Thompson in 1917 [DT17] and developed more recently into a standard analysis tool by Grenander and Miller [Gre93, MY01]. The idea is to assume that there is a template object, which somehow represents the anatomy of reference (this can be seen as an atlas or an equivalent of the mean value), and to analyze the variability of the shape through the deformations of this reference object towards the actual observations: a shape difference is encoded by the transformation that deforms one onto the other. One key feature of this lift of the shape characteristics from the object space to the transformation space is that it allows applying the typical deformations to other objects than the ones analyzed in the first place, provided that they also live in the coordinate system of the template. For instance, [?] analyzed the variability of curves and deformed accordingly the surface of the cortex or the full 3D volume of the brain. However, the problem is even more complex than previously as we want here to perform statistics on large deformations, which notably belong to non-linear diffeomorphism spaces.

1.3 Problems with the processing of manifold data

Processing geometric features is far more difficult than processing points, and a number of paradoxes can arise. For instance, we have shown in [PT97] that additive noise was not well suited for describing the uncertainty of frames and could be advantageously replaced by a "compositive" model of noise. We investigated in [PA98] three other basic problems that often arise when processing geometric features in a statistical setting.

The first one is the quantification of the probability of occurrence of an event when some geometric features are randomly distributed. Bertrand's paradox [Poi12, KM63] illustrates the fact that different notions of uniformity lead to different statistical results. The solution is to require the invariance of the uniform distribution under the action of a given group of transformations. For instance, in Fig. 1.2, the three solutions are invariant by rotation, but only the second one is invariant by the full rigid body transformation group.
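As a sanity check of the three notions of uniformity in Fig. 1.2, the following sketch (NumPy only; the sampling choices are the assumptions stated in the figure caption) estimates by Monte Carlo the probability that a chord is longer than √3, the side of the equilateral triangle inscribed in the unit circle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
side = np.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle

# Method 1: two endpoints uniformly and independently distributed on the circle.
a, b = rng.uniform(0, 2 * np.pi, (2, n))
chord1 = 2.0 * np.abs(np.sin((a - b) / 2.0))   # chord length from the angular gap

# Method 2: distance of the chord to the center uniform in [0, 1].
p = rng.uniform(0, 1, n)
chord2 = 2.0 * np.sqrt(1.0 - p ** 2)

# Method 3: midpoint of the chord uniformly distributed in the disc.
r = np.sqrt(rng.uniform(0, 1, n))              # radius of a uniform point in the disc
chord3 = 2.0 * np.sqrt(1.0 - r ** 2)

for k, chord in enumerate((chord1, chord2, chord3), start=1):
    print(f"Method {k}: P(length > sqrt(3)) ~ {np.mean(chord > side):.3f}")
# Expected output: ~0.333, ~0.500, ~0.250
```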

Figure 1.2: Bertrand's Paradox (1907): three methods for computing the probability that a "random" chord of a circle has a length greater than the side of an inscribed equilateral triangle. From left to right, the methods give a probability of 1/3, 1/2 and 1/4. The three solutions are correct but they do not refer to the same notion of uniformity in the way we choose the chord. Method 1: A chord intersects the circle in two points that we assume to be equally and independently distributed on the circle. If the first point is A, the second point has to lie between A′ and A′′ on the circle for the chord to be greater than the triangle side. This is just 1/3 of the circumference and the searched probability is then 1/3. Method 2: A chord is characterized by its distance p to the center (between 0 and 1) and its orientation θ w.r.t. a fixed line (between 0 and 2π). Drawing an inscribed equilateral triangle parallel to the chord, we can see that the distance p has to be less than 1/2 in order to have a chord length greater than √3. By assuming a uniform orientation and distance to the origin, we find a normalized probability of 1/2. Method 3: A chord is uniquely defined by the orthogonal projection I of the circle center onto it. It has to lie inside the disc of radius 1/2 in order to have a sufficient length. Assuming that I is uniformly distributed over the interior of the circle, the normalized probability is 1/4.

A similar type of paradox can be established for "distances" between features, which is one of the most used operations in image processing algorithms. Measuring the Euclidean distance between parameters in different charts (or representations) obviously leads to different results. Considering a true distance on the manifold (a symmetric positive definite bivariate function satisfying the triangle inequality) makes the result independent of the chart, but is not sufficient to ensure the stability of the results with respect to a change of the reference frame of the physical space (the action of a transformation on our features). Once again, the solution we suggested in [PA98] was to choose a distance which is invariant under the action of a given group of transformations. Under some specific conditions, such an invariant distance can be induced on the manifold by a left invariant distance on the transformation group.

Last but not least, let us investigate the notion of a mean feature. For 3D rotations, we can for instance compute the mean rotation matrix R = (1/n) Σi Ri, the mean quaternion q = (1/n) Σi qi using the unit quaternion representation, or the mean rotation vector r = (1/n) Σi ri. These three methods give different results: the first two are not even rotations and none of them is stable with respect to a change of reference frame. Likewise, the average of points on a sphere is located inside the sphere and not on its surface.
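The paradox of the arithmetic mean of rotations is easy to reproduce numerically. The sketch below (NumPy only; an illustration, not the intrinsic mean advocated in Chapter 4) averages three rotation matrices and checks that the result is neither orthogonal nor of unit determinant.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix from an axis-angle pair via Rodrigues' formula."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

Rs = [rot([0, 0, 1], 0.0), rot([0, 0, 1], np.pi / 2), rot([1, 0, 0], np.pi / 2)]
R_bar = sum(Rs) / len(Rs)                                   # naive arithmetic mean

print("R_bar^T R_bar =\n", np.round(R_bar.T @ R_bar, 3))    # not the identity
print("det(R_bar) =", round(np.linalg.det(R_bar), 3))       # not 1: R_bar is not a rotation
```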

The definition of the mean should of course ensure that it belongs to the manifold and that it is independent of the chart used to represent the manifold. The main problem here is that the addition and the multiplication by a scalar are operations that are only defined in a vector space (for instance in each chart). As they are not intrinsic to our manifold, their result will inevitably change with the chart used. Similarly, we cannot generalize the standard statistical expectation operator x̄ = ∫ x dP(x) to manifolds since an integral is a linear operator, and there is no such thing as linearity in manifolds. Thus, a new definition of the mean is needed, which implies revisiting an important part of the theory of statistics.

1.4 Statistical analysis on manifolds

Statistical computing on simple manifolds like the 3D sphere or a flat torus (for instance an image with opposite boundary points identified) might seem easy as we can see the geometrical properties (e.g. invariance by rotation or translation) and imagine tricks to alleviate the different problems. However, when it comes to slightly more complex manifolds like the space of positive definite matrices or the space of rigid-body motions (rotations and translations), without even thinking of infinite dimensional manifolds like spaces of curves, surfaces or diffeomorphisms, computational tricks are much more difficult to find and have to be determined on a case by case basis. Thus one should rather find a generic way to design algorithms to compute and to do statistics on manifolds.

Statistical analysis on manifolds is a relatively new domain at the confluence of several mathematical and application domains. Its goal is to statistically study geometric objects living in differential manifolds. Directional statistics [Bin74, JM89, Ken92, Mar99] provide a first approach to statistics on manifolds. As the manifolds considered here are spheres and projective spaces, the tools developed were mostly extrinsic, i.e. relying on the embedding of the manifold in the ambient Euclidean space. More complex objects are obtained when we consider the "shape" of a set of k points, i.e. the part that remains invariant under the action of a given group of transformations (usually rigid body ones or similarities). The statistics on these shape spaces [Ken89, DM91, LK93, Sma96] raised the need for intrinsic tools. However, the link between the tools developed, the metric used and the space structure was not always very clear.

Another mathematical approach was provided by the study of stochastic processes on Lie groups. For instance, [Gre63] derived central limit theorems on different families of groups and semi-groups with specific algebraic properties. Since then, several authors in the area of stochastic differential geometry and stochastic calculus on manifolds proposed results related to mean values [Kar77, Ken91, EM91, Arn95, Pic94, Dar96].

In the area of numerical methods for dynamic systems and partial differential equations, quite a few interesting numerical methods were developed to preserve the geometric properties of the flow of a differential equation, such as symplectic integrators for Hamiltonian systems, symmetric integrators for reversible systems, and optimization methods on Lie groups and manifolds [HLW02, HM94]. In particular, several Newton iteration schemes relying on different structures were proposed to optimize a function on a matrix Lie group or on Riemannian manifolds [OW00, MM02, DMP03]. From the applied mathematics and computer science point of view, people got interested in computing and optimizing on specific manifolds, like rotations and rigid body transformations [PGT98, GMS98, Pen98, Gra01, Moa02], or Stiefel and Grassmann manifolds [EAS98].

Over the last years, several groups attempted to federate some of the above approaches in a general statistical and computing framework, with different objectives in mind. For instance, the aim of the theory of statistical manifolds [Ama90, OC95] is to provide a Riemannian structure to the space of parameters of statistical distributions. This evolved into the more general theory of information geometry [iAN00, KV97]. Seen from the point of view of statistics on manifolds rather than manifolds of statistical parameters, a few authors characterized the performances of some statistical parametric estimators in manifolds, like the bias and the mean square error. For instance, [Hen91] considered extrinsic statistics, based on the Euclidean distance of the embedding space, while [OC95] considered the intrinsic Riemannian distance, and refined the Cramer-Rao lower bound using bounds on the sectional curvature of the manifold. In [BP02, BP03, BP05], the authors focused on the asymptotic consistency properties of the extrinsic and intrinsic means and variances for large sample sizes, and were able to propose a central limit theorem for flat manifolds.

In view of computer vision and medical image analysis applications, our concern in [Pen99, Pen06a] was quite different: we aimed at developing computational tools that can consistently deal with geometric features, or that provide at least good approximations. As we often have few measurements, we were interested in small sample sizes rather than large ones, and we preferred to obtain approximations rather than bounds on the quality of the estimation. Thus, a special interest was to develop Taylor expansions with respect to the variance, in order to evaluate the quality of the computations with respect to the curvature of the manifold.

In [OC95, BP02, BP03, BP05] as well as in our work, the chosen framework is that of geodesically complete Riemannian manifolds, which appears to be powerful enough to support an interesting theory. To ensure a maximal consistency, we chose to rely only on intrinsic properties of the Riemannian manifold, thus excluding methods based on the embedding of the manifold in an ambient Euclidean space.

Section 5.5.1 will focus on how to choose the metric for the important classes of Lie groups and homogeneous manifolds, with a practical example on 3D rotations.


Chapter 2

A short introduction on differential and Riemannian geometry

The goal of this chapter is to establish the mathematical bases that will allow us in the sequel to build a simple but consistent statistical computing framework on manifolds. We describe a few computational tools (namely the Riemannian Exp and Log maps) derived from a chosen Riemannian metric on a given manifold. The implementation of these atomic tools will then constitute the basis to build more complex generic algorithms in the following chapters. The interested reader may refer to the very clear and concise book of do Carmo [dC92] for a more complete but still affordable presentation of Riemannian geometry, and to [Spi79, chap. 9] and [Kli82, GHL93] for more details.

2.1 Manifolds

A manifold is a collection of points that locally looks like a Euclidean space (say Rn). There are mainly two ways to look at manifolds: either as a submanifold embedded in a larger Euclidean space (in which case it usually inherits the structure of the embedding space), or intrinsically using local charts and atlases (abstract manifolds), in which case one has to specify the structure locally, and to maintain its consistency across charts in their overlapping parts. These two ways of seeing manifolds are complementary, and one or the other can be better suited to understand or implement some ideas.

For instance, we are (generally) constrained by gravity to live on the surface of the earth, which has 2 dimensions only, and we use 2D maps to navigate in a given area, with sometimes the need to change from one map to another using their overlapping part to go farther away. For a long time, the earth was even considered to be flat because its curvature was not noticeable at the scale at which it was observed, before the spherical shape embedded in a 3D space was generally accepted. Notice that although the earth has a globally spherical shape, it is actually closer to an ellipsoid than to a sphere.

2.1.1 Submanifolds M of dimension n in Rn+k

Implicit definition through smooth non-degenerate constraints of constant dimension k (level set ϕ(x) = 0).

The manifold is said to be topological (C0), differential (C1) or smooth (C∞) if the function ϕ is continuous, differentiable or smooth.


• Sphere Sn in Rn+1: Sn = {p ∈ Rn+1 / ‖p‖2 = 1}.

• Flat and curved tori.

• Real Projective space Pn

• Orthogonal matrices On = {R ∈ Mn×n / R.RT = Idn}.

2.1.2 Chart = local parameterization

Diffeomorphic mapping from an open set U ⊂ Rn to M. From the implicit function theorem, it exists locally at each point of a submanifold.

2.1.3 Abstract manifold

Atlas = a family of charts (xi, Ui) that

• Cover M

• are compatible, i.e. such that xi(-1) ◦ xj is Cp at all common points of their definition domains.

The manifold is said to be topological (p = 0), differential (p = 1) or smooth (p = ∞).

• Sphere: stereographic projection (cf Gallot p.8)

• Projective spaces (link with rotations through unit quaternions)

A submanifold is an abstract manifold. Whitney embedding theorem: any n-dimensional manifold can be embedded in Rk with k ≤ 2n + 1.

2.1.4 Tangent vectors and tangent space

On submanifolds: as usual. On abstract manifolds: equivalence classes of curves that are tangent at a given point, i.e. that have the same time derivative in a chart at that point. The tangent space TpM is the space of all tangent vectors at point p.

Directional derivative:

• Curve γ(t) = p + t v + O(t2) (with p ∈ M and v ∈ TpM).

• Observable or test function f (from M to R): f ◦ γ(t) = f(p) + t ∂vf + O(t2).

∂vf = ∂(f ◦ γ)/∂t |_{t=0}

∂v is a tangent vector, i.e. an operator that maps each differentiable function to its derivative when composed with a curve that has this tangent vector.

A local coordinate system x = (x1, . . . , xn) induces a basis ∂/∂x = (∂1, . . . , ∂n) of the tangent space (∂i is a shorter notation for ∂/∂xi), in such a way that ∂v = Σi vi ∂i.


2.1.5 Geometric structures

In the geometric framework, one has to separate the topological and differential properties of the manifold from the geometric and metric ones. The first ones determine the local structure of a manifold M by specifying neighboring points and tangent vectors, which allows us to differentiate smooth functions on the manifold. This also allows us to define continuous paths on the manifold and to classify them by the number of loops they make around "holes" in the manifold. However, within each of these homotopy classes, there is no tool to choose something like the "straightest path". To obtain such a notion, we need to add a geometric structure, called a connection, which allows comparing neighboring tangent spaces and characterizes the parallelism of vectors at different points. Indeed, differentiating a path on a manifold gives tangent vectors belonging at each point to a different tangent vector space. In order to compute the second order derivative (the acceleration of the path), we need a way to map the tangent space at a point to the tangent space at any neighboring point. This is the goal of a connection ∇XY, which specifies how the vector field Y(p) is derived in the direction of the vector field X(p). Such a connection operator also describes how a vector is transported from one tangent space to another along a given curve, and thus specifies the local parallel transport. However, there is usually no global parallelism. As a matter of fact, transporting the same vector along two different curves arriving at the same point might lead to different ending vectors: this is easily seen on the sphere, where traveling from the north pole to the equator, then along the equator for 90 degrees and back to the north pole turns any tangent vector by 90 degrees. This defect of global parallelism is the sign of curvature. By looking for curves that remain locally parallel to themselves (i.e. such that ∇γ̇ γ̇ = 0), one defines the equivalent of "straight lines" in the manifold: geodesics. One should notice that there exist many different choices of connections on a given manifold, which lead to different geodesics.

However, geodesics by themselves do not quantify how far away from each other two points are. For that purpose, we need an additional structure: a distance. By restricting to distances which are compatible with the differential structure, we enter into the realm of Riemannian geometry.

2.2 Riemannian manifolds

2.2.1 Riemannian metric

A Riemannian metric is defined by a continuous collection of scalar products ⟨ . | . ⟩p (or equivalently norms ‖.‖p) on each tangent space TpM at point p of the manifold. In a given chart, we can express the metric by a symmetric positive definite matrix G(x) = [gij(x)] where each element is given by the dot product of the tangent vectors to the coordinate curves: gij(x) = ⟨ ∂i | ∂j ⟩. This matrix is called the local representation of the Riemannian metric in the chart x, and the dot product of two vectors v and w of TxM is then ⟨ v | w ⟩x = vT G(x) w.
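As a concrete illustration, assume the spherical-coordinate chart x(θ, φ) = (sin θ cos φ, sin θ sin φ, cos θ) of the unit sphere S2 with the metric inherited from R3. The sketch below (NumPy only, hypothetical helper names) builds the local representation G(x) = JT J from the Jacobian J of the parameterization and recovers the classical diag(1, sin2 θ):

```python
import numpy as np

def embedding(theta, phi):
    """Spherical-coordinate parameterization of the unit sphere S^2 in R^3."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def metric_matrix(theta, phi, eps=1e-6):
    """Local representation G(x) = J^T J of the induced metric, via finite differences."""
    J = np.column_stack([
        (embedding(theta + eps, phi) - embedding(theta - eps, phi)) / (2 * eps),
        (embedding(theta, phi + eps) - embedding(theta, phi - eps)) / (2 * eps),
    ])
    return J.T @ J

theta = 1.0
print(np.round(metric_matrix(theta, 0.3), 6))
print("expected diag(1, sin^2 theta):", 1.0, np.sin(theta) ** 2)
```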

2.2.2 Length of a curve and Riemannian distance

If we consider a curve γ(t) on the manifold, we can compute at each point its instantaneous speed vector γ̇(t) (this operation only involves the differential structure) and its norm, the instantaneous speed. Notice that the Riemannian metric at the current point γ(t) of the curve is needed for this operation. To compute the length of the curve, this value is integrated as usual along the curve:

L_a^b(γ) = ∫_a^b ‖γ̇(t)‖_γ(t) dt = ∫_a^b ⟨ γ̇(t) | γ̇(t) ⟩_γ(t)^(1/2) dt


The distance between two points of a connected Riemannian manifold is the minimum length among the curves joining these points:

dist(x, q) = min_γ L(γ)   with   γ(0) = x and γ(1) = q   (2.1)

The Riemannian metric is the intrinsic way of measuring lengths on a manifold. The extrinsic way is to consider the manifold as embedded in a larger vector space E (think for instance of the sphere S2 in R3) and to compute the length of a curve in M as for any curve in E. In this case, the corresponding Riemannian metric is the restriction of the dot product of E onto the tangent space at each point of the manifold. By Whitney's theorem, there always exists such an embedding for a large enough vector space E (dim(E) ≤ 2 dim(M) + 1).
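The intrinsic and extrinsic computations of length can be compared numerically, for instance on the circle of colatitude θ0 of S2: discretizing the integral above with the chart metric diag(1, sin2 θ), or with the ambient Euclidean norm of the embedded curve, both give approximately 2π sin θ0. A minimal sketch under the same spherical-chart assumption as before:

```python
import numpy as np

theta0 = 0.7
t = np.linspace(0.0, 1.0, 20001)
theta, phi = np.full_like(t, theta0), 2 * np.pi * t        # curve gamma(t) in the chart
dtheta, dphi = np.gradient(theta, t), np.gradient(phi, t)

# Intrinsic: speed^2 = v^T G(x) v with G = diag(1, sin^2 theta).
speed = np.sqrt(dtheta ** 2 + np.sin(theta) ** 2 * dphi ** 2)
length_intrinsic = np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))

# Extrinsic: Euclidean speed of the embedded curve in R^3.
xyz = np.stack([np.sin(theta) * np.cos(phi),
                np.sin(theta) * np.sin(phi),
                np.cos(theta)])
ext_speed = np.linalg.norm(np.gradient(xyz, t, axis=1), axis=0)
length_extrinsic = np.sum(0.5 * (ext_speed[1:] + ext_speed[:-1]) * np.diff(t))

print(length_intrinsic, length_extrinsic, 2 * np.pi * np.sin(theta0))   # all ~ 4.05
```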

2.2.3 Geodesics

Finding the curves realizing the minimum Riemannian distance is a difficult problem as any time-reparameterization is authorized. Thus, one rather defines the metric geodesics as the critical points of the energy functional E(γ) = (1/2) ∫_0^1 ‖γ̇(t)‖^2 dt. It turns out that they also optimize the length functional, but they are moreover parameterized proportionally to arc-length.

Let [g^ij] = [g_ij]^(-1) be the inverse of the metric matrix (in a given coordinate system x) and Γ^i_jk = (1/2) g^im (∂_k g_mj + ∂_j g_mk − ∂_m g_jk) the Christoffel symbols (using the Einstein summation convention that implicitly sums over each index appearing both up and down in the formula). The calculus of variations shows that the geodesics are the curves satisfying the following second order differential system (in the chart x = (x1, . . . , xn)):

γ̈^i + Γ^i_jk γ̇^j γ̇^k = 0
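For the spherical chart of S2 used above, the non-zero Christoffel symbols are Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = Γ^φ_φθ = cot θ, so the geodesic system can be integrated numerically. The sketch below (NumPy/SciPy; a toy illustration, not an efficient geodesic solver) checks that the Riemannian speed stays constant along the integrated curve, as expected for a geodesic parameterized proportionally to arc-length:

```python
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_ode(t, y):
    """y = (theta, phi, dtheta, dphi); second-order geodesic system written as first order."""
    theta, phi, dth, dph = y
    ddth = np.sin(theta) * np.cos(theta) * dph ** 2             # -Gamma^theta_{phi phi} dphi^2
    ddph = -2.0 * (np.cos(theta) / np.sin(theta)) * dth * dph   # -2 Gamma^phi_{theta phi} dtheta dphi
    return [dth, dph, ddth, ddph]

y0 = [np.pi / 3, 0.0, 0.2, 0.5]                      # start away from the poles
sol = solve_ivp(geodesic_ode, (0.0, 5.0), y0, rtol=1e-9, atol=1e-9, dense_output=True)

for t in np.linspace(0.0, 5.0, 6):
    theta, phi, dth, dph = sol.sol(t)
    speed = np.sqrt(dth ** 2 + np.sin(theta) ** 2 * dph ** 2)   # ||gamma'(t)|| at gamma(t)
    print(f"t={t:.1f}  speed={speed:.6f}")           # constant up to integration error
```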

The fundamental theorem of Riemannian geometry states that on any Riemannian manifold there is a unique (torsion-free) connection which is compatible with the metric, called the Levi-Civita (or metric) connection. This connection is determined in a local coordinate system through the Christoffel symbols: ∇_∂i ∂j = Σ_k Γ^k_ij ∂_k. For that choice of connection, shortest paths are geodesics ("straight lines"). In the following, we only consider the Levi-Civita connection.

2.2.4 Completeness

The manifold is said to be geodesically complete if the definition domain of all geodesics can be extended to R. This means that the manifold has no boundary nor any singular point that we can reach in a finite time (for instance, Rn − {0} with the usual metric is not geodesically complete, but Rn or Sn are). As an important consequence, the Hopf-Rinow-De Rham theorem states that such a manifold is complete for the induced distance, and that there always exists at least one minimizing geodesic between any two points of the manifold (i.e. whose length is the distance between the two points).

From now on, we will assume that the manifold is geodesically complete. This assumption is one of the fundamental properties used to ensure that algorithms (e.g. Exp and Log maps) will return values within the manifold.

2.2.5 exponential map

Let p be a point of the manifold that we consider as a local reference and v a vector of the tangent space TpM at that point. From the theory of second order differential equations, we know that there exists one and only one geodesic γ_(p,v)(t) starting from that point with this tangent vector.


This geodesic is theoretically defined in a sufficiently small interval around zero, but since the manifold is geodesically complete, its definition domain can be extended to R. Thus, the point γ_(p,v)(t) is defined for every vector v ∈ TpM and every parameter t.

This allows wrapping the tangent space onto the manifold, or equivalently developing the manifold locally in the tangent space along the geodesics (think of rolling a sphere along its tangent plane at a given point): one maps to each vector v ∈ TpM the point q of the manifold that is reached after a unit time by the geodesic γ_(p,v)(t) starting at p with tangent vector v.

This mapping

Exp_p : TpM → M,   v ↦ Exp_p(v) = γ_(p,v)(1)

is called the exponential map at point p. Straight lines going through 0 in the tangent space are transformed into geodesics going through point p on the manifold, and distances along these lines are conserved (Fig. 2.1).

Figure 2.1: Left: The tangent planes at points x and q of the sphere S2 are different: the vectors v and w of TxM cannot be compared to the vectors t and u of TqM. Thus, it is natural to define the scalar product on each tangent plane. Right: The geodesics starting at x are straight lines in the exponential map and the distance along them is conserved.

The exponential map is defined on the whole tangent space TpM (since the manifold is geodesically complete), but it is generally one-to-one only locally around 0 in the tangent space (i.e. around p in the manifold). In the sequel, we denote by →pq = Log_p(q) the inverse of the exponential map: this is the smallest vector (in norm) such that q = Exp_p(→pq). In this chart, the geodesics going through p are represented by the lines going through the origin: Log_p γ_(p,→pq)(t) = t →pq. Moreover, the distance with respect to the development point p is preserved:

dist(p, q) = ‖→pq‖ = ( ⟨ →pq | →pq ⟩_p )^(1/2)

Thus, the exponential chart at p can be seen as the development of the manifold in the tangent space at a given point along the geodesics. It is also called a normal coordinate system if it is provided with an orthonormal basis. At the origin of such a chart, the metric reduces to the identity matrix and the Christoffel symbols vanish.
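On the sphere embedded in Rn+1, the exponential and logarithmic maps have simple closed forms, which makes it a convenient running example. A minimal NumPy sketch (tangent vectors are assumed to be represented as ambient vectors orthogonal to the base point; the helper names are ours):

```python
import numpy as np

def sphere_exp(p, v, eps=1e-12):
    """Exp_p(v) on the unit sphere; p is a unit vector, v is tangent at p (<p, v> = 0)."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_log(p, q, eps=1e-12):
    """Log_p(q): the smallest tangent vector at p such that Exp_p(Log_p(q)) = q (q != -p)."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)               # geodesic distance dist(p, q)
    w = q - cos_t * p                      # component of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < eps:                       # q == p (or q == -p, i.e. the cut locus)
        return np.zeros_like(p)
    return theta * w / norm_w

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = sphere_log(p, q)
print(v, np.linalg.norm(v))                # tangent vector of norm pi/2 = dist(p, q)
print(sphere_exp(p, v))                    # recovers q (up to rounding)
```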

2.2.6 cut locus

Now, it is natural to search for the maximal domain where the exponential map is a diffeomorphism. If we follow a geodesic γ_(p,v)(t) = Exp_p(t v) from t = 0 to infinity, it is either always minimizing all along, or it is minimizing up to a time t0 < ∞ and not any more after (thanks to the geodesic completeness). In this last case, the point z = γ_(p,v)(t0) is called a cut point and the corresponding tangent vector t0 v a tangential cut point. The set of all cut points of all geodesics starting from p is the cut locus C(p) ⊂ M, and the set of corresponding vectors is the tangential cut locus C(p) ⊂ TpM. Thus, we have C(p) = Exp_p(C(p)), and the maximal definition domain for the exponential chart is the domain D(p) containing 0 and delimited by the tangential cut locus.

It is easy to see that this domain is connected and star-shaped with respect to the origin of TpM. Its image by the exponential map covers all the manifold except the cut locus, and the segment [0, →pq] is transformed into the unique minimizing geodesic from p to q. Hence, the exponential chart at p is a chart centered at p with a connected and star-shaped definition domain that covers all the manifold except the cut locus C(p):

D(p) ⊂ Rn  ⟷  M − C(p)
→pq = Log_p(q)  ⟷  q = Exp_p(→pq)

From a computational point of view, it is often interesting to extend this representation to include the tangential cut locus. However, we have to take care of the multiple representations: points of the cut locus where several minimizing geodesics meet are represented by several points on the tangential cut locus, as the geodesics start with different tangent vectors (e.g. rotations of π around the axes ±n for 3D rotations, the antipodal point on the sphere). This multiplicity problem cannot be avoided as the set of such points is dense in the cut locus.

The size of this definition domain is quantified by the injectivity radius i(M, p) = dist(p, C(p)), which is the maximal radius of centered balls in TpM on which the exponential map is one-to-one. The injectivity radius of the manifold i(M) is the infimum of the injectivity radius over the manifold. It may be zero, in which case the manifold somehow tends towards a singularity (think e.g. of the surface z = 1/√(x2 + y2) as a sub-manifold of R3).

Example: On the sphere Sn (center 0 and radius 1) with the canonical Riemannian metric (induced by the ambient Euclidean space Rn+1), the geodesics are the great circles and the cut locus of a point p is its antipodal point −p. The exponential chart is obtained by rolling the sphere onto its tangent space so that the great circles going through p become lines. The maximal definition domain is thus the open ball D = Bn(π). On its boundary ∂D = C = Sn−1(π), all the points represent the antipodal point −p.

Figure 2.2: Exponential chart and cut locus for the sphere S2 and the projective space P2


For the real projective space Pn (obtained by identification of antipodal points of the sphere Sn), the geodesics are still the great circles, but the cut locus of the point {p, −p} is now the equator of the two points, where antipodal points are still identified (thus the cut locus is Pn−1). The definition domain of the exponential chart is the open ball D = Bn(π/2), and the tangential cut locus is the sphere ∂D = Sn−1(π/2), where antipodal points are identified.

2.3 Elements of analysis in Riemannian manifolds

2.3.1 partial derivatives and gradient of a function

Let f be a smooth function from M to R (an observable). Its differential df at point p is the linear form on TpM corresponding to the directional derivatives ∂v:

∀ v ∈ TpM,   df(v) = ∂vf

This differential is a co-vector (a linear form acting on vectors) and thus an element of TpM*, which cannot be directly seen as a gradient vector. Thanks to the Riemannian dot product, there is fortunately a canonical way to identify the linear form df ∈ TpM* with the unique vector w ∈ TpM that verifies df(v) = ⟨ w | v ⟩p for all vectors v ∈ TpM. All these notions can obviously be extended to the whole manifold using vector fields.

In a local chart x, we have ∂vf = (∂f(x)/∂x) . v. Thus, the differential is the horizontal (co-)vector df = [∂1f, . . . , ∂nf]. As ⟨ w | v ⟩x = wT G(x) v, the expression of the gradient in the chart is:

grad f = G(-1)(x) (∂f/∂x)T,   i.e.   (grad f)^i = g^ij ∂_j f

This definition corresponds to the classical gradient in Rn, even in the case of a non-orthonormal basis.
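In a chart with a non-trivial metric, the Riemannian gradient thus differs from the naive vector of partial derivatives by the factor G(-1). A short NumPy illustration in the spherical chart of S2, with the example function f(θ, φ) = cos φ (our choice, not from the text):

```python
import numpy as np

def f(x):
    """Observable expressed in the spherical chart x = (theta, phi); here f = cos(phi)."""
    return np.cos(x[1])

def partial_derivatives(f, x, eps=1e-6):
    """Vector of partial derivatives [d_1 f, ..., d_n f] by central finite differences."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

x = np.array([1.0, 0.5])                           # (theta, phi)
G = np.diag([1.0, np.sin(x[0]) ** 2])              # metric of S^2 in this chart
df = partial_derivatives(f, x)
riem_grad = np.linalg.solve(G, df)                 # grad f = G^{-1} (df/dx)^T

print("partial derivatives:", df)
print("Riemannian gradient:", riem_grad)           # second component rescaled by 1/sin^2(theta)
```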

2.3.2 Hessian, covariant derivative and connection

The second covariant derivative (the Hessian) is a little bit more complex as it makes use of the connection ∇:

Hess f(X, Y) = ∇X ∇Y f = ∇X (∂Y f)

Its expression in a local coordinate system accounts for the curvature:

Hess f = ∇df = (∂_ij f − Γ^k_ij ∂_k f) dx^i dx^j

Let now fp be the expression of f in a normal coordinate system at p. Its Taylor expansion around the origin is:

fp(v) = fp(0) + Jfp v + (1/2) vT Hfp v + O(‖v‖3)

where Jfp = [∂if] and Hfp = [∂ijf]. Since we are in a normal coordinate system, we have fp(v) = f(Exp_p(v)). Moreover, the metric at the origin reduces to the identity, Jfp = grad fT, and the Christoffel symbols vanish, so that the matrix of second derivatives Hfp corresponds to the Hessian Hess f. Thus, the Taylor expansion can be written in any coordinate system:

f(Exp_p(v)) = f(p) + grad f(v) + (1/2) Hess f(v, v) + O(‖v‖3)   (2.2)


2.3.3 Riemannian measure or volume form

In a vector space with basis A = (a1, . . . , an), the local representation of the metric is given by G = AT A, where A = [a1, . . . , an] is the matrix of coordinate change from A to an orthonormal basis. Similarly, the measure (or the infinitesimal volume element) is given by the volume of the parallelepiped spanned by the basis vectors: dV = |det(A)| dx = √|det(G)| dx. Assuming now a Riemannian manifold M, we can see that the Riemannian metric G(x) induces an infinitesimal volume element on each tangent space, and thus a measure on the manifold:

dM(p) = √|G(x)| dx

One can show that the cut locus has a null measure. This means that we can integrate indifferently in M or in any exponential chart. If f is an integrable function of the manifold and fp(→pq) = f(Exp_p(→pq)) is its image in the exponential chart at p, we have:

∫_M f(p) dM = ∫_D(p) fp(→z) √|G_p(→z)| d→z
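For instance, integrating the Riemannian measure of the spherical chart of S2, √|G| dθ dφ = sin θ dθ dφ, over its definition domain recovers the area 4π of the unit sphere. A short numerical check (NumPy only):

```python
import numpy as np

theta, phi = np.meshgrid(np.linspace(0, np.pi, 2000),
                         np.linspace(0, 2 * np.pi, 2000), indexing="ij")
dA = np.sin(theta)                         # sqrt(|G|) for G = diag(1, sin^2 theta)
area = dA.mean() * np.pi * (2 * np.pi)     # grid average times the area of the chart domain
print(area, 4 * np.pi)                     # both ~ 12.566
```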

2.4 Practical implementation of algorithms on manifolds

The exponential and logarithmic maps (from now on Exp and Log maps) are obviously different for each manifold and for each metric. Thus they have to be determined and implemented on a case by case basis. Examples of closed form expressions for rotations and rigid body transformations can be found for the left invariant metric in [PT97], and for covariance matrices (positive definite symmetric matrices, so called tensors in medical image analysis) in [PFA06, AFPA07] and Section ??. It has to be noticed that the equations of the geodesics are only needed for the sake of computational efficiency: geodesics are curves minimizing the distance but also the Riemannian energy (the integral of the squared speed) between two points. Thus computing →pq = Log_p(q) may be posed as an optimal control problem [KN97, ATY05], and computing Exp_p(v) as a numerical integration problem (see e.g. [HM94, HLW02]). This opens the way to statistical computing in more complex spaces than the ones we considered up to now, like curves [MM06, KSMJ04, You98], surfaces, and diffeomorphic transformations. For instance, the large deformation diffeomorphic metric mapping (LDDMM) method proposed for inter-subject image registration in computational anatomy [BMTY05, MTY03, MY01, JM00] finds the geodesic in the joint intensity and deformation space by minimizing the Riemannian length of the deformation for a given right-invariant metric on a diffeomorphism group. Through the so called EPDiff equation (Euler-Poincaré equation for diffeomorphisms), this optimization framework has been recently rephrased in an exponential/logarithm framework similar to the one developed here [MTY06]. Despite the infinite number of dimensions, simple statistics like the mean and the principal component analysis of a (finite) set of samples may still be computed [VMYT04, DPT+08]. Exponential charts constitute very powerful atomic functions in terms of implementation, on which we will be able to express practically all the geometric operations: the implementation of Log_p and Exp_p is the basis of programming on Riemannian manifolds, as we will see in the following.

In a Euclidean space, the exponential charts are nothing but one orthonormal coordinate system translated to each point: we have in this case →pq = Log_p(q) = q − p and Exp_p(v) = p + v. This example is more than a simple coincidence. In fact, most of the usual operations using additions and subtractions may be reinterpreted in a Riemannian framework using the notion of bipoint, an antecedent of vector introduced during the 19th century. Indeed, one defines vectors as equivalence classes of bipoints (oriented couples of points) in a Euclidean space. This is possible because we have a canonical way (the translation) to compare what happens at two different points. In a Riemannian manifold, we can still compare things locally (by parallel transportation), but not any more globally. This means that each "vector" has to remember at which point of the manifold it is attached, which comes back to a bipoint.

Conversely, the logarithmic map may be used to map almost any bipoint (p, q) into a vector →pq = Log_p(q) of TpM. This reinterpretation of addition and subtraction using logarithmic and exponential maps is very powerful for generalizing algorithms working on vector spaces to algorithms on Riemannian manifolds, as illustrated in Table 2.1 and in the following sections.

Operation | Euclidean space | Riemannian manifold
Subtraction | →pq = q − p | →pq = Log_p(q)
Addition | q = p + →v | q = Exp_p(→v)
Distance | dist(p, q) = ‖q − p‖ | dist(p, q) = ‖→pq‖_p
Mean value (implicit) | Σ_i (p_i − p̄) = 0 | Σ_i →p̄p_i = 0
Gradient descent | p_(t+ε) = p_t − ε ∇C(p_t) | p_(t+ε) = Exp_(p_t)(−ε ∇C(p_t))
Geodesic interpolation | p(t) = p_0 + t →p_0p_1 | p(t) = Exp_(p_0)(t →p_0p_1)

Table 2.1: Re-interpretation of standard operations in a Riemannian manifold.
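Once Exp and Log are implemented (here the closed-form sphere maps sketched in Section 2.2.5), the rows of Table 2.1 translate directly into code. The sketch below (NumPy only, hypothetical helper names) implements geodesic interpolation and the implicit mean, i.e. the fixed point of Σi Log_p(pi) = 0 computed by the iteration p ← Exp_p(mean of the Log_p(pi)), which anticipates Chapter 4:

```python
import numpy as np

def sphere_exp(p, v):
    """Exp_p(v) on the unit sphere (see the sketch in Section 2.2.5)."""
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log_p(q) on the unit sphere (see the sketch in Section 2.2.5)."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - c * p
    nw = np.linalg.norm(w)
    return np.zeros_like(p) if nw < 1e-12 else np.arccos(c) * w / nw

def geodesic_interpolation(p0, p1, t):
    """p(t) = Exp_{p0}(t * Log_{p0}(p1))  (row 'Geodesic interpolation' of Table 2.1)."""
    return sphere_exp(p0, t * sphere_log(p0, p1))

def intrinsic_mean(points, n_iter=20):
    """Fixed-point iteration p <- Exp_p(mean_i Log_p(p_i)) solving sum_i Log_p(p_i) = 0."""
    p = points[0].copy()
    for _ in range(n_iter):
        p = sphere_exp(p, np.mean([sphere_log(p, q) for q in points], axis=0))
    return p

pts = [np.array(v, float) / np.linalg.norm(v)
       for v in ([1, 0, 0.2], [0, 1, 0.1], [0.5, 0.5, 0.3])]
p_bar = intrinsic_mean(pts)
print("intrinsic mean:", np.round(p_bar, 4))
print("sum of logs at the mean:", np.round(sum(sphere_log(p_bar, q) for q in pts), 8))
print("geodesic midpoint:", np.round(geodesic_interpolation(pts[0], pts[1], 0.5), 4))
```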

2.5 Lie groups and Homogeneous manifolds

Group: a set endowed with a composition rule g ◦ f for any two elements, a neutral element Id for this operation (Id ◦ f = f ◦ Id = f), and an inversion g(-1).

The group G is a Lie group if the set has a separated topological structure and if the composition and inversion operations are continuous and differentiable. G is thus a differentiable manifold compatible with the group structure.

Transformation group on a manifold M: a set of continuous and differentiable one-to-one mappings from M to M which is stable by composition and inversion. For the Euclidean space M = Rn, familiar examples are translations, rigid body, similarity or affine transformations. The action is denoted by g ⋆ x = g(x).

2.5.1 One parameter subgroups and group exponential

The curve g(t) : R → G is a one-parameter subgroup if it conserves the group structure (this is called a group homomorphism), i.e. if we have ∀ s, t ∈ R, g(s + t) = g(s) ◦ g(t). Obviously, R is commutative, so that the elements of a one-parameter subgroup do commute. For any tangent vector v ∈ T_Id G at the identity, one can show that there exists a unique one-parameter subgroup g_v(t) = Exp(t v) having v as its tangent vector at the identity. Here Exp : T_Id G → G is the (group) exponential mapping. In particular, any one-parameter subgroup of the general linear group G = GLn has the form

Exp(t X) = Σ_(n=0)^(+∞) (1/n!) t^n X^n
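For matrix groups the series can be evaluated directly. The sketch below (NumPy/SciPy) truncates the series for a skew-symmetric generator of SO(3) and compares it with scipy.linalg.expm; both give a rotation matrix:

```python
import numpy as np
from scipy.linalg import expm

def truncated_exp(X, terms=30):
    """Partial sum of Exp(X) = sum_{n>=0} X^n / n!  for a square matrix X."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms):
        term = term @ X / n
        result = result + term
    return result

omega = np.array([0.1, -0.4, 0.8])                     # rotation vector
X = np.array([[0, -omega[2], omega[1]],
              [omega[2], 0, -omega[0]],
              [-omega[1], omega[0], 0]])               # element of the Lie algebra so(3)

R_series = truncated_exp(X)
print(np.allclose(R_series, expm(X)))                  # True
print(np.allclose(R_series.T @ R_series, np.eye(3)))   # the exponential lands in SO(3)
print(np.linalg.det(R_series))                         # ~ 1
```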


For a real Lie group endowed with a left and right invariant pseudo-Riemannian metric or affine connection, the one-parameter subgroups are the geodesics passing through the identity.

2.5.2 Left and right invariant metrics on Lie groups

We have two canonical ways to compare the tangent spaces at different points of the manifold: the left translation L_g(f) = g ◦ f and the right translation R_g(f) = f ◦ g. These applications can obviously be differentiated to provide linear operators from T_f G to T_(g◦f) G and T_(f◦g) G. Let us denote by JL and JR the expressions of their differentials (in a given chart) at the identity:

JL(f) = ∂(f ◦ e)/∂e |_(e=Id)   and   JR(f) = ∂(e ◦ f)/∂e |_(e=Id)

Let v be a tangent vector at the identity. It corresponds to a tangent vector v_l = JL(f).v of T_f G by left translation and to v_r = JR(f).v by right translation.

Let now ⟨ . | . ⟩ be any dot product on the tangent space at the identity, characterized by a symmetric matrix Q in our chart. We can translate this dot product into any tangent space by left or right translation. If v and w are tangent vectors at f, we define the left-invariant metric by:

⟨ v | w ⟩_f = ⟨ JL(f)(-1).v | JL(f)(-1).w ⟩ = vT.JL(f)(-T).Q.JL(f)(-1).w,

which means that the expression of the metric in our chart is GL(f) = JL(f)(-T).Q.JL(f)(-1). The right-invariant metric is defined similarly with GR(f) = JR(f)(-T).Q.JR(f)(-1).

We note that the volume elements associated with these invariant metrics are the Haar measures of the group (with |J| = |det(J)|) [PA98]:

dLG(f) = √|GL(f)|.df = λ . df / |JL(f)|   and   dRG(f) = √|GR(f)|.df = λ . df / |JR(f)|

Since the metric is left (resp. right) invariant, the geodesics are globally conserved by left (resp. right) translation, and one only has to determine the geodesics starting from the identity. One should be careful that left and right invariant metrics are generally different. However, if the group is compact (for instance rotations), there exists a left and right invariant metric [Spi79, dC92] and the geodesics for this metric (starting from the identity) are the one-parameter subgroups, i.e. the curves verifying ∀ (s, t) ∈ R2, γ(s + t) = γ(s) ◦ γ(t) = γ(t) ◦ γ(s). This equation is often easier to solve than the standard second order differential system, but it is not valid for groups that are only locally compact (such as rigid transformations for instance).

Principal chart

From now on, we consider that the group is geodesically complete and that geodesics are determined for the left-invariant metric. The same developments could of course be done (with different results) with the right-invariant one.

We call principal chart the exponential chart at the identity and we denote by ~f the representation of f in this chart. To simplify the notations, we will simply write Exp and Log for the exponential map and its inverse at the identity. As the metric matrix Q is symmetric and positive definite, it can be written Q = A^T.A. Thus, changing the coordinate system into ~f' = A.~f yields a principal chart where Q = Id. We will consider Q = Id in the sequel.

Let γ_(Id,~f) be the geodesic starting at the identity with tangent vector ~f. We have γ(0) = Id and γ(1) = f. Translating this geodesic by g, we obtain a new geodesic δ = g ∘ γ_(Id,~f) starting at g with tangent vector J_L(~g).~f and arriving at time one at g ∘ f. Thus, by definition of the exponential map, we get Exp_g(J_L(~g).~f) = g ∘ f. Taking v = J_L(~g).~f, we obtain the following formula for the exponential map at any point:

Exp_g(v) = g ∘ Exp(J_L(~g)^(-1).v)     (2.3)

Taking now v = J_L(~g).(~g^(-1) ∘ ~f), we obtain the expression −→gf of f in the exponential chart at g (expressed in the basis induced by the principal chart):

−→gf = log_~g(f) = J_L(~g).(~g^(-1) ∘ ~f)     (2.4)

The main interest of all these developments is that, from a computer science point of view, we can use only one chart (the principal chart) and obtain all the others by left translation.

The distance between two transformations f and g is the length of the minimizing geodesic joining the two points. By definition, such a geodesic starts at g with tangent vector −→gf:

dist(f, g)² = ‖−→gf‖²_~g = −→gf^T.Q(~g).−→gf = (~g^(-1) ∘ ~f)^T.(~g^(-1) ∘ ~f) = ‖~g^(-1) ∘ ~f‖²_Id = dist(g^(-1) ∘ f, Id)²     (2.5)
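As an illustration of these formulas, here is a minimal Python sketch for a matrix Lie group, assuming that the principal chart is given by the matrix logarithm, that Q = Id, and that the metric is actually bi-invariant (which holds for rotations, cf. Chapter 3, but not for a general left-invariant metric); the helper names are arbitrary.

import numpy as np
from scipy.linalg import expm, logm

def group_log(f):
    # Principal-chart representation of f (matrix logarithm), assuming it is defined.
    return np.real(logm(f))

def group_exp(v):
    # Inverse of group_log (matrix exponential).
    return expm(v)

def dist(f, g):
    # dist(f, g) = || log(g^(-1) o f) ||_Id, cf. equation (2.5) with Q = Id.
    return np.linalg.norm(group_log(np.linalg.inv(g) @ f))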

2.5.3 Homogeneous manifolds

A manifold on which a transformation group acts is said to be transitive or homogeneous with respect to this group if, for any two points of the manifold, there exists a transformation of the group that transforms one point into the other. Let o be a particular point of the manifold that we call the origin and let us denote by g ⋆ p the action of a transformation g on a point p of the manifold. The manifold is homogeneous if G ⋆ o = {g ⋆ o / g ∈ G} = M.

Invariant Riemannian metric

In the case of a homogeneous manifold, we can identify the elements p ∈ M with subsets of the group G. Let H be the isotropy subgroup of the origin, i.e. the set of transformations that leave it unchanged:

H = {h ∈ G / h ⋆ o = o}

Let now f_p be a transformation that brings the origin to the point p = f_p ⋆ o. The set of all transformations that bring the origin to the point p is the left translation of H by f_p:

F_p = {g ∈ G / g ⋆ o = p} = f_p ∘ H

These sets are in fact the elements of the quotient space G/H and are thus called the coset F_p of p.

To compare the tangent spaces T_oM and T_pM at the origin and at a point p, we now have a whole set of transformations from F_p. Let us choose a placement function f_p, i.e. a representative of each coset. In a local coordinate system, a vector v ∈ T_oM is transformed by the placement function into

v_x = J(f_x).v     with     J(f_x) = ∂(f_x ⋆ e)/∂e |_{e=o}

This transportation can be used to define, on each tangent space T_pM, the dot product transported from the tangent space at the origin:

⟨ y1 | y2 ⟩_x = ⟨ J(f_x)^(-1).y1 | J(f_x)^(-1).y2 ⟩     with     J(f_x) = ∂(f_x ⋆ e)/∂e |_{e=o}


Thus, the expression of the metric in a local chart is

G(x) = J(f_x)^(-T).Q.J(f_x)^(-1)     (2.6)

A necessary condition to obtain an invariant metric is obviously the stability of the metric with respect to the choice of the placement function. From the definition of the cosets, this reduces to the invariance of the metric Q at the origin under the action of the isotropy group H:

∀ h ∈ H,     J(h)^T.Q.J(h) = Q     (2.7)

The dot product ⟨ . | . ⟩_x is in this case a continuous function of p: we have obtained an invariant metric (the condition above is sufficient). Note that there does not always exist an invariant metric: think for instance of points under the affine group.

Assuming that there exists an invariant metric, the associated infinitesimal volume element is given by:

dM(p) = √|G(x)|.dx = λ.dx / |J(f_x)|     with     λ = √|Q| and f_x ∈ F_x

This measure was the invariant measure determined in [PA98]. However, we have a stronger constraint on the existence: an invariant distance implies an invariant measure, but the converse is false.

Principal chart

In the sequel, we assume that there exists an invariant distance and that the manifold is geodesically complete for this metric. We call principal chart the exponential chart at the origin with Q = Id. The action of a transformation is defined in this chart by f ⋆ ~x ≡ log(f ⋆ p) = log(f ⋆ Exp(~x)).

Since geodesics are globally invariant by the action of transformations, we can easily express the exponential chart at any other point. If −→pq is a tangent vector of T_pM expressed in the basis induced by the principal chart, we have:

Exp_~x(−→pq) = f_~x ⋆ Exp(J(f_~x)^(-1).−→pq)

The inverse function gives (in the star-shaped domain delimited by the cut-locus):

−→pq = log_~x(q) = J(f_~x).(f_~x^(-1) ⋆ ~y)     (2.8)

These functions are independent of the chosen placement function. Indeed, let f'_p be another placement function. Then, by definition of the cosets, for each point p there exists a transformation h_p ∈ H such that f'_p = f_p ∘ h_p. Now, the geodesics starting from the origin are straight lines in the exponential chart and are globally invariant: they are transformed into straight lines in the exponential chart by the isotropy group. This means that the action of the isotropy group is linear in the principal chart: we can write h ⋆ ~x = J(h).~x. Thus, f'_~x^(-1) ⋆ ~y = h_~x^(-1) ⋆ (f_~x^(-1) ⋆ ~y) = J(h_~x)^(-1).(f_~x^(-1) ⋆ ~y). Since J(f'_~x) = J(f_~x).J(h_~x), we get the sought equality J(f_~x).(f_~x^(-1) ⋆ ~y) = J(f'_~x).(f'_~x^(-1) ⋆ ~y). The invariance of the exponential map is shown similarly.

The distance between two points p and q is by definition the length of the minimizing geodesic joining them, and by definition of the exponential chart, the length of −→pq:

dist(p, q)² = ‖log_~x(q)‖²_~x = −→pq^T.Q(~x).−→pq = (f_~x^(-1) ⋆ ~y)^T.(f_~x^(-1) ⋆ ~y)     (2.9)


2.6 Additional references

• A good introduction: C. Small, The Statistical Theory of Shape, Springer, 1996.

• M. do Carmo, Riemannian Geometry, Birkhäuser, 1992.

• J. Gallier, Notes on Differential Geometry, Manifolds, Lie Groups and Bundles, Chap. 3, http://www.cis.upenn.edu/~jean/gbooks/manif.html.

• Gallot, Hulin, Lafontaine, Riemannian Geometry, 2nd edition, Springer, 1993.

Alternative reference books:

• J. Jost, Riemannian Geometry and Geometric Analysis, Springer-Verlag.

• M. Berger, A Panoramic View of Riemannian Geometry, Springer.

• W. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry.

Free texts on internet

• http://www.geocities.com/alex stef/mylist.html#Geom

• http://igat.epfl.ch/troyanov/GeoRiem2006.html

2.6.1 References on Lie groups

• S. Helgason, Differential geometry, Lie groups, and symmetric spaces, Acad. Press, 1978.

• J. Gallier, Notes on Differential Geometry, Manifolds, Lie Groups and Bundles, Chap. 2 and 4, http://www.cis.upenn.edu/~jean/gbooks/manif.html.


Chapter 3

3D rotation matrices

The goal of this chapter is to exemplify, on a simple and useful manifold, part of the general methods developed previously on Riemannian geometry and Lie groups. This chapter also provides numerically stable formulas to implement all the basic operations needed to work with rotations in 3D.

3.1 Vector rotations of R3

3.1.1 Definition

Let {i, j, k} be a right-handed orthonormal basis of the vector space R3 (we do not consider it here as a Euclidean space), and B = {e1, e2, e3} be a set of three vectors. The linear mapping from {i, j, k} to B is given by the matrix

R = [e1, e2, e3] =
( e11  e21  e31 )
( e12  e22  e32 )     (3.1)
( e13  e23  e33 )

If we now want B to be an orthonormal basis, we have that ⟨ ei | ej ⟩ = δij, which means that the matrix R verifies

R.R^T = R^T.R = I3     (3.2)

This equation implies in turn that det(R.R^T) = 1 (i.e. det(R) = ±1) and gives rise to the subset O3 of orthogonal matrices of M3×3. If we want B to be also right-handed, we have to impose det(R) = 1.

SO3 = {R ∈ M3×3 / R.R^T = Id and det(R) = +1}     (3.3)

Such matrices are called (proper) rotations. Each right-handed orthonormal basis can then be represented by a unique rotation and, conversely, each rotation maps {i, j, k} onto a unique right-handed orthonormal basis.

As a subset of matrices, rotations can also be seen as a subset of the linear maps of R3. They correspond in this case to their usual interpretation as transformations of R3: they are positive isometries (maps conserving orientation and dot product): ⟨R.x | R.y⟩ = ⟨x | y⟩. In particular, they conserve the length of a vector: ‖R.x‖ = ‖x‖. Seen as a transformation group of R3, with matrix multiplication as the composition law and Id as the identity, rotations form a non-commutative group denoted SO3 (the 3D rotation group).

The three main operations on rotations are:


• the composition of R1 and R2: R = R2.R1 (beware of the order),

• the inverse of R: R(-1) = RT,

• and the application of R to a vector x: y = R.x.

The orthogonal group O3 is defined as a subspace of R^{3×3} by equation (3.2), which gives rise to 6 independent scalar equations since R.R^T is symmetric. Hence O3 is a differential manifold of dimension 3. As the determinant is a continuous function on the vector space of matrices, the constraint det(R) = ±1 shows that O3 as a manifold has two non-connected leaves. Taking into account the constraint det(R) = 1 amounts to keeping the component of the identity, which is called the special orthogonal group SO3. Since the composition and inversion maps are infinitely differentiable, this is moreover a Lie group. Notice that the other leaf of the manifold is not a group, as it does not contain the identity and it is not stable by composition (the composition of two improper rotations is a proper rotation).

3.1.2 From matrices to rotations

From a computational point of view, a rotation matrix is encoded as an unconstrained matrix, and it is necessary to verify that the orthogonality constraints are satisfied before using it. If this is not the case (or if the numerical error is too large), there is a need to recompute the most likely rotation matrix. This can be done by optimizing for the closest rotation matrix in the sense of the distance in the embedding space.

The Frobenius dot product on matrices is a natural metric on the vector space of matrices:

⟨X | Y⟩_{Rn×n} = (1/2).Tr(X^T.Y) = (1/2).Tr(X.Y^T) = (1/2).∑_{i,j} X_ij.Y_ij

With this metric, the distance between a rotation matrix R and an unconstrained matrix K is thus dist(R, K)² = (1/2).Tr((R − K)^T.(R − K)) = (1/2).Tr(K.K^T) + 3/2 − Tr(R.K^T). Notice incidentally that the extrinsic distance between two rotation matrices is dist_Fro(R1, R2)² = 3 − Tr(R1^T.R2).

To find the closest rotation to K, we have to maximize Tr(R.K^T) subject to the constraint R.R^T = Id. This is called the orthogonal Procrustes problem in statistics, and it relates very closely to the absolute orientation problem in photogrammetry, the pose estimation problem in computer vision and the rotational superposition problem in crystallography. Several closed form solutions have been developed, using unit quaternions [Hor87, Kea89, Aya91, Fau93, HM93], singular value decomposition (SVD) [Sch66, McL72, AHB87, Ume91], polar decomposition [HHN88] or dual quaternions [WS91]. Additional bibliography includes [Mos39, Joh66, Kab76, Kab78, Hen79, Mac84, Dia88]. We present here the SVD method because it is valid in any dimension.

Taking into account the constraints R^T.R = Id and det(R) = +1, the Lagrangian is

Λ = Tr(R.K^T) − (1/2).Tr(L.(R^T.R − Id)) − g.(det(R) − 1)

where L is a symmetric matrix and g a scalar. According to appendix A.1, we have:

∂Tr(R.K^T)/∂R = K,     ∂Tr(L.R^T.R)/∂R = 2.R.L,     ∂det(R)/∂R = det(R).R^(-T) = R

Thus, the optimum is characterized by

∂Λ/∂R = K − R.L − g.R = 0   ⟺   R.(L + g.Id) = K     (3.4)


Let L' = L + g.Id and K = U.D.V^T be a singular value decomposition of K. Remember that U and V are orthogonal matrices (but possibly improper) and D is the diagonal matrix of (positive) singular values. As L (thus L') is symmetric, we have:

L'² = (L'.R^T).(R.L') = K^T.K = V.D².V^T

The symmetric matrices L'² and L' trivially commute. Thus, they can be diagonalized in a common basis, which implies that L' = V.D.S.V^T, where S = DIAG(s1, ..., sn) and si = ±1. As S = S^(-1) and S.D = D.S, we can simplify (3.4) into:

R.V.D = U.S.D   ⟺   R = U.S.V^T

The positive determinant constraint gives moreover det(S) = det(U).det(V). Inserting R = U.S.V^T in the criterion, we find that

Tr(R.K^T) = Tr(U.S.D.U^T) = Tr(D.S) = ∑_{i=1}^{n} d_i.s_i

Assuming that the singular values are sorted in decreasing order, the maximum is obtained for s1 = ... = s_{n−1} = +1 and s_n = det(U).det(V).

We should note that the rotation R = U.S.V^T is always an optimum of our criterion, but it may not be the only one. This optimum is obviously unique if all the singular values are non-zero (K then has maximal rank), and [Ume91] shows that it is still unique if rank(K) ≥ n − 1.

Theorem 3.1 (Least-squares estimation of a rotation from a matrix K)
Let K = U.D.V^T be a singular value decomposition of a given square matrix K, with the (positive) singular values sorted in decreasing order in D, and let S = DIAG(1, ..., 1, det(U).det(V)). The closest rotation in the least-squares (Frobenius) sense is given by

R = U.S.V^T     (3.5)

This optimum is unique if rank(K) ≥ n − 1.

Exercise 3.1 Orthonormalization of matrices

• Implement the SVD orthogonalization procedure Mat2RotMat(M) to compute the proper rotation which is closest to a given unconstrained matrix M.
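A possible NumPy sketch for this exercise (the function name Mat2RotMat follows the exercise statement; the implementation is one possible choice based on Theorem 3.1):

import numpy as np

def Mat2RotMat(M):
    # Closest proper rotation to M in the Frobenius sense (Theorem 3.1).
    U, D, Vt = np.linalg.svd(M)            # singular values sorted in decreasing order
    S = np.eye(M.shape[0])
    S[-1, -1] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))   # enforce det(R) = +1
    return U @ S @ Vt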

3.1.3 Geometric parameters: axis and angle

Let R be a 3D rotation matrix. It is well known that it is characterized by its axis n (a unit vector) and its angle θ. To understand why, let us compute the eigenvalues of the matrix R: they are the solutions of det(R − λ.Id) = 0. This is a real polynomial of degree 3. Thus, it has one real root λ1 and two complex conjugate roots λ2,3 = µ.e^{±iθ}. Let n be the unit eigenvector corresponding to the real eigenvalue: R.n = λ1.n. As a rotation conserves the norm, we have ‖R.n‖ = |λ1| = ‖n‖ = 1, which means that λ1 = ±1. However, the determinant of R being one, this imposes det(R) = λ1.µ² = 1, from which we conclude that λ1 = +1 and |µ| = 1. Thus, every 3D rotation has an invariant axis n, which is the eigenvector associated with the eigenvalue 1, and two unit complex conjugate eigenvalues λ2,3 = e^{±iθ} which realize a 2D rotation of angle θ in the plane orthogonal to that axis.


Rodrigues’ formula: computing the rotation matrix R(θ, n)

To rotate a vector x, we have to conserve the part x_∥ = n.n^T.x which is parallel to n, and to rotate the orthogonal part x_⊥ = (Id − n.n^T).x by an angle θ around n. Let us first notice that the rotation by an angle π/2 of x_⊥ around the axis n is simply given by w = n × x_⊥ = n × x. Thus the rotation by an angle θ is simply y_⊥ = cos(θ).x_⊥ + sin(θ).w, and the final rotated vector can be written:

y = R(n, θ).x = x_∥ + y_⊥ = n.n^T.x + cos(θ).(Id − n.n^T).x + sin(θ).n × x

Let

S_x = (  0   −x3   x2 )
      (  x3    0  −x1 )
      ( −x2   x1    0 )

be the linear operator (skew matrix) corresponding to the (left) cross product: for any vector y we have S_x.y = x × y. One easily verifies the identity S_n² = n.n^T − Id. Using this matrix, we can rewrite the rotation formula in a matrix form which is valid for all vectors x. This expression is called Rodrigues' formula:

R = Id + sin θ.S_n + (1 − cos θ).S_n² = cos θ.Id + sin θ.S_n + (1 − cos θ).n.n^T     (3.6)
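As a direct illustration (not part of the original text), a NumPy transcription of Rodrigues' formula (3.6); the helper names are arbitrary.

import numpy as np

def skew(x):
    # S_x: the skew matrix such that S_x.y = x × y.
    return np.array([[0., -x[2], x[1]],
                     [x[2], 0., -x[0]],
                     [-x[1], x[0], 0.]])

def rot_mat(n, theta):
    # Rodrigues' formula (3.6) for a unit axis n and an angle theta.
    Sn = skew(n)
    return np.eye(3) + np.sin(theta) * Sn + (1. - np.cos(theta)) * (Sn @ Sn)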

Computing the angle and axis parameters from the rotation matrix

Conversely, we can determine the angle and axis of a rotation R using:

θ = arccos( (Tr(R) − 1) / 2 )     and     S_n = (R − R^T) / (2.sin θ)     (3.7)

The equation for the axis is valid only when θ ∈ ]0; π[. Indeed, for θ = 0 (i.e. the identity) the rotation axis n is not determined, and sin(θ) = 0 for reflections (i.e. when θ = π). From a computational point of view, this creates numerical instabilities around the values θ = 0 and θ = π that need to be addressed.

R close to the identity: θ is small.  Since the axis n is not defined for the identity, there is a singularity and a numerical instability around it. However, we can compute the rotation vector with a Taylor expansion:

S_r = θ.S_n = θ/(2.sin θ).(R − R^T) = (1/2).(1 + θ²/6).(R − R^T) + O(θ⁴)

R close to a reflection: π − θ is small.  The axis is this time well defined, but we have to use another equation. From Rodrigues' formula, we get R + R^T − 2.Id = 2.(1 − cos θ).S_n², and since S_n² = n.n^T − Id, we have

n.n^T = Id + 1/(2.(1 − cos θ)).(R + R^T − 2.Id)

Let ϱ = 1/(1 − cos θ); taking the diagonal terms gives

n_i² = 1 + ϱ.(R_ii − 1)   ⟹   n_i = ε_i.√(1 + ϱ.(R_ii − 1))

The off-diagonal terms are used to determine the signs ε_i: considering that the sign of n_1 is ε ∈ {−1, +1}, we can compute that

sign(n_k) = ε.sign(R_1k + R_k1)


If we have an exact reflection, the sign ε does not matter since rotating clockwise or counter-clockwise gives the same result, but for a quasi-reflection this sign is important. In this case, the vector w = 2.sin θ.n is very small but not identically null: it can be computed without numerical instabilities with S_w = R − R^T. Since θ < π, the largest component w_k (in absolute value) of this vector must have the same sign as the corresponding component n_k of the vector n.

The rotation vector r = θ.n

As a conclusion, the rotation angle and axis are not always numerically stable (in particular around θ = 0 and θ = π), but we can always compute a stable compound version: the rotation vector r = θ.n.

From an algorithmic point of view, the rotation vector is thus a representation of choice for 3D rotations. This representation of rotations has been studied for a long time (see for instance [Stu64]), but it took on a particular importance in robotics [Pau82, Lat91] and computer vision [Kan93]. Although the geometric angle and axis parameters were well known, their combination into the rotation vector was not practically used before [Aya89, chap. 12].

As the angle is defined up to 2π, all the vectors r_k = (θ + 2.k.π).n (k ∈ Z) represent the same rotation R = R(θ, n). Thus, a first chart is given by the rotation vectors r = θ.n in the open ball D = B(0, π). To define an atlas, we should define at least three other charts. Following [Aya89], one could keep the same representation r = θ.n with different definition domains, for instance the half-open ball B1 = {r ∈ B(0, 2π) / r1 > 0} (respectively B2 and B3) covering the rotations having a non-null component of the rotation axis along e1 (respectively e2 and e3).

Rodrigues' formula allows us to compute directly the rotation matrix from the rotation vector:

R = Id + (sin θ/θ).S_r + ((1 − cos θ)/θ²).S_r²     with θ = ‖r‖

To avoid numerical instabilities around θ = 0, one can use the following Taylor expansions:

sin θ/θ = 1 − θ²/6 + O(θ⁴)     and     (1 − cos θ)/θ² = 1/2 − θ²/24 + O(θ⁴)

Exercise 3.2 Rotation matrix to rotation vector

• Implement the operator Sn (SkewSymMat(r)).

• Verify that it corresponds to the left cross product.

• Implement a function RotVect2RotMat(r) converting a rotation vector to a rotation matrix, and its inverse RotMat2RotVect(R).
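A possible NumPy sketch for this exercise (the names follow the exercise statement; the small-angle branches use the Taylor expansions given above, and the stable axis extraction near θ = π described above is deliberately left out to keep the sketch short):

import numpy as np

def SkewSymMat(r):
    # S_r such that S_r.y = r × y (same operator as in the previous sketch).
    return np.array([[0., -r[2], r[1]],
                     [r[2], 0., -r[0]],
                     [-r[1], r[0], 0.]])

def RotVect2RotMat(r):
    # Rodrigues' formula from the rotation vector r = theta.n, with the Taylor
    # expansions of sin(theta)/theta and (1 - cos(theta))/theta^2 near theta = 0.
    r = np.asarray(r, dtype=float)
    theta2 = float(r @ r)
    theta = np.sqrt(theta2)
    if theta < 1e-4:                      # small-angle threshold (arbitrary choice)
        a = 1. - theta2 / 6.              # ~ sin(theta)/theta
        b = 0.5 - theta2 / 24.            # ~ (1 - cos(theta))/theta^2
    else:
        a = np.sin(theta) / theta
        b = (1. - np.cos(theta)) / theta2
    Sr = SkewSymMat(r)
    return np.eye(3) + a * Sr + b * (Sr @ Sr)

def RotMat2RotVect(R):
    # Rotation vector of a rotation matrix (sketch).  Only the Taylor expansion
    # near theta = 0 and the generic formula are used here; a numerically careful
    # implementation would add the axis extraction near theta = pi described above.
    R = np.asarray(R, dtype=float)
    c = np.clip((np.trace(R) - 1.) / 2., -1., 1.)
    theta = np.arccos(c)
    w = np.array([R[2, 1] - R[1, 2],      # S_w = R - R^T, i.e. w = 2 sin(theta) n
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    if theta < 1e-4:                      # theta/(2 sin(theta)) ~ (1 + theta^2/6)/2
        return 0.5 * (1. + theta ** 2 / 6.) * w
    return (theta / (2. * np.sin(theta))) * w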

3.1.4 Uniform random rotation

For testing our computational framework on rotations, we will need to generate some random rotations. Although any distribution covering the whole space would be sufficient for that, it is more satisfying to have a distribution that covers the manifold uniformly. In the context of a group, uniformity is defined in terms of invariance by left or right translation: the measure µ is left invariant if µ(R.S) = µ(S) for any rotation R and set S ⊂ SOn.

For any locally compact topological group, the left (resp. right) Haar measure µ_l (resp. µ_r) is the unique (up to a multiplicative constant) left-invariant (resp. right-invariant) Borel measure on the group. Left and right Haar measures are generally different (we have µ_r(S) = µ_l(S^(-1))), unless the group is unimodular, which is the case of rotations. Moreover, as our Riemannian metric is left and right invariant, the induced Riemannian measure is exactly the Haar measure.

Algorithms to generate random uniform orthogonal matrices from square matrices with Gaussian entries have been developed based on the QR decomposition (the orthogonal matrix Q is uniform as long as the diagonal of R contains only positive entries) [Ste80], or more efficiently on the subgroup algorithm [DS87], which iteratively builds an (n+1)×(n+1) orthogonal matrix from an n×n one and a uniformly distributed unit vector of dimension n+1, by applying a Householder reflection from the vector to the smaller matrix (embedded in M(n+1)×(n+1) with 1 in the bottom corner).

In our case, as efficiency is not our main concern, it is more consistent to continue with the SVD. Let us assume a rotationally symmetric (spherical) distribution on the embedding space of square matrices (considered as a vector space with the standard Frobenius norm) [FH84]. Rotational invariance means that the pdf of the matrix K only depends on its norm: p(K) = ϕ(Tr(K.K^T)). Let K = U.S.V^T be an SVD decomposition; this can be further simplified into p(K) = ϕ(∑_i s_i²). Thus, the orthogonalization procedure K = U.S.V^T → R = U.V^T amounts to marginalizing with respect to the singular values, and we have that p(R) = ∫ ϕ(∑_i s_i²).∏_i ds_i = constant with respect to the restriction of the embedding measure to the manifold. A zero-mean and unit-variance Gaussian distribution on each matrix entry being a rotationally symmetric distribution, its orthogonalization using the SVD method thus yields a uniform distribution on SOn.

Exercise 3.3 Random uniform rotations

• Implement a uniform random rotation matrix generator UnifRndRotMat().

• Verify that the distribution of the rotation angle is proportional to sin²(θ/2).

• Verify the orthogonality of the SVD orthogonalization and the consistency of the transformation between rotation matrices and rotation vectors on a large number of random rotations.
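A possible sketch for the first item of this exercise (UnifRndRotMat follows the exercise naming; the SVD orthogonalization is the one of Exercise 3.1):

import numpy as np

def UnifRndRotMat(rng=None):
    # Uniform (Haar-distributed) random rotation of SO(3): SVD orthogonalization
    # of a 3x3 matrix with independent standard Gaussian entries.
    rng = np.random.default_rng() if rng is None else rng
    K = rng.standard_normal((3, 3))
    U, _, Vt = np.linalg.svd(K)
    S = np.eye(3)
    S[-1, -1] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    return U @ S @ Vt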

3.1.5 Differential properties

Tangent spaces

Let R(t) = R + t.Ṙ + O(t²) be a curve drawn on SO3, considered as an embedded manifold in the set of matrices M3×3. The constraint (3.2) is differentiated into

Ṙ.R^T + (Ṙ.R^T)^T = 0     or     R^T.Ṙ + (R^T.Ṙ)^T = 0

which means that Ṙ.R^T and R^T.Ṙ are skew-symmetric matrices (these two conditions are equivalent). Since 3×3 skew-symmetric matrices have 3 free components, we have determined all the tangent vectors of the 3-dimensional tangent spaces:

Theorem 3.2 The tangent space T_Id SO3 of SO3 at the identity is the vector space of skew-symmetric matrices. The tangent space T_R SO3 at R ∈ SO3 is given by

T_R SO3 = {X ∈ M3×3 / X.R^T = −(X.R^T)^T} = {X ∈ M3×3 / R^T.X = −(R^T.X)^T}


Left and right translations

Two important and canonical maps on a Lie group are very useful for studying the tangent spaces: the left and right translations. In SO3, these are nothing else than the left and right composition by a fixed rotation R0:

L_R0 : SO3 → SO3,  R ↦ L_R0(R) = R0.R
R_R0 : SO3 → SO3,  R ↦ R_R0(R) = R.R0     (3.8)

Their differentials realize canonical isomorphisms between the tangent spaces of SO3 at different points. Let X be a vector of T_R SO3 and R_X(t) = R + t.X + O(t²) be a curve on SO3 going through R with tangent vector dR_X(t)/dt |_0 = X. The left translation of R_X is the curve R0.R_X(t) = R0.R + t.R0.X + O(t²), and its tangent vector at 0 is Y = R0.X. The differential DL_R0 of L_R0 is then, for any R in SO3,

DL_R0 : T_R SO3 → T_{R0.R} SO3,     X ↦ DL_R0(X) = R0.X

Notice that L_R0 and its differential DL_R0 take the same form in the embedding space of matrices although they act on different subspaces. The differential DR_R0(X) = X.R0 is defined similarly.

Taking R = Id, this gives two canonical isomorphisms between the tangent space at the identity T_Id SO3 and the tangent space T_R SO3 at any point R (since the formulations of DL_R0 and DR_R0 are independent of the point at which they are applied, we denote their restrictions to the identity by the same names):

DL_R : T_Id SO3 → T_R SO3,  X ↦ R.X     and     DR_R : T_Id SO3 → T_R SO3,  X ↦ X.R     (3.9)

Differential properties of rotations are then reducible to differential properties around the identity in T_Id SO3.

Bases of tangent spaces

In order to be able to compute in tangent spaces, we need coordinates and thus bases. Any skew-symmetric matrix X verifies X_ij = −X_ji, so that only the off-diagonal components can be non-null. Moreover, the lower triangular part is completely determined by the upper triangular one. Thus, a basis of T_Id SO3 is given by

E1 = ( 0 0  0 )     E2 = (  0 0 1 )     E3 = ( 0 −1 0 )
     ( 0 0 −1 )          (  0 0 0 )          ( 1  0 0 )
     ( 0 1  0 )          ( −1 0 0 )          ( 0  0 0 )

The projection on this basis determines an isomorphism between the tangent space at the identity and the vector space R3:

x = (x1, x2, x3)^T ∈ R3  ↦  S_x = (  0   −x3   x2 )
                                  (  x3    0  −x1 )  ∈ T_Id SO3     (3.10)
                                  ( −x2   x1    0 )

In order to minimize the number of notations, the basis has been chosen so that the matrix S_n is the skew matrix corresponding to the (left) cross product, as defined before.


To obtain bases of all the other tangent spaces, we could find ad-hoc bases from the embedding space. However, we have two canonical ways to transport a structure from the tangent plane at the identity to any point: the left and right translations of E_i are E_i^l = R.E_i and E_i^r = E_i.R. The coordinates in these two bases give rise to two different canonical isomorphisms between the tangent space at any point and R3:

Given dR ∈ T_R SO3, ∃ ω_r, ω_l ∈ R3 such that dR = S_{ω_r}.R = R.S_{ω_l}     (3.11)

Exercise 3.4 Bases and coordinates in tangent spaces

• Implement functions Ei(i), Ei_l(R,i) and Ei_r(R,i) that give the above basis vectors of T_Id SO3 and their left and right translations, which constitute bases of T_R SO3 at any rotation R.

• Implement a projection TgCoordRotId(dR) : dR ∈ T_Id SO3 ↦ dr ∈ R3 which computes the coordinates in the above basis. Remark: compute the projection error, because dR might not numerically be exactly skew-symmetric.

• Extend that projection by left and right translations to obtain coordinates in the left- and right-translated bases: TgCoordRot_l(R, dR) and TgCoordRot_r(R, dR).

• Implement the inverse functions InvTgCoordRotId(dr) : dr ∈ R3 ↦ dR ∈ T_Id SO3, InvTgCoordRot_l(R, dr) and InvTgCoordRot_r(R, dr).

• Verify that dR = InvTgCoordRot(TgCoordRot(R, dR)) for random tangent vectors at random rotation matrices.
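A partial NumPy sketch for this exercise (names follow the exercise statement; only the identity-based functions and their left-translated versions are shown, with a 0-based index i):

import numpy as np

def Ei(i):
    # Basis vectors E1, E2, E3 of T_Id SO3 (here indexed by i = 0, 1, 2).
    E = np.zeros((3, 3))
    j, k = [(1, 2), (2, 0), (0, 1)][i]
    E[j, k], E[k, j] = -1., 1.
    return E

def Ei_l(R, i):
    # Left-translated basis vector E_i^l = R.E_i of T_R SO3.
    return R @ Ei(i)

def TgCoordRotId(dR):
    # Coordinates of a skew matrix dR in the basis (E1, E2, E3).
    dR = 0.5 * (dR - dR.T)                # project out the non-skew numerical error
    return np.array([dR[2, 1], dR[0, 2], dR[1, 0]])

def TgCoordRot_l(R, dR):
    # Coordinates of dR in T_R SO3 in the left-translated basis R.E_i.
    return TgCoordRotId(R.T @ dR)

def InvTgCoordRotId(dr):
    # Inverse map: from coordinates dr to the skew matrix S_dr.
    return np.array([[0., -dr[2], dr[1]],
                     [dr[2], 0., -dr[0]],
                     [-dr[1], dr[0], 0.]])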

The Lie algebra of SO3

We already know that T_Id SO3 is the vector space of skew-symmetric matrices, identifiable with R3. If we now consider that (R3, +, ×) is an algebra (× is the cross product), we can induce an algebra on T_Id SO3. Let X = S_x and Y = S_y be two vectors of T_Id SO3; then the vector Z = S_(x×y) belongs to T_Id SO3 and is called the bracket of X and Y:

Z = S_(x×y) = S_x.S_y − S_y.S_x = X.Y − Y.X = [X, Y]

Hence (T_Id SO3, +, [., .]) is an algebra with the standard matrix commutator [X, Y] = X.Y − Y.X. This is not by chance: it is in fact the Lie algebra of the group SO3.

3.1.6 A bi-invariant metric on SO3

The next step is to give a metric to the group. Consider the Frobenius dot product on matrices ⟨X | Y⟩_{Rn×n} = (1/2).Tr(X^T.Y). The factor 1/2 is there to compensate for the fact that we count twice each off-diagonal coefficient of the skew-symmetric matrices. This induces on any tangent space T_R SO3 at R the scalar product

⟨X | Y⟩_{T_R SO3} := (1/2).Tr(X.Y^T)

This metric is obviously invariant by left and right translation: for any rotation U we have

⟨U.X | U.Y⟩_{T_{U.R} SO3} = (1/2).Tr(U.X.(U.Y)^T) = (1/2).Tr(X.Y^T) = ⟨X | Y⟩_{T_R SO3},


and equivalently for the right translation:

⟨X.U | Y.U⟩_{T_{R.U} SO3} = (1/2).Tr(X.U.(Y.U)^T) = (1/2).Tr(X.Y^T) = ⟨X | Y⟩_{T_R SO3}.

Such a metric is called a bi-invariant Riemannian metric.

Notice that the existence of such a metric is not ensured for more general non-commutative and non-compact groups. Moreover, changing the scalar product of the embedding space (other than by a global scaling) will induce a new metric on all tangent spaces which will in general be neither left- nor right-invariant.

In order to understand what this scalar product is, let us investigate its expression in a local coordinate system. At the identity, the coordinate vector x of X is such that X = ∑_i x_i.E_i. By linearity, the scalar product of two vectors X and Y reduces to that of the basis vectors: ⟨X | Y⟩_{T_Id SO3} = ∑_{i,j} x_i.y_j.⟨E_i | E_j⟩_{T_Id SO3}. With the above Frobenius metric and the basis we chose, the expression of the metric tensor is particularly simple:

⟨E_i | E_j⟩_{T_Id SO3} = (1/2).Tr(E_i.E_j^T) = δ_ij

Thus, our basis is in fact orthonormal and the Frobenius scalar product corresponds to the canonical dot product of R3 through our isomorphism:

⟨S_x | S_y⟩_{T_Id SO3} = ⟨x | y⟩_{R3} = x^T.y

In the tangent space at a rotation R, we can use the coordinates x^l of X in the basis E_i^l = R.E_i or the coordinates x^r in the basis E_i^r = E_i.R. We can easily see that these two bases are also orthonormal:

⟨X | Y⟩_{T_R SO3} = ⟨R.S_{x^l} | R.S_{y^l}⟩_{T_R SO3} = (1/2).Tr(S_{x^l}^T.R^T.R.S_{y^l}) = (1/2).Tr(S_{x^l}^T.S_{y^l}) = (x^l)^T.y^l = (x^r)^T.y^r

Theorem 3.3 The tangent space of SO3 at the identity, T_Id SO3, is the vector space of skew-symmetric matrices. With the Lie bracket [., .] and the Euclidean metric ⟨X | Y⟩ = (1/2).Tr(X^T.Y), it forms a metric algebra which is canonically isomorphic to (R3, +, ×, ⟨. | .⟩_R3).

The tangent space T_R SO3 at R is transported from T_Id SO3 by the left or right translation of equation (3.9).

Exercise 3.5 Bi-invariant scalar product in tangent spaces

• Implement the scalar product ScalRotId(X, Y) = Tr(X.Y T)/2.

• Verify that ScalRotId(Ei(i),Ei(j)) = δij

• Verify that dR = 〈 dR | E1 〉.E1 + 〈 dR | E2 〉.E2 + 〈 dR | E3 〉.E3 for random skew symmetric matrices.

• Implement the scalar product ScalRot(R, X, Y) at any rotation R using left translation.

• Generate Gaussian random vectors in the tangent plane at a random rotation R and verify numerically that their scalar product corresponds to the scalar product of the embedding space.
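A possible sketch for the first and fourth items of this exercise (names follow the exercise statement):

import numpy as np

def ScalRotId(X, Y):
    # Frobenius scalar product <X | Y> = Tr(X.Y^T)/2 on T_Id SO3.
    return 0.5 * np.trace(X @ Y.T)

def ScalRot(R, X, Y):
    # Scalar product on T_R SO3 obtained by left translation back to the identity.
    return ScalRotId(R.T @ X, R.T @ Y)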


3.1.7 Group exponential and one-parameter subgroups

The matrix exponential is defined for any matrix X as the limit of the series

exp(X) = Id + X/1! + X²/2! + ... = ∑_{k=0}^{+∞} X^k / k!

For any element X of the Lie algebra of SO3, the skew-symmetry implies that X³ = −θ².X (with θ = ‖X‖), so that the series reduces to

exp(X) = Id + (sin θ/θ).X + ((1 − cos θ)/θ²).X² = Id + sin θ.S_n + (1 − cos θ).S_n²

We recognize here Rodrigues' formula: the matrix exponential of the skew-symmetric matrix associated with the rotation vector r = θ.n is the rotation of angle θ around the unit axis n: exp(θ.S_n) = R(n, θ).

One parameter subgroups

Let R_X(s) be a one-parameter subgroup of SO3 (i.e. a homomorphism from (R, +) to (SO3, .)). This is a continuous curve which is also a subgroup of SO3. By definition, and since (R, +) is commutative,

R_X(s + t) = R_X(s).R_X(t) = R_X(t).R_X(s)

This means in particular that R_X(t) and R_X(s) commute. Thus, they have the same rotation axis n, and R_X(s) is a rotation of axis n and angle θ(s). Plugging this into the definition of one-parameter subgroups, we find θ(s + t) = θ(s) + θ(t) and hence θ(s) = λ.s for some λ ∈ R. Computing the derivatives, we find

dR_X/ds |_0 = X = λ.S_n ∈ T_Id SO3     and     dR_X/ds |_s = X.R_X(s) = R_X(s).X ∈ T_{R_X(s)} SO3

We established that to each one-parameter subgroup R_X(s) = R(n, λ.s) = exp(λ.s.S_n) of SO3 corresponds a unique vector X = λ.S_n of T_Id SO3. The converse is also true, and R_X(s) is called the integral curve of X. It is to be noted that X and any non-zero multiple of X generate the same integral curve with proportional parameterizations.

This is a particular case of a more general theorem for Lie groups which states that there is a one-to-one correspondence between one-parameter subgroups of the Lie group and one-dimensional sub-algebras of its Lie algebra.

Group exponential map

Let X = θ.S_n with ‖n‖ = 1 be a vector of T_Id SO3 and R_X(s) = R(n, θ.s) = exp(θ.s.S_n) the integral curve of X. The group exponential map is defined as the map from the Lie algebra (the tangent plane at the identity) to the group SO3 which assigns R_X(1) to X. Hence

X ∈ T_Id SO3 ↦ exp(X) = exp(θ.S_n) = R(n, θ) ∈ SO3

The term exponential map comes from the fact that one-parameter subgroups of matrix groups can be computed using the matrix exponential. Using the isomorphism between R3 and T_Id SO3, we can also define the exponential of the rotation vector r = θ.n:

R(r) = exp(S_r) = R(n, θ)     with θ = ‖r‖ and n = r/θ


The exponential map is a sort of "development" of SO3 onto its tangent space T_Id SO3: each one-dimensional subspace R.S_n is mapped onto its integral curve {R(n, θ), θ ∈ R}, and we will see that lengths along these curves are conserved.

3.1.8 Metric properties

Definitions

Let R : s ∈ [a, b] ⊂ R ↦ R(s) ∈ SO3 be a piecewise smooth curve on SO3 and Ṙ(s) its tangent vector at s. The length of the curve R is defined by

L(R) = ∫_a^b √( ⟨ Ṙ(s) | Ṙ(s) ⟩_{R(s)} ) ds     (3.12)

Let now Γ be the set of curves joining the rotations R1 and R2. The map

ρ : SO3 × SO3 → R,     (R1, R2) ↦ inf_{R ∈ Γ} L(R)     (3.13)

is the canonical metric on SO3. The curves R minimizing the criterion L(R) are called geodesics.

Geodesics and metric on SO3

Since the metric is bi-invariant on SO3, the length criterion (3.12) is invariant by left and right translations, and finding geodesics amounts to finding the geodesics starting from the identity. Moreover, for a Lie group with a bi-invariant Riemannian metric, it turns out that geodesics starting from the identity, one-parameter subgroups and integral curves (also starting from the identity) are three different descriptions of the same curves [Spi79, chap. 10]. Let R_X(s) = exp(λ.s.S_n) be such a curve. Its derivative at the identity is Ṙ_X(0) = X = λ.S_n and Ṙ_X(s) = X.R_X(s) = R_X(s).X elsewhere. Hence

⟨ Ṙ(s) | Ṙ(s) ⟩ = ⟨X | X⟩ = λ².⟨n | n⟩ = λ²

and the distance from the identity to R(n, θ) = R_X(θ/λ) is

ρ(I3, R(n, θ)) = ∫_0^{θ/λ} √(λ²) ds = θ

With an arc-length parameterization, we obtain R_{S_n}(θ) = R(n, θ) = exp(θ.S_n). These curves are 2π-periodic and R_{S_n}(θ) = R_{(−S_n)}(−θ). Hence shortest paths (or minimizing geodesics) are uniquely defined for θ < π and doubly defined for θ = π.

Theorem 3.4 The canonical metric on SO3 is given by

ρ(R1, R2) = ρ(I3, R1^T.R2) = θ(R1^T.R2) = arccos( (Tr(R1^T.R2) − 1) / 2 )

Geodesics of SO3 starting from the identity are the curves θ ↦ R(n, θ) = exp(θ.S_n). Shortest paths from the identity to a non-reflection rotation R = exp(θ.S_n) (0 ≤ θ < π) are given by

t ∈ [0, θ] ↦ exp(t.S_n)

Shortest paths from the identity to a reflection R = exp(π.S_n) are doubly defined by the above formula (θ = π) with n and −n as unit vectors. Other geodesics are obtained by left or right translation.
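As a small illustration (the helper name is hypothetical, not part of the original text), the corresponding distance computation:

import numpy as np

def RotDist(R1, R2):
    # Bi-invariant Riemannian distance between two rotations, cf. Theorem 3.4.
    c = (np.trace(R1.T @ R2) - 1.) / 2.
    return np.arccos(np.clip(c, -1., 1.))   # clip guards against rounding outside [-1, 1]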


Riemannian Exp and Log maps

Let X = S_x be a vector of T_Id SO3. From the above results, we know that γ_(Id,X)(t) = exp(t.X) is the unique geodesic starting at Id with tangent vector X. The Riemannian exponential map at the identity for the bi-invariant metric is then defined by:

X ∈ T_Id SO3 ↦ Exp_Id(X) = γ_(Id,X)(1) = exp(X) ∈ SO3

According to the above theorem, this function is a diffeomorphism for ‖X‖ = θ < π, and its inverse is given by

R ∈ SO3 ↦ Log_Id(R) = S_r ∈ T_Id SO3,  where r = θ.n is the rotation vector of R = R(n, θ).

Geodesics starting at any other point are defined by left (or right) translation. For instance, γ_(R,R.X)(t) = R.exp(t.X) is the unique geodesic starting at R with tangent vector Y = R.X, which can be rewritten γ_(R,Y)(t) = R.exp(t.R^T.Y). This corresponds to the following reasoning: to find the geodesic starting at R with tangent vector Y, we first translate (R, Y) on the left by R^T, which gives (Id, R^T.Y), take the geodesic at that point, and left translate back its result by R. Since the metric is bi-invariant, the same mechanism can be implemented with right translation. The formula for the exponential map at any point is thus:

X ∈ T_R SO3 ↦ Exp_R(X) = γ_(R,X)(1) = R.Exp_Id(R^T.X) = R.exp(R^T.X) = exp(X.R^T).R ∈ SO3

Likewise, to compute the log map of a rotation U at a rotation R, we first left translate both rotations by R^T, take the log map of R^T.U at Id, and left translate back the result by R:

U ∈ SO3 ↦ Log_R(U) = R.Log_Id(R^T.U) = R.S_x ∈ T_R SO3,  where x = θ.n is the rotation vector of R^T.U.

A similar expression is obtained using the right translation, and both are equal thanks to bi-invariance.

Exercise 3.6 Exp and log maps

• Compute Exp map at Id: ExpRotId(dR)

• Verify that the exponential of any tangent vector at identity corresponds to the matrix exponential.

• Compute the Log map at Id: LogRotId(R).

• Verify that the log at identity of a random rotation corresponds to the matrix logarithm.

• Compute Exp and Log maps at any rotation R using the left invariance

• Verify that the construction using the right invariance gives the same result.

• Verify the consistency of Exp(R, Log(R, U)) for random rotations, and of Log(R, Exp(R, X)) for tangent vectors X within the cut locus.
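A possible sketch for this exercise, kept self-contained by relying on the matrix exponential and logarithm of SciPy rather than on the rotation-vector conversions of Exercise 3.2 (names follow the exercise statement; the logarithm is only valid for θ < π):

import numpy as np
from scipy.linalg import expm, logm

def ExpRotId(dR):
    # Riemannian Exp at the identity = matrix exponential of a skew matrix.
    return expm(dR)

def LogRotId(R):
    # Riemannian Log at the identity = matrix logarithm (valid for theta < pi).
    return np.real(logm(R))

def ExpRot(R, X):
    # Exp map at R by left translation: Exp_R(X) = R.exp(R^T.X).
    return R @ ExpRotId(R.T @ X)

def LogRot(R, U):
    # Log map at R by left translation: Log_R(U) = R.log(R^T.U).
    return R @ LogRotId(R.T @ U)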

3.2 Additional literature

There is a large literature on 3D rotations. From a mathematical point of view, [Alt86], [Kan90, chap. 3 & 6] and [Fau93] give quite exhaustive syntheses. Most of the algorithmic parts of this chapter are extracted from [Pen96, PT97].


Chapter 4

Simple statistics on Riemannian manifolds

The Riemannian metric induces an infinitesimal volume element on each tangent space, and thus a reference measure dM(p) on the manifold that can be used to measure random elements on the manifold (generalizations of random variables): without entering into the details of measure theory, such an element is characterized by its probability measure dP(p). Its probability density function (pdf) is the function ρ such that dP(p) = ρ(p).dM(p), if it exists. The induced measure dM actually represents the notion of uniformity according to the chosen Riemannian metric. This automatic derivation of the uniform measure from the metric gives a rather elegant solution to the Bertrand paradox for geometric probabilities [Poi12, KM63]. This paradox proposes three different and equally acceptable ways to compute the probability that the length of a "random chord" on a circle is greater than the side of an inscribed equilateral triangle, which lead to probabilities of 1/2, 1/3 and 1/4. All methods are correct but actually rely on different and incompatible uniform measures. The canonical definition of the uniform measure by the Riemannian metric prevents such a paradox from appearing in our Riemannian setting.

With the probability measure dP of a random element, we can integrate any function f(p) from the manifold to any vector space, thus defining the expected value of this function E[f] = ∫_M f(p).dP(p). This notion of expectation corresponds to the one we defined on real random variables and vectors. However, we cannot directly extend it to define the mean value of the distribution, since we generally cannot integrate manifold-valued functions. Thus, one cannot define the mean or expected "value" of a random manifold element that way.

4.0.1 First statistical moment: the mean

As one cannot define the mean or expected "value" of a random manifold element using a weighted sum or an integral as usual, several alternative definitions based on properties of the usual mean were proposed (see [Pic94] and [Pen06a, Sec. 4.3] for a review). The most interesting ones for general geodesically complete Riemannian manifolds were proposed by Frechet, Karcher and Emery.

One solution is to rely on a distance-based variational formulation: the Frechet [Fre44, Fre48] (resp. Karcher [Kar77]) expected features minimize globally (resp. locally) the variance:

σ²(q) = ∫ dist(p, q)² dP(p) = (1/n) ∑_{i=1}^{n} dist(p_i, q)²,

written respectively in the continuous and discrete forms. One can generalize the variance to a dispersion at order α by replacing the L² norm with an α-norm: σ_α(q) = ( ∫ dist(p, q)^α dP(p) )^{1/α}. The minimizers are called the central Karcher values at order α. For instance, the median is obtained for α = 1 and the modes for α = 0, exactly like in the vector case. It is worth noticing that the median and the modes are not unique in general in a vector space, and that even the mean may not exist (e.g. for heavy-tailed distributions). In Riemannian manifolds, the existence and uniqueness of all central Karcher values is generally not ensured, as they are obtained through a minimization procedure. However, [Kar77] and [Ken90] were able to establish existence and uniqueness theorems for distributions with a compact and small enough support. These theorems were then extended in [Dar96] to distributions with non-compact support in a very specific class of manifolds that includes the Hadamard manifolds¹ whose curvature is bounded from below. This does not include rigid body transformations, but it does include the manifold of tensors. For a finite number of discrete samples at a finite distance of each other (which is the practical case in statistics), a mean value always exists (the variance is finite everywhere in a complete space, so there exists a minimizer), and it is unique as soon as the distribution is sufficiently peaked.

dist(p, q)αdP (p))1/α. Theminimizers are called the central Karcher values at order α. For instance, the median is obtainedfor α = 1 and the modes for α = 0, exactly like in the vector case. It is worth noticing thatthe median and the modes are not unique in general in the vector space, and that even the meanmay not exists (e.g. for heavy tailed distribution). In Riemannian manifolds, the existence anduniqueness of all central Karcher values is generally not ensured as they are obtained through aminimization procedure. However, [Kar77] and [Ken90] were able to established existence anduniqueness theorems for distributions with a compact and small enough support. These theoremswere then extended in [Dar96] to distributions with non-compact support in a very specific class ofmanifolds that includes the Hadamard manifolds1 whose curvature is bounded from below. Thisdoes not include rigid body transformations, but this includes the manifold of tensors. For a finitenumber of discrete samples at a finite distance of each other (which is the practical case in statistics)a mean value always exists (the variance is finite everywhere in a complete space so there exists aminimizer). and it is unique as soon as the distribution is sufficiently peaked.

Emery [EM91] proposed to use the exponential barycenters, i.e. the points at which the mean is null in the local exponential chart: ∫_M −→xy dP(y) = 0. If the support of the distribution is included in a strongly convex open set², he showed that the exponential barycenters are the critical points of the variance. They are thus a superset of the Riemannian centers of mass, which themselves include the Frechet means. Showing that these two notions continue to be essentially equivalent for distributions with a larger support is more difficult in the presence of a cut locus. Indeed, the distance is continuous but not differentiable at cut locus points, where several minimizing geodesics meet. For instance, the distance from a point of the sphere to its antipodal point is maximal, although the directional derivatives of the distance at this point are non-zero in all directions. Thus, although the variance is continuous, it might not be differentiable everywhere. We showed in [Pen99, Pen06a] that it is actually differentiable at all points where the variance is finite and where the cut locus has a null mass for the considered probability measure³. In that case, its gradient is:

included in a strongly convex open set2, he showed that the exponential barycenters were the criticalpoints of the variance. They are thus a superset of the Riemannian centers of mass that includethemselves the Frechet means. Showing that these two notions continue to be essentially equivalentfor distributions with a larger support is more difficult in the presence of a cut locus. Indeed, thedistance is continuous but not differentiable at cut locus points where several minimizing geodesicmeets. For instance, the distance from a point of the sphere to its antipodal point is maximalalthough the directional derivatives of the distance at this points are non zero in all directions.Thus, although the variance is continuous, it might not be differentiable everywhere. We showed in[Pen99, Pen06a] that it is actually differentiable at all points where the variance is finite and wherethe cut locus has a null mass for the considered probability measure3. In that case, its gradient is:

∇σ2(q) = −2

∫−→qp dP (p) =

−2

n

n∑i=1

−→qpi

respectively in the continuous (probabilistic) and discrete (statistical) formulations.

When we have a positive mass on the cut locus, the right hand side of this equation is obviously not defined: the variance is continuous but can have a sharp extremum (most probably a maximum). Thus, the extrema of the Riemannian variance are exponential barycenters or points with P(C(y)) > 0. Thus, apart from the specific problems with masses on the cut locus, we have the implicit characterization of Karcher mean points as exponential barycenters which was presented in Table 2.1. Similar results have been derived independently in [OC95], where it is assumed that the probability is dominated by the Riemannian measure (which explicitly excludes point-mass distributions and the case P(C(y)) > 0), and in [BP02, BP03] for simply connected Riemannian manifolds with non-positive curvature. Our proof extends this result to any kind of manifold. Basically, the characterization of the Riemannian center of mass is the same as in Euclidean spaces if the curvature of the manifold is non-positive (and bounded from below), in which case there is no cut locus.

¹ Simply connected and complete manifolds with non-positive sectional curvature.
² Here, strongly convex means that for every two points there is a unique minimizing geodesic joining them that depends in a C∞ way on the two points.
³ Notice that this is always the case when the random element has a density with respect to the Riemannian measure, but this unfortunately does not include the discrete (statistical) formulation where the probability measure is a sum of point masses at the sample locations.


If the sectional curvature becomes positive, a cut locus may appear, and a non-zero probability on this cut locus induces some discontinuities in the first derivative of the variance. This corresponds to something like a Dirac measure on the second order derivative, which is an additional difficulty for computing the exact Hessian matrix of the variance on these manifolds. In practice, the gradient is well defined for discrete samples as soon as there is no sample lying exactly on the cut locus of the current test point. Of course, perturbing the point position solves the problem (locally), but this might turn out to be a problem for designing certain certified computational algorithmic procedures if the same point is not perturbed exactly the same way at different times.

Picard [Pic94] realized a good synthesis of most of these notions of mean value and showed that the definition of a "barycenter" (i.e. a mean value) is linked to a connector, which itself determines a connection, and thus possibly a metric. An interesting property brought by this formulation is that the distance between two barycenters (with different definitions) is of the order of O(σ_x). Thus, for sufficiently concentrated random points, all these values are close.

4.0.2 A Newton algorithm to compute the mean

To effectively compute the mean value, we proposed in [?, Pen98] a Gauss-Newton gradient descent algorithm on rotations and rigid-body motions. This algorithm was readily extended to general Riemannian manifolds in [Pen99, Pen06a] by approximating the variance using a Taylor expansion in a normal coordinate system: for a vector v ∈ T_qM, we have

σ²(exp_q(v)) = σ²(q) + ⟨ ∇σ²(q) | v ⟩_q + (1/2).Hess σ²(v, v) + O(‖v‖³_q)

The gradient of the variance being a vector field, the second order derivative (the Hessian) is obtained using the connection. However, we know that the gradient is not continuous at the cut locus. To circumvent this problem, one can split the integral into one part that does not take into account the cut locus, which gives a perfectly positive definite matrix (2 times the identity), and one part that accounts for the cut locus, which can be expressed using integrals of Jacobi fields [Kar77]. For a toy example on the circle, see also [Pen06a]. Deliberately neglecting this second term gives a positive definite "second order approximation" and the following Gauss-Newton iterative scheme:

p_{t+1} = Exp_{p_t}( (1/n) ∑_{i=1}^{n} −−→p_t p_i ).

This algorithm essentially alternates the computation of the barycenter in the exponential chart centered at the current estimation of the mean value, and a re-centering step of the chart at the point of the manifold that corresponds to the computed barycenter (geodesic marching step). In practice, we found that this algorithm was very efficient and typically converges in 5 to 10 iterations to the numerical accuracy of the machine for rotations, rigid body transformations and positive definite symmetric matrices. Notice that it converges toward the true mean in a single step in a vector space. One can actually show that the convergence of this type of Newton iteration is locally quadratic around non-degenerate critical points [OW00, MM02, DMP03].
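To make the iteration concrete, here is a hedged NumPy sketch of this Gauss-Newton scheme on SO(3); the matrix logarithm and exponential play the role of the Riemannian Log and Exp maps of Chapter 3, and the initialization, tolerance and iteration cap are arbitrary choices.

import numpy as np
from scipy.linalg import expm, logm

def karcher_mean_so3(rotations, max_iter=50, tol=1e-10):
    # Gauss-Newton / geodesic marching: p_{t+1} = Exp_{p_t}( (1/n) sum_i Log_{p_t}(p_i) ).
    p = rotations[0]                       # arbitrary initialization
    for _ in range(max_iter):
        # barycenter of the data in the exponential chart centered at p
        v = np.mean([np.real(logm(p.T @ R)) for R in rotations], axis=0)
        p = p @ expm(v)                    # re-centering (geodesic marching) step
        if np.linalg.norm(v) < tol:
            break
    return p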

4.0.3 Covariance matrix and Principal Geodesic Analysis

Once the mean point is determined, using the exponential chart at the mean point is particularly interesting, as the random feature is represented by a random vector with null mean in a star-shaped domain. However, one important difference with the Euclidean case is that the reference measure is not the Lebesgue one but the pull-back of the Riemannian measure dM by the exponential map at the mean point. With this representation, there is no difficulty in defining the covariance matrix (respectively continuous and discrete forms):

Σ = ∫ −→pq.−→pq^T dP(q) = (1/n) ∑_{i=1}^{n} −→pq_i.−→pq_i^T

and potentially higher order moments. This covariance matrix can then be used to define the Mahalanobis distance between a random and a deterministic feature, which basically weights the distance between the deterministic feature and the mean feature using the inverse of the covariance matrix: µ_(p,Σ)(q) = −→pq^T.Σ^(-1).−→pq. Interestingly, the expected Mahalanobis distance of a random element with itself is independent of the distribution and is equal to the dimension of the manifold, as in the vector case. This statistical distance can be used as a basis to generalize some statistical tests such as the Mahalanobis D² test [Pen06a].
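A minimal sketch of the discrete covariance computation on SO(3), assuming the mean has already been computed and taking as chart coordinates the rotation vector of mean^T.R_i (one possible choice of coordinates; the helper name is hypothetical):

import numpy as np
from scipy.linalg import logm

def covariance_so3(rotations, mean):
    # Empirical covariance in the exponential chart at 'mean': the coordinates of
    # each sample are the rotation vector of mean^T.R_i (basis E1, E2, E3).
    vecs = []
    for R in rotations:
        S = np.real(logm(mean.T @ R))
        vecs.append([S[2, 1], S[0, 2], S[1, 0]])
    V = np.array(vecs)
    return V.T @ V / len(rotations)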

To analyze the results of a set of measurements in a Euclidean space, one often performs a principal component analysis (PCA). A generalization to Riemannian manifolds called Principal Geodesic Analysis (PGA) was proposed in [FJLP03] to analyze shapes based on the medial axis representation (M-reps). The basic idea is to find a low dimensional sub-manifold generated by some geodesic subspaces that best explains the measurements (i.e. such that the squared Riemannian distance from the measurements to that sub-manifold is minimized). Another point of view is to assume that the measurements are generated by a low dimensional Gaussian model. Estimating the model parameters amounts to a covariance analysis in order to find the k-dimensional subspace that best explains the variance. In a Euclidean space, these two definitions coincide thanks to Pythagoras's theorem. However, in the Riemannian setting, geodesic subspaces are generally not orthogonal due to the curvature. Thus, the two notions differ: while the Riemannian covariance analysis can easily be performed in the tangent space at the mean, finding Riemannian sub-manifolds turns out to become an almost intractable problem. As a matter of fact, the solution retained by [FJLP03] was finally to rely on the covariance analysis.

When the distribution is unimodal and sufficiently peaked, we believe that covariance analysis is anyway much better suited. However, for many problems, the goal is rather to find a sub-manifold on which measurements are more or less uniformly distributed. This is the case for instance for features sampled on a surface or points sampled along a trajectory (time sequences). While the one-dimensional case can be tackled by regression [DFBJ07], the problem for higher dimensional sub-manifolds remains quite open. Some solutions may come from manifold embedding techniques as exemplified for instance in [Bru07].

4.0.4 Robust statistics using ϕ-connectors

When there are some outliers in the data, one often uses robust statistical estimators to avoid large errors. What is interesting in Riemannian manifolds is that we can often see an extrinsic distance (the distance in an embedding space) as a robust version of the Riemannian distance. In that case, the extrinsic mean is actually a robust estimator of the mean, which might be particularly interesting from a computational point of view.

Let us consider more specifically M-estimators [RL87] of the distance d_ϕ(p, q) = ϕ(dist(p, q)), with ϕ(0) = 0 and ϕ' decreasing monotonically from ϕ'(0) = 1 while remaining non-negative. These conditions ensure that d_ϕ remains a distance which is equivalent to the Riemannian one for small distances, while giving less weight to points that are far away by tempering their distance.


One can show [GP02] that using such a ϕ-function amounts to replacing, in the computation of the mean, the tangent vector −→pq by the vector:

ψ(−→pq) = (ϕ(‖−→pq‖_p) / ‖−→pq‖_p).−→pq = (ϕ(dist(p, q)) / dist(p, q)).−→pq = −→pq + O(‖−→pq‖²)

This mapping constitutes a (convex) connector in the sense of [Pic94]: it formalizes a relationship between the manifold and its tangent space at the point p, exactly in the way we used the exponential map of the Riemannian metric. Thus, we could think of defining mean values, higher order moments and other statistical operations by replacing everywhere the Riemannian logarithmic and exponential maps with their ϕ-equivalents.

For instance, one can verify that ‖ψ(−→pq)‖_p = d_ϕ(p, q). This shows that the ϕ-variance of a random point σ²_ϕ(p) = E[d²_ϕ(q, p)] = ∫_M ‖ψ(−→pq)‖²_p dP(q) is well defined. Likewise, one can define the ϕ-covariance Σ_ϕ(p) = E[ψ(−→pq).ψ(−→pq)^T], whose trace is still equal to the ϕ-variance. This ϕ-variance can be differentiated at the points where the cut locus has a null probability measure (because the ϕ-distance is dominated by the Riemannian distance), and we obtain:

∇σ²_ϕ(p) = −2 ∫_M ϕ'(‖−→pq‖_p).ψ(−→pq) dP(q).

This formula is interesting as it shows the divergence of the different notions of mean: the ϕ-center of mass is a weighted barycenter both in the Riemannian and in the ϕ-exponential charts, but it is generally different from the (unweighted) ϕ-exponential barycenter. With convex connectors, the different notions of means are no longer subsets of each other. However, they are interesting from a computational point of view.

Let us first consider the unit vectors of $\mathbb{R}^n$. The Euclidean metric induces on the sphere $S^{n-1}$ a rotationally invariant Riemannian metric for which geodesics are great circles, and the distance between two unit vectors u and v is the angle $\theta = d(u,v) = \arccos(u^T v)$. The Euclidean distance $d_E(u,v) = \|u-v\|$ can be considered as a ϕ-estimator with $\varphi(\theta) = 2\sin(\theta/2)$. With the help of a Lagrange multiplier, one easily computes that the extrinsic Euclidean mean is the renormalized Euclidean mean $\bar u = \int u\, dP(u)\,/\,\|\int u\, dP(u)\|$, which is thus a robust estimator of the Riemannian mean.
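To make this concrete, here is a minimal NumPy sketch (ours, not from the original text): the renormalized Euclidean mean of unit vectors compared with the Riemannian (Karcher) mean on the sphere computed by a geodesic fixed-point iteration.

import numpy as np

def sphere_log(p, q):
    # Log map on the sphere: tangent vector at p pointing towards q
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(c)                       # geodesic distance
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - c * p                              # component of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def sphere_exp(p, w):
    # Exp map on the sphere: shoot from p along the tangent vector w
    t = np.linalg.norm(w)
    if t < 1e-12:
        return p.copy()
    return np.cos(t) * p + np.sin(t) * w / t

def extrinsic_mean(U):
    # renormalized Euclidean mean (the robust phi = 2 sin(theta/2) estimator)
    m = U.mean(axis=0)
    return m / np.linalg.norm(m)

def karcher_mean(U, iters=50):
    # Riemannian mean by the usual geodesic fixed-point iteration
    p = extrinsic_mean(U)                      # good initialization
    for _ in range(iters):
        g = np.mean([sphere_log(p, u) for u in U], axis=0)
        p = sphere_exp(p, g)
        if np.linalg.norm(g) < 1e-12:
            break
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal([0.0, 0.0, 3.0], 0.3, size=(200, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    print("extrinsic mean:", extrinsic_mean(pts))
    print("Karcher mean:  ", karcher_mean(pts))

For moderately concentrated data, the two estimates are virtually identical, which illustrates why the extrinsic mean is a useful and cheap approximation of the Riemannian one.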

For directions, a widely used encoding is the tensor $u\,u^T$, which may be seen as an immersion of the projective space $P^{n-1}$ into the vector space of $n\times n$ matrices ($\mathbb{R}^{n^2}$). With this embedding, the squared extrinsic Euclidean distance (renormalized to be consistent with the previous ones) is $d^2(u,v) = \frac{1}{2}\|u\,u^T - v\,v^T\|^2 = 1-(u^T v)^2 = \sin^2(\theta)$. This is also a robust distance with $\varphi(\theta) = \sin(\theta)$ (for $\theta \le \pi$). In the tensor space, the encoding of a random direction is the random tensor $T_u = E[u\,u^T]$. One should notice that the mean direction is represented by the tensor $\bar u\,\bar u^T$ which is closest to $T_u$ in the Euclidean sense: this is the eigenvector(s) of $T_u$ corresponding to the largest eigenvalue. One can also show that the ϕ-covariance of the direction is given directly by the restriction of the tensor to the hyperplane orthogonal to the first eigenvector [GP02]. Thus, the random tensor encodes not only a robust estimation of the Riemannian mean but also (an approximation of) the second-order moments.

Simulations were run on a large number of cases to measure the relative accuracy of the vector and tensor estimations with respect to the Riemannian mean. Up to a variance of 20 degrees, the three methods have a similar accuracy and the results are almost indistinguishable. Between 20 and 40 degrees of variance, the tensor estimation becomes different from the two others while keeping a comparable global accuracy. After 40 degrees, the accuracy of the tensor mean degrades sharply; the vector mean becomes different from the Riemannian mean while keeping a similar accuracy for a while.

A very similar analysis can be done with 3D rotations: one can also model two well-known extrinsic methods to compute the mean as ϕ-connectors. The first method is to represent rotations using unit quaternions, and to compute the renormalized Euclidean mean on the sphere of unit quaternions. As rotation quaternions are defined up to their sign, one theoretically needs to iterate this process and to re-orient the unit quaternions at each step into the hemisphere chosen to represent the mean in order to converge. This method amounts to considering the ϕ-distance $d_{quat}(\theta) = 4\sin(\theta/4)$. The second method is to average the rotation matrices directly in the 3×3 matrix space. Then, the mean is "renormalized" by looking for the rotation matrix which is closest to this result in the Euclidean matrix (Frobenius) distance sense. This can easily be realized using an SVD decomposition of the mean matrix. This method amounts to considering the ϕ-distance $d_{mat}(\theta) = 2\sin(\theta/2)$.
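To make the second extrinsic method concrete, here is a small NumPy sketch (ours, not the author's code): rotation matrices are averaged in the 3×3 matrix space and the result is projected back onto the closest rotation in the Frobenius sense via an SVD.

import numpy as np

def project_to_rotation(M):
    # closest rotation matrix to M for the Frobenius norm (via SVD)
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # enforce a proper rotation (det = +1)
        U[:, -1] *= -1
        R = U @ Vt
    return R

def extrinsic_rotation_mean(rotations):
    # renormalized Euclidean mean of a collection of rotation matrices
    return project_to_rotation(np.mean(rotations, axis=0))

The quaternion-based method can be sketched in the same spirit by sign-aligning the unit quaternions, averaging them and renormalizing the result to unit length.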

Simulation experiments were performed for the two extrinsic methods by [DE97] in a registration context, and later on for the mean with the three methods by [Gra01]. As for unit directions/orientations, estimation results were similar up to 40 degrees of variance in the input rotations. These experiments showed that efficient extrinsic approximations can be designed and used in practice. In this process, the intrinsic theory is the central piece to compare and to control the validity limits of these approximations.

4.0.5 Gaussian and χ2 law

Several generalizations of the Gaussian distribution to Riemannian manifolds have already been proposed. In the stochastic calculus community, one usually considers the heat kernel ρ(p, q, t), which is the transition density of the Brownian motion [Gre63, Eme89, Gri06]. This is the smallest positive fundamental solution to the heat equation $\partial f/\partial t - \Delta f = 0$, where ∆ is the Laplace-Beltrami operator (i.e. the standard Laplacian with corrections for the Riemannian metric). On compact manifolds, an explicit basis of the heat kernel is given by the spectrum of the manifold Laplacian (eigenvalues $\lambda_i$ with associated eigenfunctions $f_i$, solutions of $\Delta f = \lambda f$). However, the explicit computation of this spectrum is possible only in very few cases [GHL93].

To obtain tractable formulas, several alternative distributions have been proposed in directional statistics [Bin74, JM89, Ken92, Mar99, MJ00], in particular the wrapped Gaussian distributions. The basic idea is to take the image by the exponential map of a Gaussian distribution on the tangent space centered at the mean value (see e.g. [MJ00] for the circular and spherical cases). It is easy to see that the wrapped Gaussian distribution tends toward the point-mass distribution if the variance goes to zero. In the circular case, one can also show that it tends toward the uniform distribution for a large variance. This definition was extended in [OC95] by considering non-centered Gaussian distributions on the tangent spaces of the manifold in order to tackle the asymptotic properties of estimators. In this case, the mean value is generally no longer simply linked to the Gaussian parameters. In view of a computational theory, the main problem is that the pdf of the wrapped distributions can only be expressed if the cut locus has a particularly simple geometrical shape. For instance, considering an anisotropic covariance on the n-dimensional sphere leads to very complex calculations.

Instead of keeping a Gaussian pdf in some tangent space, we propose in [Pen06a, Pen99, ?] a new variational approach which is consistent with the previous definitions of the mean and covariance. The property that we took as an axiom is that the Gaussian distribution maximizes the entropy among all distributions when we know the mean and the covariance matrix. In the Riemannian setting,


we defined the intrinsic entropy as the expectation of the logarithm of the intrinsic pdf:

$$H[\rho] = -\int_M \log(\rho(p))\,\rho(p)\, d\mathcal{M}(p) = -\int_M \log(\rho(p))\, dP(p).$$

Our definition of the entropy is consistent with the measure inherited from the Riemannian metric since the pdf that maximizes the entropy when we only know that the result is in a compact set U is the uniform density in this set: $p_U(p) = I_U(p)\,/\int_U d\mathcal{M}(p)$.

The intrinsic pdf maximizing this entropy knowing the mean $\bar x$ and the covariance matrix Σ is a Gaussian distribution on the exponential chart centered at the mean point and truncated at the cut locus (if there is one)⁴ [Pen06a]:

$$N_{(\bar x,\Gamma)}(q) = k\,\exp\!\left(-\tfrac{1}{2}\,\overrightarrow{\bar x q}^{\,T}\,\Gamma\,\overrightarrow{\bar x q}\right).$$

However, the relation between the concentration matrix (the "metric" Γ used in the exponential of the probability density function) and the covariance matrix Σ is more complex than the simple inversion of the vectorial case, as it has to be corrected for the curvature of the manifold. Using a Taylor expansion of the Riemannian measure, one can obtain computationally tractable approximations for any manifold in the case of small variances. Let $r = i(\mathcal{M},\bar x)$ be the injectivity radius at the mean point, i.e. the shortest distance to the cut locus (by convention $r = +\infty$ if there is no cut locus). Assuming a finite variance for any concentration matrix Γ, we have the following Taylor expansions:

$$k = \frac{1 + O(\sigma^3) + \epsilon(\sigma/r)}{\sqrt{(2\pi)^n \det(\Sigma)}} \qquad\text{and}\qquad \Gamma = \Sigma^{-1} - \frac{1}{3}\,\mathrm{Ric} + O(\sigma) + \epsilon(\sigma/r).$$

Here, ε(x) is a function that is an $O(x^k)$ for any positive k (more precisely, it is a function such that $\forall k \in \mathbb{R}^+$, $\lim_{x\to 0^+} x^{-k}\,\epsilon(x) = 0$). This family of distributions ranges from the point-mass distribution (for $\Gamma = \infty$) to the uniform measure (i.e. the uniform density for compact manifolds) for a null concentration matrix. For some theoretical reasons (including the non-differentiability of the pdf at the cut locus), this is probably not the best generalization of the Gaussian. However, from a practical point of view, it provides effective and computationally tractable approximations for any manifold in the case of small variances, which we were not able to obtain from the other definitions.

Based on this generalized Gaussian, we investigated in [Pen06a, Pen99, ?] a generalization of the χ² law to manifolds by considering the Mahalanobis distance of a Gaussian random feature. Under the same conditions as for the Gaussian, one can show that it has the same density as in the vectorial case up to an order 3 in σ. This opens the way to the generalization of many other statistical tests, as we may expect similarly simple approximations for sufficiently centered distributions.

⁴The definition domain of the exponential map at the mean point has to be symmetric to obtain this result. This is the case in particular for symmetric spaces, i.e. Riemannian spaces whose metric is invariant under some symmetry.


Chapter 5

The Example of Covariance matrices

Symmetric positive definite (SPD) matrices, called tensors in medical image analysis, are used for instance to encode the covariance matrix of the Brownian motion (diffusion) of water in Diffusion Tensor Imaging (DTI) [BMB94, LBMP+01], to encode the joint variability at different places in shape analysis [FAP+05, FAP+07, FPTA08], and in image analysis to guide the segmentation, grouping and motion analysis [MLT00, WB02, BWBM06, WH06]. They are also used in many other application domains. For instance, they are a common tool in numerical analysis to locally drive the size of adaptive meshes in order to optimize the cost of solving PDEs in 3D [MBG97]. In the formation of echo-Doppler or radar images, Toeplitz Hermitian positive definite matrices uniquely characterize circular complex random processes with a null mean [MSH05].

The main computational problem is that the SPD space is a manifold which is not a vector space with the usual additive structure. As the positive definiteness constraint delimits a convex half-cone in the vector space of symmetric matrices, convex operations (like the mean) are stable in this space, but problems arise with more complex operations. For instance, when smoothing fields of tensors with gradient descent, there is inevitably a point in the image where the time step is not small enough, and this results in negative eigenvalues. Even when a spectral decomposition is performed to smooth independently the rotation (the trihedron of eigenvectors) and the eigenvalues [Tsc02, CTDF02], there is a continuity problem around equal eigenvalues.

To cope with that problem in the context of Diffusion Tensor Images (DTI), it was proposed concurrently by several authors to endow the space of SPD matrices with a Riemannian metric invariant by any change of the underlying space coordinates, i.e. invariant under the action of affine transformations on covariance matrices. This leads to the distance

$$\mathrm{dist}^2(\Sigma,\Lambda) = \mathrm{Tr}\!\left(\log(\Sigma^{-1/2}\Lambda\Sigma^{-1/2})^2\right),$$

where log stands for the matrix logarithm.

This metric leads to a very regular Hadamard manifold structure (a hyperbolic-like space without cut locus) which simplifies the computations. Tensors with null and infinite eigenvalues are both at an infinite distance from any SPD matrix: the cone of symmetric positive definite matrices is changed into a space of "constant" (homogeneous) non-scalar curvature without boundaries. Moreover, there is one and only one geodesic joining any two SPD matrices, the mean of a set of SPD matrices is uniquely defined, and we can even define globally consistent orthonormal coordinate systems of tangent spaces. Thus, the structure we obtain is very close to a vector space, except that the space is curved.

The invariant metric has been independently proposed in [FJ04] for the analysis of principal modes of sets of diffusion tensors; in [Moa05] for its mathematical properties, which were exploited in [BMA+05] for a new anisotropic DTI index; and in [PFA06] as the basis for building the complete computational framework on manifold-valued images that will be presented in Chapter 7.


By looking for a suitable metric on the space of Gaussian distributions for the segmentation of diffusion tensor images, [LRDF06] also ended up with the same metric. It is interesting to see that completely different approaches, relying on invariance requirements on the one hand, and relying on an information measure to evaluate the distance between distributions on the other hand, lead to the same metric on the SPD space.

The metric had also previously been introduced in statistics as the Fisher information metric to model the geometry of the multivariate normal family [BR82, Sko84, CO91] and is considered a well-known result in other branches of mathematics [Bha03]. In computer vision, it was rediscovered to deal with covariance matrices [FM99]. An implicit form was introduced in [HM94] for developing flows and dynamic systems on the space of symmetric matrices. The corresponding integrator (which corresponds to a geodesic walking scheme for this Riemannian metric) was used for the anisotropic regularization of diffusion tensor images in [CTDF04] and [Bie04].

By trying to put a Lie group structure on the tensor manifold, Vincent Arsigny later came up with Log-Euclidean metrics [AFPA06, ?, AFPA05]. These metrics give a vector space structure to this manifold while keeping most of its interesting properties (Section 5.4), thus drastically simplifying the algorithms and speeding up computations. Other non-Riemannian metrics or parameterizations were also proposed. We briefly present them for completeness, and we turn in Section 5.5.1 to an important question raised by all these possible choices: how to choose the metric depending on the nature and natural properties of the data that we need to process? A tentative application of our Riemannian metrics to structure tensors [FAAP05] showed for instance that there is no universal metric for a given manifold: there are different families of metrics with similar or different characteristics, and one may rely on one or the other depending on the specificities of the application.

5.1 Exponential, logarithm and square root of tensors

In the following, we will make extensive use of a few functions on symmetric matrices. The exponential of any matrix can be defined using the series $\exp(A) = \sum_{k=0}^{+\infty} A^k/k!$. In the case of symmetric matrices, we have some important simplifications. Let $W = U\,D\,U^T$ be a diagonalization, where U is an orthonormal matrix and $D = \mathrm{DIAG}(d_i)$ is the diagonal matrix of the eigenvalues. We can write any power of W in the same basis: $W^k = U\,D^k\,U^T$. This means that we may factor out the rotation matrices in the series and apply the exponential individually to each eigenvalue:

$$\exp(W) = \sum_{k=0}^{+\infty}\frac{W^k}{k!} = U\,\mathrm{DIAG}(\exp(d_i))\,U^T.$$

The series defining the exponential function converges for any matrix argument, but this is generally not the case for the series defining its inverse function, the logarithm. However, any SPD matrix can be diagonalized into $\Sigma = U\,\mathrm{DIAG}(d_i)\,U^T$ with strictly positive eigenvalues $d_i$. Thus, the function

$$\log(\Sigma) = U\,\mathrm{DIAG}(\log(d_i))\,U^T$$

is always well defined on tensors. Moreover, if all the eigenvalues are close enough to one ($|d_i - 1| < 1$), then the series defining the usual log converges and we have:

$$\log(\Sigma) = U\,\mathrm{DIAG}\!\left(\sum_{k=1}^{+\infty}\frac{(-1)^{k+1}}{k}(d_i-1)^k\right)U^T = \sum_{k=1}^{+\infty}\frac{(-1)^{k+1}}{k}(\Sigma-\mathrm{Id})^k. \qquad (5.1)$$


Classically, one defines the (left) square root of a matrix B as the set $B^{1/2}_L = \{A \in GL_n \,/\, A\,A^T = B\}$. One could also define the right square root $B^{1/2}_R = \{A \in GL_n \,/\, A^T A = B\}$. For SPD matrices, we define the square root as:

$$\Sigma^{1/2} = \{\Lambda \in \mathrm{Sym}^+_n \,/\, \Lambda^2 = \Sigma\}.$$

The square root is always defined and moreover unique: let $\Sigma = U\,D^2\,U^T$ be a diagonalization (with positive values for the $d_i$'s). Then $\Lambda = U\,D\,U^T$ is of course a square root of Σ, which proves the existence. For the uniqueness, let us consider two symmetric and positive square roots $\Lambda_1$ and $\Lambda_2$ of Σ. Then $\Lambda_1^2 = \Sigma$ and $\Lambda_2^2 = \Sigma$ obviously commute, and thus they can be diagonalized in the same basis: this means that the diagonal matrices $D_1^2$ and $D_2^2$ are equal. As the elements of $D_1$ and $D_2$ are positive, they are also equal and $\Lambda_1 = \Lambda_2$.

More generally, one can define any power of an SPD matrix by taking the power of the eigenvalues, or by using the formula:

$$\Sigma^\alpha = \exp(\alpha\,\log(\Sigma)).$$

5.1.1 Differential of the exponential

The matrix exponential and logarithm realize a one-to-one mapping between the space of symmetric matrices and the space of tensors. Moreover, one can show that this mapping is a diffeomorphism, since the differential has no singularities. Using the Taylor expansion $(W+\epsilon V)^k = W^k + \epsilon\sum_{i=0}^{k-1} W^i V W^{k-i-1} + O(\epsilon^2)$ for $k\ge 1$, we obtain the directional derivative $\partial_V \exp(W)$ by gathering the first-order terms in ε in the series $\exp(W+\epsilon V) = \sum_{k=0}^{+\infty}(W+\epsilon V)^k/k!$:

$$\partial_V \exp(W) = (d\exp(W))(V) = \sum_{k=1}^{+\infty}\frac{1}{k!}\sum_{i=0}^{k-1} W^i\,V\,W^{k-i-1}. \qquad (5.2)$$

To simplify the differential, we can see that using the diagonalization $W = R\,S\,R^T$ in the series gives:

$$\partial_V \exp(W) = R\,\left(\partial_{(R^T V R)}\exp(S)\right)R^T.$$

Thus, we are left with the computation of $\partial_V \exp(S)$ for a diagonal matrix S. As $[S^l V S^{k-l-1}]_{ij} = s_i^l\,v_{ij}\,s_j^{k-l-1}$, we have $[\partial_V \exp(S)]_{ij} = \sum_{k=1}^{+\infty}\frac{1}{k!}\sum_{l=0}^{k-1} s_i^l\,s_j^{k-l-1}\,v_{ij} = q_{ij}\,v_{ij}$ with

$$q_{ij} = \sum_{k=1}^{+\infty}\frac{1}{k!}\sum_{l=0}^{k-1} s_i^l\,s_j^{k-l-1} = \sum_{k=1}^{+\infty}\frac{1}{k!}\,\frac{s_i^k-s_j^k}{s_i-s_j} = \frac{\exp(s_i)-\exp(s_j)}{s_i-s_j} = \exp(s_j)\left(1 + \frac{s_i-s_j}{2} + \frac{(s_i-s_j)^2}{6} + O\!\left((s_i-s_j)^3\right)\right).$$

The last Taylor expansion shows that this formula is computationally well posed even when $s_i$ and $s_j$ are close. Moreover, we have $q_{ij} > 0$, so that we can conclude that $d\exp(S)$ is a diagonal and always invertible linear form: the exponential is a diffeomorphism.
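As a quick numerical illustration (ours, not the author's code), the directional derivative of the matrix exponential can be evaluated directly from the $q_{ij}$ formula above:

import numpy as np

def dexp_sym(W, V):
    # directional derivative d exp(W)(V) for symmetric matrices W and V
    s, R = np.linalg.eigh(W)                      # W = R diag(s) R^T
    Vr = R.T @ V @ R                              # V expressed in the eigenbasis of W
    S_i, S_j = np.meshgrid(s, s, indexing="ij")
    diff = S_i - S_j
    safe = np.where(np.abs(diff) > 1e-12, diff, 1.0)
    # q_ij = (exp(s_i) - exp(s_j)) / (s_i - s_j), with the limit exp(s_i) when s_i = s_j
    q = np.where(np.abs(diff) > 1e-12, (np.exp(S_i) - np.exp(S_j)) / safe, np.exp(S_i))
    return R @ (q * Vr) @ R.T

A simple sanity check is to compare dexp_sym(W, V) with the finite difference (exp(W + hV) − exp(W))/h for a small step h, using any matrix exponential routine.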

5.1.2 Differential of the logarithm

To compute the differential of the logarithm function, we do not have a series that we could perturb like for the exponential, but we can simply invert the differential of the exponential as a linear form: as $\exp(\log(\Sigma)) = \Sigma$, we have $(d\log(\Sigma))(V) = (d\exp(\log(\Sigma)))^{-1}(V)$. Using $D = \exp(S)$, the inverse is easily expressed for a diagonal matrix: $[(d\exp(S))^{-1}V]_{ij} = v_{ij}/q_{ij}$. Thus we have:

$$[\partial_V \log(D)]_{ij} = v_{ij}\,\frac{\log(d_i)-\log(d_j)}{d_i-d_j}.$$

Notice that

$$q_{ij}^{-1} = \frac{\log(d_i)-\log(d_j)}{d_i-d_j} = \frac{1}{d_j}\left(1 - \frac{d_i-d_j}{2\,d_j} + \frac{(d_i-d_j)^2}{3\,d_j^2} + O\!\left((d_i-d_j)^3\right)\right),$$

so that the formula is numerically stable. Finally, using the identity $\log(\Sigma) = R^T\log(R\,\Sigma\,R^T)\,R$ for any rotation R, we have:

$$\partial_V \log(R\,D\,R^T) = R\,\left(\partial_{(R^T V R)}\log(D)\right)R^T.$$

That way, we may compute the differential at any point $\Sigma = R\,D\,R^T$.

5.1.3 Remarkable identities

$$\partial_{\log(\Sigma)}\log(\Sigma) = \Sigma^{-1}\log(\Sigma) = \log(\Sigma)\,\Sigma^{-1} \qquad (5.3)$$

$$\langle\,\partial_V\log(\Sigma)\,|\,W\,\rangle = \langle\,\partial_W\log(\Sigma)\,|\,V\,\rangle \qquad (5.4)$$

Both identities follow from the previous formulas with $\Sigma = R\,D\,R^T$:

$$\partial_{\log(\Sigma)}\log(\Sigma) = R\left(\partial_{\log(D)}\log(D)\right)R^T = R\,\mathrm{Diag}(\log(d_i)/d_i)\,R^T = \Sigma^{-1}\log(\Sigma),$$

$$\langle\,\partial_V\log(\Sigma)\,|\,W\,\rangle = \mathrm{Tr}\!\left(\left(\partial_{(R^T V R)}\log(D)\right)R^T W R\right) = \sum_{i,j}[R^T V R]_{ij}\,\frac{\log(d_i)-\log(d_j)}{d_i-d_j}\,[R^T W R]_{ij} = \mathrm{Tr}\!\left(\left(\partial_{(R^T W R)}\log(D)\right)R^T V R\right) = \langle\,\partial_W\log(\Sigma)\,|\,V\,\rangle.$$

Exercise 5.1 Powers, exponential and logarithm of symmetric matrices

• Implement the function ExpSym(M) that computes the exponential of a symmetric matrix using diagonalization.

• Implement a random SPD matrix generator RndSym() thanks to ExpSym.

• Implement the function LogSym(M) that computes the logarithm of an SPD matrix using diagonalization.

• Implement the function SqrtSym(M) that computes the unique SPD square root of an SPD matrix.

• Implement the function InvSqrtSym(M) that computes the inverse of the SPD square root of an SPD matrix.

• Verify on random SPD matrices that SqrtSym(M) = ExpSym(0.5*LogSym(M)) and InvSqrtSym(M) = ExpSym(-0.5*LogSym(M)).
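One possible NumPy solution sketch for Exercise 5.1 (the function names follow the exercise statement; this is not the author's reference implementation):

import numpy as np

def ExpSym(W):
    # exponential of a symmetric matrix via diagonalization
    d, U = np.linalg.eigh(W)
    return U @ np.diag(np.exp(d)) @ U.T

def LogSym(S):
    # logarithm of an SPD matrix via diagonalization
    d, U = np.linalg.eigh(S)
    return U @ np.diag(np.log(d)) @ U.T

def SqrtSym(S):
    # unique SPD square root of an SPD matrix
    return ExpSym(0.5 * LogSym(S))

def InvSqrtSym(S):
    # inverse of the SPD square root
    return ExpSym(-0.5 * LogSym(S))

def RndSym(n, sigma=1.0, rng=np.random.default_rng()):
    # random SPD matrix: exponential of a random symmetric matrix
    A = rng.normal(scale=sigma, size=(n, n))
    return ExpSym(0.5 * (A + A.T))

if __name__ == "__main__":
    S = RndSym(3)
    assert np.allclose(SqrtSym(S) @ SqrtSym(S), S)
    assert np.allclose(SqrtSym(S) @ InvSqrtSym(S), np.eye(3))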


5.2 Affine invariant metrics on the SPD space

Action of the linear group on covariance matrices

Let us consider the following action of the linear group $GL_n$ on the tensor space $\mathrm{Sym}^+_n$:

$$A\star\Sigma = A\,\Sigma\,A^T \qquad \forall A\in GL_n \text{ and } \Sigma\in\mathrm{Sym}^+_n.$$

This group action corresponds for instance to the standard action of the affine group on the covariance matrix $\Sigma_{xx}$ of a random variable x in $\mathbb{R}^n$: if $y = A\,x + t$, then $\bar y = A\,\bar x + t$ and $\Sigma_{yy} = E[(y-\bar y)(y-\bar y)^T] = A\,\Sigma_{xx}\,A^T$.

This action is naturally extended to tangent vectors in the same way: if $\Gamma(t) = \Sigma + t\,W + O(t^2)$ is a curve passing through Σ with tangent vector W, then the curve $A\star\Gamma(t) = A\,\Sigma\,A^T + t\,A\,W\,A^T + O(t^2)$ passes through $A\star\Sigma$ with tangent vector $A\star W$.

5.2.1 Looking for an affine invariant distance

Following [PA98], any invariant distance on $\mathrm{Sym}^+_n$ verifies $\mathrm{dist}(A\star\Sigma_1, A\star\Sigma_2) = \mathrm{dist}(\Sigma_1,\Sigma_2)$. Choosing $A = \Sigma_1^{-1/2}$, we can reduce this to a pseudo-norm, or distance to the identity:

$$\mathrm{dist}(\Sigma_1,\Sigma_2) = \mathrm{dist}\!\left(\mathrm{Id},\,\Sigma_1^{-\frac12}\Sigma_2\Sigma_1^{-\frac12}\right) = N\!\left(\Sigma_1^{-\frac12}\Sigma_2\Sigma_1^{-\frac12}\right).$$

Moreover, as the invariance has to hold for any transformation, N should be invariant under the action of the isotropy group $H(\mathrm{Id}) = O_n = \{U\in GL_n \,/\, U\,U^T = \mathrm{Id}\}$:

$$\forall U\in O_n, \qquad N(U\,\Sigma\,U^T) = N(\Sigma).$$

Using the spectral decomposition $\Sigma = U\,D^2\,U^T$, it is easy to see that N(Σ) has to be a symmetric function of the eigenvalues. Moreover, the symmetry of the distance $\mathrm{dist}(\Sigma,\mathrm{Id}) = \mathrm{dist}(\mathrm{Id},\Sigma)$ imposes that $N(\Sigma) = N(\Sigma^{-1})$. Thus, a good candidate is the sum of the squared logarithms of the eigenvalues:

$$N(\Sigma)^2 = \|\log(\Sigma)\|^2 = \sum_{i=1}^n (\log(\sigma_i))^2. \qquad (5.5)$$

This "norm" verifies by construction the symmetry and positiveness. N(Σ) = 0 implies that $\sigma_i = 1$ (and conversely), so that the separation axiom is verified. However, we do not know any simple proof of the triangle inequality, which should read $N(\Sigma_1) + N(\Sigma_2) \ge N(\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})$, even if we can verify it experimentally (see e.g. [FM99]).

5.2.2 An invariant Riemannian metric

Another way to determine the invariant distance is through the Riemannian metric. For that, we need to define the differential structure of the manifold.

Tangent vectors

Tangent vectors to SPD matrices are simply symmetric matrices with no constraint on the eigenvalues: if $\Gamma(t) = \Sigma + t\,W + O(t^2)$ is a curve on the SPD space, the tangent vector W is obviously symmetric, and there cannot be any other constraint, as symmetric and SPD matrices both have the same dimension $n(n+1)/2$.


The group action naturally extends to tangent vectors: if W is a tangent vector at Σ, then $A\,W\,A^T$ is a tangent vector at $A\,\Sigma\,A^T$. One should be careful that although the tangent space seems to be the same at all SPD points Σ, because tangent vectors always have the same representation as symmetric matrices, the spaces $T_\Sigma\mathrm{Sym}^+_n$ are still different as they have a different base point Σ in the SPD space.

Riemannian metric

Let us take the most simple scalar product on the tangent space at the identity matrix: if $W_1$ and $W_2$ are tangent vectors (i.e. symmetric matrices, not necessarily definite nor positive), we define the scalar product to be the standard matrix scalar product $\langle W_1\,|\,W_2\rangle = \mathrm{Tr}(W_1^T W_2)$. This scalar product is obviously invariant by the isotropy group $O_n$. Now, if $W_1$ and $W_2$ are two tangent vectors at Σ, we require their scalar product to be invariant by the action of any transformation: $\langle W_1\,|\,W_2\rangle_\Sigma = \langle A\star W_1\,|\,A\star W_2\rangle_{A\star\Sigma}$. This should be true in particular for $A = \Sigma^{-1/2}$, which allows us to define the scalar product at any Σ from the scalar product at the identity:

$$\langle W_1\,|\,W_2\rangle_\Sigma = \left\langle \Sigma^{-\frac12}W_1\Sigma^{-\frac12}\,\middle|\,\Sigma^{-\frac12}W_2\Sigma^{-\frac12}\right\rangle_{\mathrm{Id}} = \mathrm{Tr}\!\left(\Sigma^{-\frac12}W_1\Sigma^{-1}W_2\Sigma^{-\frac12}\right).$$

One can easily verify that this definition is left unchanged if we use any other transformation $A = U\,\Sigma^{-1/2}$ (where U is a free orthonormal matrix) that transports Σ to the identity: $A\star\Sigma = A\,\Sigma\,A^T = U\,U^T = \mathrm{Id}$.

Geodesics

To find the geodesics without going through the computation of Christoffel symbols, we may rely on a result from differential geometry [Gam91, Hel78, KN69] which says that the geodesics for the invariant metrics on affine symmetric spaces are generated by the action of the one-parameter subgroups of the acting Lie group¹. Since the one-parameter subgroups of the linear group are given by the matrix exponential $\exp(t\,A)$, geodesics on our tensor manifold going through Σ with tangent vector W should have the following form:

$$\Gamma_{(\Sigma,W)}(t) = \exp(t\,A)\,\Sigma\,\exp(t\,A)^T \qquad\text{with}\qquad W = A\,\Sigma + \Sigma\,A^T. \qquad (5.6)$$

For our purpose, we need to relate the geodesic explicitly to the tangent vector in order to define the exponential chart. Since Σ is a symmetric matrix, there is fortunately an explicit solution to the Sylvester equation $W = A\,\Sigma + \Sigma\,A^T$: we get $A = \frac{1}{2}\left(W\,\Sigma^{-1} + \Sigma^{1/2}\,Z\,\Sigma^{-1/2}\right)$, where Z is a free skew-symmetric matrix. However, introducing this solution into the equation of the geodesics (Eq. 5.6) does not lead to a very tractable expression. Let us look for an alternative solution.

Since our metric (and thus the geodesics) is invariant under the action of the group, we can focus on the geodesics going through the origin (the identity). In that case, a symmetric solution of the Sylvester equation is $A = \frac{1}{2}W$, which gives the following equation for the geodesic going through the identity with tangent vector W:

$$\Gamma_{(\mathrm{Id},W)}(t) = \exp\!\left(\frac{t}{2}W\right)\exp\!\left(\frac{t}{2}W\right)^T = \exp(t\,W).$$

We may observe that the tangent vector along this curve is the parallel transportation of the initial tangent vector. If $W = U\,\mathrm{DIAG}(w_i)\,U^T$, then

$$\frac{d\Gamma(t)}{dt} = U\,\mathrm{DIAG}(w_i\exp(t\,w_i))\,U^T = \Gamma(t)^{\frac12}\,W\,\Gamma(t)^{\frac12} = \Gamma(t)^{\frac12}\star W.$$

¹To be mathematically correct, we should consider the quotient space $\mathrm{Sym}^+_n = GL^+_n/SO_n$ instead of $\mathrm{Sym}^+_n = GL_n/O_n$ so that all spaces are simply connected.


By definition of our invariant metric, the norm of this vector is constant: $\|\Gamma(t)^{1/2}\star W\|^2_{\Gamma(t)^{1/2}\star\mathrm{Id}} = \|W\|^2_{\mathrm{Id}} = \|W\|^2_2$. This was expected since geodesics are parameterized by arc length. Thus, the length of the curve between times 0 and 1 is

$$L = \int_0^1 \left\|\frac{d\Gamma(t)}{dt}\right\|_{\Gamma(t)} dt = \|W\|_{\mathrm{Id}}.$$

Solving for $\Gamma_{(\mathrm{Id},W)}(1) = \Sigma$, we obtain the "norm" N(Σ) of Eq. (5.5). Using the invariance of our metric, we easily obtain the geodesic starting from any other point of the manifold using our group action:

$$\Gamma_{(\Sigma,W)}(t) = \Sigma^{\frac12}\star\Gamma_{(\mathrm{Id},\,\Sigma^{-1/2}\star W)}(t) = \Sigma^{\frac12}\exp\!\left(t\,\Sigma^{-\frac12}W\Sigma^{-\frac12}\right)\Sigma^{\frac12}.$$

Coming back to the distance $\mathrm{dist}^2(\Sigma,\mathrm{Id}) = \sum_i(\log\sigma_i)^2$, it is worth noticing that tensors with null eigenvalues are located as far from the identity as tensors with infinite eigenvalues: at infinity. Thanks to the invariance by the linear group, this property holds for the distance to any (positive definite) tensor of the manifold. Thus, the original cone of positive definite symmetric matrices (a linear manifold with a flat metric but which is incomplete: there is a boundary at a finite distance) has been changed into a regular and complete (but curved) manifold with an infinite development in each of its n(n+1)/2 directions.

Riemannian Exponential and log maps

As a general property of Riemannian manifolds, geodesics realize a local diffeomorphism from the tangent space at a given point of the manifold to the manifold: $\Gamma_{(\Sigma,W)}(1) = \exp_\Sigma(W)$ associates to each tangent vector $W\in T_\Sigma\mathrm{Sym}^+_n$ a point of the manifold. This mapping is called the exponential map, because it corresponds to the usual exponential in some matrix groups. This is exactly our case for the exponential map around the identity:

$$\exp_{\mathrm{Id}}(U\,D\,U^T) = \exp(U\,D\,U^T) = U\,\mathrm{DIAG}(\exp(d_i))\,U^T.$$

However, the Riemannian exponential map associated with our invariant metric has a more complex expression at other tensors:

$$\exp_\Sigma(W) = \Sigma^{\frac12}\exp\!\left(\Sigma^{-\frac12}W\Sigma^{-\frac12}\right)\Sigma^{\frac12}.$$

In our case, this diffeomorphism is global, and we can uniquely define the inverse mapping everywhere:

$$\log_\Sigma(\Lambda) = \Sigma^{\frac12}\log\!\left(\Sigma^{-\frac12}\Lambda\Sigma^{-\frac12}\right)\Sigma^{\frac12}.$$

Thus, $\exp_\Sigma$ gives us a collection of one-to-one and complete maps of the manifold, centered at any point Σ. As explained in Section 2.2.5, these charts can be viewed as the development of the manifold onto the tangent space along the geodesics. Moreover, as the manifold has non-positive curvature [Sko84], there is no cut locus, and the statistical properties detailed in [Pen04] hold in their most general form. For instance, we have the existence and uniqueness of the mean of any distribution with a compact support [Ken90].
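A compact NumPy sketch of these two maps and of the resulting distance (helper names are ours, not from the text):

import numpy as np

def _sym_fun(S, fun):
    # apply a scalar function to the eigenvalues of a symmetric matrix
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def spd_exp(Sigma, W):
    # affine-invariant exponential map exp_Sigma(W)
    h = _sym_fun(Sigma, np.sqrt)
    ih = _sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return h @ _sym_fun(ih @ W @ ih, np.exp) @ h

def spd_log(Sigma, Lam):
    # affine-invariant logarithmic map log_Sigma(Lambda)
    h = _sym_fun(Sigma, np.sqrt)
    ih = _sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return h @ _sym_fun(ih @ Lam @ ih, np.log) @ h

def spd_dist(Sigma, Lam):
    # dist(Sigma, Lambda): Frobenius norm of log(Sigma^{-1/2} Lambda Sigma^{-1/2})
    ih = _sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return np.linalg.norm(_sym_fun(ih @ Lam @ ih, np.log))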


Induced and orthonormal coordinate systems

One has to be careful because the coordinate system of all these charts is not orthonormal. Indeed, the coordinate system of each chart is induced by the standard coordinate system (here the matrix coefficients), so that the vector $\overrightarrow{\Sigma\Lambda}$ corresponds to the standard derivative in the vector space of matrices: we have $\Lambda = \Sigma + \overrightarrow{\Sigma\Lambda} + O(\|\overrightarrow{\Sigma\Lambda}\|^2)$. Even if this basis is orthonormal at some points of the manifold (such as at the identity for our tensors), it has to be corrected for the Riemannian metric at other places due to the manifold curvature.

From the expression of the metric, one can observe that

$$\|\overrightarrow{\Sigma\Lambda}\|^2_\Sigma = \|\log_\Sigma(\Lambda)\|^2_\Sigma = \|\Sigma^{-\frac12}\log_\Sigma(\Lambda)\,\Sigma^{-\frac12}\|^2_{\mathrm{Id}} = \|\log(\Sigma^{-\frac12}\star\Lambda)\|^2_2.$$

This shows that $\overrightarrow{\Sigma\Lambda}^\perp = \log(\Sigma^{-\frac12}\star\Lambda)\in T_\Sigma\mathrm{Sym}^+_n$ is the expression of the vector $\overrightarrow{\Sigma\Lambda}$ in an orthonormal basis. In our case, the transformation $\Sigma^{1/2}\in GL_n$ is moreover uniquely defined (as a positive square root) and is a smooth function of Σ over the complete tensor manifold. Thus, $\overrightarrow{\Sigma\Lambda}^\perp$ realizes an atlas of orthonormal exponential charts which is globally smooth with respect to the reference point² Σ. This group action approach was chosen in earlier works [Pen96, PT97, PA98] with what we called the placement function.

²On most homogeneous manifolds, this can only be realized locally. For instance, on the sphere, there is a singularity at the antipodal point of the chosen origin for any otherwise smooth placement function.

For some statistical operations, we need to use a minimal representation (e.g. 6 parameters for 3×3 tensors) in a (locally) orthonormal basis. This can be realized through the classical "Vec" operator that maps the element $a_{i,j}$ of an $n\times n$ matrix A to the $(i\,n+j)$-th element $\mathrm{Vec}(A)_{i\,n+j}$ of an $n^2$-dimensional vector Vec(A). Since we are working with symmetric matrices, we have only $n(n+1)/2$ independent coefficients, say the upper triangular part. However, the off-diagonal coefficients are counted twice in the L2 norm at the identity: $\|W\|^2_2 = \sum_{i=1}^n w_{i,i}^2 + 2\sum_{i<j\le n} w_{i,j}^2$. Thus, to express our minimal representation in an orthonormal basis, we need to multiply the off-diagonal terms by $\sqrt{2}$:

$$\mathrm{Vec}_{\mathrm{Id}}(W) = \left(w_{1,1},\ \sqrt{2}\,w_{1,2},\ w_{2,2},\ \sqrt{2}\,w_{1,3},\ \sqrt{2}\,w_{2,3},\ w_{3,3},\ \ldots,\ \sqrt{2}\,w_{1,n},\ \ldots,\ \sqrt{2}\,w_{(n-1),n},\ w_{n,n}\right)^T.$$

Now, for a vector $\overrightarrow{\Sigma\Lambda}\in T_\Sigma\mathrm{Sym}^+_n$, we define its minimal representation in the orthonormal coordinate system as:

$$\mathrm{Vec}_\Sigma(\overrightarrow{\Sigma\Lambda}) = \mathrm{Vec}_{\mathrm{Id}}\!\left(\overrightarrow{\Sigma\Lambda}^\perp\right) = \mathrm{Vec}_{\mathrm{Id}}\!\left(\Sigma^{-\frac12}\,\overrightarrow{\Sigma\Lambda}\,\Sigma^{-\frac12}\right) = \mathrm{Vec}_{\mathrm{Id}}\!\left(\log(\Sigma^{-\frac12}\star\Lambda)\right).$$

The mapping $\mathrm{Vec}_\Sigma$ realizes an explicit isomorphism between $T_\Sigma\mathrm{Sym}^+_n$ and $\mathbb{R}^{n(n+1)/2}$ with the canonical metric.

5.2.3 Simple statistical operations on tensors

Computing the mean

Let $\Sigma_1 \ldots \Sigma_N$ be a set of measurements of the same tensor. The Karcher or Fréchet mean is the set of tensors minimizing the sum of squared distances: $C(\Sigma) = \sum_{i=1}^N \mathrm{dist}^2(\Sigma,\Sigma_i)$. In the case of tensors, the manifold has non-positive curvature [Sko84], so that there is one and only one mean value $\bar\Sigma$ [Ken90]. Moreover, a necessary and sufficient condition for an optimum is a null gradient of the criterion. Thus, the intrinsic Newton gradient descent algorithm gives the following mean value at estimation step t+1:

$$\bar\Sigma_{t+1} = \exp_{\bar\Sigma_t}\!\left(\frac{1}{N}\sum_{i=1}^N \log_{\bar\Sigma_t}(\Sigma_i)\right) = \bar\Sigma_t^{\frac12}\,\exp\!\left(\frac{1}{N}\sum_{i=1}^N \log\!\left(\bar\Sigma_t^{-\frac12}\,\Sigma_i\,\bar\Sigma_t^{-\frac12}\right)\right)\bar\Sigma_t^{\frac12}. \qquad (5.7)$$

Note that we cannot easily simplify this expression further as, in general, the data $\Sigma_i$ and the mean value $\bar\Sigma_t$ cannot be diagonalized in a common basis. However, this gradient descent algorithm usually converges very fast (about 10 iterations, see Fig. ??).
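A direct NumPy transcription of this fixed-point iteration (a sketch under our own naming, initialized at the Euclidean mean; not the author's code) is:

import numpy as np

def _sym_fun(S, fun):
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def spd_karcher_mean(sigmas, iters=50, tol=1e-10):
    # fixed-point iteration (5.7) for the affine-invariant mean of SPD matrices
    mean = np.mean(sigmas, axis=0)                 # Euclidean initialization
    for _ in range(iters):
        h = _sym_fun(mean, np.sqrt)
        ih = _sym_fun(mean, lambda d: 1.0 / np.sqrt(d))
        # average of the data mapped to the tangent space at the current mean
        T = np.mean([_sym_fun(ih @ S @ ih, np.log) for S in sigmas], axis=0)
        mean = h @ _sym_fun(T, np.exp) @ h
        if np.linalg.norm(T) < tol:                # stop when the update is negligible
            break
    return mean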

Covariance matrix

The empirical covariance matrix of a set of N tensors $\Sigma_i$ with mean $\bar\Sigma$ is defined using the tensor product: $\frac{1}{N-1}\sum_{i=1}^N \overrightarrow{\bar\Sigma\Sigma_i}\otimes\overrightarrow{\bar\Sigma\Sigma_i}$. Using our Vec mapping, we may come back to more usual matrix notations and write its expression in a minimal representation with an orthonormal coordinate system:

$$\mathrm{Cov} = \frac{1}{N-1}\sum_{i=1}^N \mathrm{Vec}_{\bar\Sigma}\!\left(\overrightarrow{\bar\Sigma\Sigma_i}\right)\mathrm{Vec}_{\bar\Sigma}\!\left(\overrightarrow{\bar\Sigma\Sigma_i}\right)^T.$$

One may also define the Mahalanobis distance

$$\mu^2_{(\bar\Sigma,\mathrm{Cov})}(\Sigma) = \mathrm{Vec}_{\bar\Sigma}\!\left(\overrightarrow{\bar\Sigma\Sigma}\right)^T \mathrm{Cov}^{-1}\,\mathrm{Vec}_{\bar\Sigma}\!\left(\overrightarrow{\bar\Sigma\Sigma}\right).$$

Looking for the probability density function that minimizes the information with a constrained mean and covariance, we obtain a generalization of the Gaussian distribution of the form:

$$N_{(\bar\Sigma,\Gamma)}(\Sigma) = k\,\exp\!\left(-\frac{1}{2}\,\mu^2_{(\bar\Sigma,\Gamma)}(\Sigma)\right).$$

The main difference with a Euclidean space is that we have a curvature to take into account: the invariant measure induced on the manifold by our metric is linked to the usual matrix measure by $d\mathcal{M}(\Sigma) = d\Sigma/\det(\Sigma)$. Likewise, the curvature slightly modifies the usual relation between the covariance matrix, the concentration matrix Γ and the normalization parameter k of the Gaussian distribution [Pen04]. These differences have an impact on the calculations using continuous probability density functions. However, from a practical point of view, we only deal with a discrete sample set of measurements, so that the measure-induced corrections are hidden. For instance, we can generate a random (generalized) Gaussian tensor using the following procedure: we sample n(n+1)/2 independent and normalized real Gaussian samples, multiply the corresponding vector by the square root of the desired covariance matrix (expressed in our Vec coordinate system), and come back to the tensor manifold using the inverse Vec mapping. Using this procedure, we can easily generate noisy measurements of known tensors.
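One way to realize this simulation procedure in NumPy (a sketch with our own helper names; the random tangent vector is shot from the mean along the Riemannian exponential):

import numpy as np

def _sym_fun(S, fun):
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def vec_id(W):
    # minimal representation of a symmetric matrix in an orthonormal basis
    n = W.shape[0]
    out = []
    for j in range(n):
        for i in range(j + 1):
            out.append(W[i, j] * (np.sqrt(2.0) if i != j else 1.0))
    return np.array(out)

def unvec_id(x, n):
    # inverse of vec_id
    W = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            w = x[k] / (np.sqrt(2.0) if i != j else 1.0)
            W[i, j] = W[j, i] = w
            k += 1
    return W

def random_gaussian_tensor(Sigma_bar, Cov, rng=np.random.default_rng()):
    # sample n(n+1)/2 normalized Gaussians, color them by Cov^{1/2}, map back to tensors
    n = Sigma_bar.shape[0]
    x = np.linalg.cholesky(Cov) @ rng.standard_normal(n * (n + 1) // 2)
    X = unvec_id(x, n)                                  # tangent vector (orthonormal chart)
    h = _sym_fun(Sigma_bar, np.sqrt)
    return h @ _sym_fun(X, np.exp) @ h                  # exponential map at Sigma_bar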

5.3 The one-parameter family of affine-invariant metrics

One question that arises is the uniqueness of this type of affine-invariant Riemannian metric. We have shown in [Pen06c] that there is actually a one-parameter family of such affine-invariant Riemannian metrics on tensors that all share the same connection. This is the affine-invariant connection on homogeneous spaces of [Nom54] which is used in many theorems on symmetric spaces in differential geometry textbooks [KN69, Hel78, Gam91].

An affine-invariant dot product obviously verifies $\langle V\,|\,W\rangle_\Sigma = \langle A\,V\,A^T\,|\,A\,W\,A^T\rangle_{A\Sigma A^T}$. In particular, this should be true for the isotropy group of the identity (the linear transformations that leave the identity matrix unchanged: the rotation matrices). All the rotationally invariant dot products on symmetric matrices are given (up to a constant global multiplicative factor) by:

$$\langle V\,|\,W\rangle_{\mathrm{Id}} = \mathrm{Tr}(V\,W) + \beta\,\mathrm{Tr}(V)\,\mathrm{Tr}(W) \qquad\text{with}\qquad \beta > -\frac{1}{n},$$

where n is the dimension of the space. These metrics are derived from rotationally invariant norms $\|W\|^2$, which are quadratic forms on (symmetric) matrices. By isotropy, they can only depend on the matrix invariants Tr(W), Tr(W²) and Tr(W³). However, as the form is quadratic in W, we are left only with Tr(W)² and Tr(W²), which can be weighted by α and β. One easily verifies that β > −α/n is a necessary and sufficient condition to ensure positive definiteness.

This metric at the identity can then be transported to any point by the group action, using the (symmetric or any other) square root $\Sigma^{1/2}$ considered as a group element:

$$\langle V\,|\,W\rangle_\Sigma = \left\langle \Sigma^{-1/2}V\Sigma^{-1/2}\,\middle|\,\Sigma^{-1/2}W\Sigma^{-1/2}\right\rangle_{\mathrm{Id}} = \mathrm{Tr}\!\left(V\,\Sigma^{-1}W\,\Sigma^{-1}\right) + \beta\,\mathrm{Tr}\!\left(V\,\Sigma^{-1}\right)\mathrm{Tr}\!\left(W\,\Sigma^{-1}\right).$$

The Riemannian distance is obtained by integration, or more easily as the norm of the initial tangent vector of the geodesic joining the two points:

$$\mathrm{dist}^2(\Sigma,\Lambda) = \|\log_\Sigma(\Lambda)\|^2_\Sigma = \|\Sigma^{-1/2}\log_\Sigma(\Lambda)\,\Sigma^{-1/2}\|^2_{\mathrm{Id}} = \mathrm{Tr}\!\left(\log(\Sigma^{-1/2}\Lambda\Sigma^{-1/2})^2\right) + \beta\,\mathrm{Tr}\!\left(\log(\Sigma^{-1/2}\Lambda\Sigma^{-1/2})\right)^2.$$

It is worth noticing that tensors with null eigenvalues are at an infinite distance from any regular tensor, as are tensors with infinite eigenvalues: the original cone of positive definite symmetric matrices, a linear manifold with a flat but incomplete metric (there is a boundary at a finite distance), has been changed into a regular and complete (but curved) manifold with an infinite development in each of its n(n+1)/2 directions.

For β = −1/(n+1), we have the metric that [LMO00] proposed by embedding the space of tensors of dimension n into the space of (n+1)×(n+1) matrices using homogeneous coordinates (this allows them to seamlessly take into account an additional position that represents the mean of the Gaussian distribution), and by quotienting out (n+1)-dimensional rotations. The same trick could be used to embed the space in higher-dimensional spaces (square matrices of dimension n+p+1), in which case one would obtain the invariant metric with β = −1/(n+p+1). Interestingly, −1/β = n+1 is the first admissible integer for obtaining a proper metric!

In fact, one can show that all the metrics of this affine-invariant family have the same Levi-Civita connection $\nabla_V W = \nabla_W V = -\frac{1}{2}(V\,\Sigma^{-1}W + W\,\Sigma^{-1}V)$ [Sko84], which means that they share the same geodesics and the same Riemannian exp and log maps at each point:

$$\mathrm{Exp}_\Sigma(W) = \Sigma^{1/2}\exp\!\left(\Sigma^{-1/2}W\Sigma^{-1/2}\right)\Sigma^{1/2}$$

$$\mathrm{Log}_\Sigma(\Lambda) = \Sigma^{1/2}\log\!\left(\Sigma^{-1/2}\Lambda\Sigma^{-1/2}\right)\Sigma^{1/2}$$

However, one should be careful that the orthonormal bases are different for each metric, which means that distances along the geodesics are different.


From the connection, one can compute the curvature tensor of the manifold [Sko84]: $R(X,Y,V,W) = \frac{1}{4}\mathrm{Tr}\!\left(Y\Sigma^{-1}X\Sigma^{-1}V\Sigma^{-1}W - X\Sigma^{-1}Y\Sigma^{-1}V\Sigma^{-1}W\right)$. From this tensor, one gets the sectional curvature and sees that it is non-positive and bounded from below (by −1/2). Thus, the tensor space is a Hadamard manifold, i.e. a kind of hyperbolic space in which we have, for instance, the existence and uniqueness of the mean. There is also no cut locus, which simplifies the computations.

5.4 Log-Euclidean metrics

By trying to put a Lie group structure on the space of tensors, Vincent Arsigny observed that the matrix exponential is a diffeomorphism from the space of symmetric matrices to the tensor space. However, this well-known fact in mathematics had apparently never been used to transport all the operations defined in the vector space of symmetric matrices to the tensor space, thus providing the tensor space with a commutative Lie group structure and even with a vector space structure [AFPA06, AFPA07]. For instance, the composition (the log-product) is defined by $\Sigma_1\odot\Sigma_2 = \exp(\log(\Sigma_1)+\log(\Sigma_2))$. The Euclidean metric on symmetric matrices is transformed into a bi-invariant Riemannian metric on the tensor manifold (i.e. a metric which is invariant by both left and right translations in the Lie group):

$$\langle V\,|\,W\rangle_\Sigma = \langle D\log(\Sigma)(V)\,|\,D\log(\Sigma)(W)\rangle_{\mathrm{Id}} = \langle \partial_V\log(\Sigma)\,|\,\partial_W\log(\Sigma)\rangle_{\mathrm{Id}}.$$

As geodesics are straight lines in the space of symmetric matrices, the expressions of the Exp, Log and distance maps for the Log-Euclidean metric are easily determined:

$$\mathrm{Exp}_\Sigma(W) = \exp(\log(\Sigma) + \partial_W\log(\Sigma))$$

$$\mathrm{Log}_\Sigma(\Lambda) = D\exp(\log(\Sigma))\,(\log(\Lambda)-\log(\Sigma))$$

$$\mathrm{dist}^2_{LE}(\Sigma_1,\Sigma_2) = \mathrm{Tr}\!\left((\log(\Sigma_1)-\log(\Sigma_2))^2\right)$$

One can see that geodesics through the identity are the same as for the affine-invariant metrics, but this is no longer true in general at other points of the tensor manifold [AFPA07].

Log-Euclidean metrics are intrinsically invariant by a change of scale and by inversion. Moreover, by choosing the scalar product invariant by rotation at the identity (i.e. $\|W\|^2_{\mathrm{Id}} = \mathrm{Tr}(W^2) + \beta\,\mathrm{Tr}(W)^2$ with β > −1/n as above), we get all the similarity-invariant log-Euclidean metrics:

$$\mathrm{dist}^2_{LE}(\Sigma_1,\Sigma_2) = \|\log(\Sigma_1)-\log(\Sigma_2)\|^2_{\mathrm{Id}} = \mathrm{Tr}\!\left((\log(\Sigma_1)-\log(\Sigma_2))^2\right) + \beta\,\mathrm{Tr}\!\left(\log(\Sigma_1)-\log(\Sigma_2)\right)^2.$$

By adding the logarithmic scalar multiplication $\lambda\star\Sigma = \exp(\lambda\log(\Sigma)) = \Sigma^\lambda$, we get a complete vector space structure on tensors.

These formulas look more complex than for the affine-invariant metric because they involve the differential of the matrix exponential and logarithm in order to transport tangent vectors from one space to another. However, they are in fact nothing but the transport of the addition and subtraction through the exponential of symmetric matrices. In practice, the log-Euclidean framework consists in taking the logarithm of the tensor data, computing as usual in the Euclidean space of symmetric matrices, and coming back at the end to the tensor space using the exponential [AFPA06, AFPA05]. This means that most of the operations that were generalized using minimizations for the affine-invariant metric have a closed form with a log-Euclidean metric. For instance, the log-Euclidean mean is simply:

$$\bar\Sigma_{LE} = \exp\!\left(\frac{1}{N}\sum_{i=1}^N \log(\Sigma_i)\right),$$


while the affine-invariant mean is obtained through the iterative algorithm:

$$\bar\Sigma_{t+1} = \bar\Sigma_t^{\frac12}\,\exp\!\left(\frac{1}{N}\sum_{i=1}^N \log\!\left(\bar\Sigma_t^{-\frac12}\,\Sigma_i\,\bar\Sigma_t^{-\frac12}\right)\right)\bar\Sigma_t^{\frac12}.$$

This last expression can only be simplified if the $\Sigma_i$'s commute with the mean value $\bar\Sigma_t$. Interestingly, the affine-invariant and log-Euclidean means are identical if the mean commutes with all the data. When they are not equal, one can show that (close enough to the identity) the log-Euclidean mean is slightly more anisotropic [AFPA07].
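The closed form makes the log-Euclidean mean a one-liner; a minimal sketch (ours, reusing the eigenvalue-function helper from the previous snippets) is:

import numpy as np

def _sym_fun(S, fun):
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def log_euclidean_mean(sigmas):
    # arithmetic mean of the matrix logarithms, mapped back by the exponential
    return _sym_fun(np.mean([_sym_fun(S, np.log) for S in sigmas], axis=0), np.exp)

When the data all commute with their mean, this coincides with the affine-invariant mean computed by the iterative scheme above; otherwise the two generally differ slightly, as discussed in the text.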

A careful comparison of both metrics in practical applications [AFPA05, AFPA06] showed that there were very few differences in the results (of the order of 1%) on real DTI images, but that the log-Euclidean computations were 4 to 10 times faster. Thus, for this specific type of application, the log-Euclidean framework seems to be best suited. For other types of applications, like adaptive re-meshing [MBG97], the anisotropy of the tensors can be much larger, which may lead to larger differences. In any case, initializing the iterative optimizations of affine-invariant algorithms with the log-Euclidean result drastically speeds up the convergence.

5.5 Other families of metrics

Other families of metrics have also been proposed to work with positive definite symmetric matrices, especially in view of processing diffusion tensor images. For instance, [WVCM04] proposed to parameterize tensors by their Cholesky decomposition $\Sigma = L\,L^T$, where L is lower triangular with positive diagonal entries. Taking the standard flat Euclidean metric on these triangular factors leads to straight-line geodesics in that space, which are then transported to the tensor space as in the log-Euclidean framework. This leads to a tensor space structure which is obviously flat, but where the null eigenvalues are at a finite distance, as in the Euclidean case.

Other square roots might be used to define other metrics on tensors, as $\Sigma = (L\,R)(L\,R)^T$ is also a valid decomposition for any rotation R. For instance, the symmetric square root $U\Lambda^{1/2}U^T$ leads to a well-defined metric on tensors which has properties similar to the Cholesky metric above, yet with different geodesics. The fact that the rotation R can be freely chosen to compute the square root recently led Dryden to propose a new metric in [DKZ08] which basically measures the shortest distance between all the square roots $L_1 R_1$ of $\Sigma_1$ and $L_2 R_2$ of $\Sigma_2$. The minimal distance is realized by the Procrustes match of the square roots:

$$\mathrm{dist}(\Sigma_1,\Sigma_2) = \min_{R\in O(n)} \|L_1 - L_2\,R\|,$$

and the optimal rotation $\hat R = U\,V^T$ is obtained from the singular value decomposition $L_2^T L_1 = U\,S\,V^T$. This metric is in fact the standard Kendall metric on the reflection size-and-shape space of n+1 points in dimension n [DKZ08, DM98, Sma96], whose geometry is well known. For instance, the minimal geodesic joining $\Sigma_1$ to $\Sigma_2$ is given by

$$\Sigma(t) = \left((1-t)\,L_1 + t\,L_2\hat R\right)\left((1-t)\,L_1 + t\,L_2\hat R\right)^T.$$

From the equation of the geodesics, one can derive the Riemannian exp and log maps and proceed with the general computing framework. However, one must be careful that this space is not complete and has singularities when the matrix Σ has rank n−2, i.e. when two eigenvalues go to zero [LK93].
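A small NumPy sketch (ours) of this Procrustes distance and of the corresponding geodesic, using Cholesky factors and an SVD for the optimal rotation:

import numpy as np

def procrustes_align(Sigma1, Sigma2):
    # Cholesky factors and the rotation R_hat = U V^T from the SVD of L2^T L1
    L1 = np.linalg.cholesky(Sigma1)
    L2 = np.linalg.cholesky(Sigma2)
    U, _, Vt = np.linalg.svd(L2.T @ L1)
    return L1, L2, U @ Vt

def procrustes_dist(Sigma1, Sigma2):
    # min over rotations R of ||L1 - L2 R||_F
    L1, L2, R = procrustes_align(Sigma1, Sigma2)
    return np.linalg.norm(L1 - L2 @ R, ord="fro")

def procrustes_geodesic(Sigma1, Sigma2, t):
    # minimal geodesic Sigma(t) obtained by interpolating the aligned square roots
    L1, L2, R = procrustes_align(Sigma1, Sigma2)
    Lt = (1.0 - t) * L1 + t * (L2 @ R)
    return Lt @ Lt.T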


In [WV05], Wang and Vemuri proposed to use the square root of the J-divergence (the symmetrized Kullback-Leibler divergence) as a "distance"³ on the tensor space:

$$\mathrm{dist}^2_J(\Sigma_1,\Sigma_2) = \mathrm{Tr}\!\left(\Sigma_1\Sigma_2^{-1} + \Sigma_2\Sigma_1^{-1}\right) - 2n.$$

This J-distance has interesting properties: it is affine-invariant, and the Fréchet mean value of a set of tensors $\Sigma_i$ has a closed-form solution:

$$\bar\Sigma = B^{-1/2}\left(B^{1/2}\,A\,B^{1/2}\right)^{1/2}B^{-1/2},$$

with $A = \sum_i\Sigma_i$ and $B = \sum_i\Sigma_i^{-1}$. However, this is not a Riemannian distance, as the Taylor expansion

$$\mathrm{dist}^2_J(\Sigma,\Sigma+\epsilon V) = \frac{\epsilon^2}{2}\,\mathrm{Tr}\!\left(\Sigma^{-1}V\Sigma^{-1}V\right) + O(\epsilon^3)$$

indicates that the underlying infinitesimal dot product is the usual affine-invariant metric $\langle V\,|\,W\rangle = \frac{1}{2}\mathrm{Tr}\!\left(\Sigma^{-1}V\Sigma^{-1}W\right)$. In fact, this metric might well be an extrinsic metric, like the ones we investigated in Section 4.0.4 for unit vectors and rotations, and it would be interesting to determine what the embedding space is.
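For completeness, here is a sketch (ours) of the J-divergence "distance" following the expression above, and of the closed-form mean, which solves the equation Sigma B Sigma = A:

import numpy as np

def _sym_fun(S, fun):
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def j_dist2(S1, S2):
    # squared J-"distance" as written above
    n = S1.shape[0]
    return np.trace(S1 @ np.linalg.inv(S2) + S2 @ np.linalg.inv(S1)) - 2 * n

def j_mean(sigmas):
    # closed-form Frechet mean for the J-divergence
    A = np.sum(sigmas, axis=0)
    B = np.sum([np.linalg.inv(S) for S in sigmas], axis=0)
    Bh = _sym_fun(B, np.sqrt)
    Bih = _sym_fun(B, lambda d: 1.0 / np.sqrt(d))
    return Bih @ _sym_fun(Bh @ A @ Bh, np.sqrt) @ Bih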

5.5.1 Which metric for which problem?

Affine-invariant [PFA06, PFA04] and Log-Euclidean metrics seem to be well adapted to DTI and covariance matrices: they provide a geometric interpolation for which the fractional anisotropy (FA) is quasi-linear; null eigenvalues are at an infinite distance from any tensor, so that there is no risk of reaching them in a finite time and gradient descents are well posed; the affine-invariant metrics give the tensor manifold a Hadamard structure (a hyperbolic-like space with non-positive curvature which is diffeomorphic to a Euclidean space), while the Log-Euclidean ones give it a complete Euclidean structure; with both metrics, the mean always exists and is unique; etc.

Thus, it seems that these metrics could fit many problems. In [FAAP05], we tried to apply them to the smoothing of structure tensors. These tensors are classically obtained by convolving the tensor product of the gradient of an image with a Gaussian: $S_\sigma = G_\sigma\star(\nabla I\,\nabla I^T)$. They reveal structural information about the image and are used to detect edges and corner points or to drive some anisotropic filtering methods. The noisier the image, the larger σ must be to obtain a smooth field, but small structures may then be wiped out. By contrast, smaller values of σ can help to extract low-level features in images, but the resulting structure tensor field may be noisy. Thus, the idea is to smooth this tensor field anisotropically in order to regularize homogeneous regions while preserving edges. An anisotropic filtering of the coefficients (Euclidean structure) fails because we end up with negative eigenvalues.

We thus proposed to minimize a ϕ-function of the affine-invariant norm of the gradient of the tensor field (see Section 6.3). However, the results clearly showed that this method also fails: since the Riemannian metric is scale-invariant, small differences in small tensors have the same importance as large differences in large ones. This is catastrophic here because the anisotropic diffusion enhances these small details as much as the large-scale ones. Moreover, the anisotropic smoothing does not allow reaching tensors with only one non-null eigenvalue, which would represent perfect infinite edges! For this type of tensors, it seems that we need to find another metric for which the null eigenvalues are reachable, but not the negative ones.

³Quotation marks indicate here that the triangle inequality might not be satisfied.


The Cholesky decomposition proposed by [WVCM04] may prove to be the right structure for that, and it would be interesting to see whether it corresponds to a Riemannian metric. As an example, we illustrate in Fig. 5.1 the value of the tensors along a geodesic starting with the same tangent vector for the Euclidean metric, the log-Euclidean metric (the affine-invariant one is very close), and a straight line in the Cholesky parameterization proposed by [WVCM04]. This last curve in the tensor space is given by $\Sigma(t) = L(t)\,L(t)^T$ where $L(t) = t\,L_0$ and $L_0$ is a lower triangular matrix (with positive elements on the diagonal) solution of $\Sigma_0 = L_0\,L_0^T + L_0^T\,L_0$. As could be expected, one quickly reaches the boundary of the space in the Euclidean case (tensors with non-positive eigenvalues are not displayed). In the log-Euclidean case, the size of the tensors decreases exponentially but actually reaches a null eigenvalue only asymptotically. The Cholesky parameterization behaves like the square function: it reaches a null eigenvalue in a finite time but then bounces back into the space of tensors.

Figure 5.1: Geodesic shooting from the tensor Σ = diag(4, 1) with the tangent vector $\dot\Sigma$ = diag(−8). Left: the Euclidean case; 2/3 of the tensors are not positive definite and thus are not displayed. Middle: the Cholesky case; the null matrix is reached (exact middle value) and the values beyond are mirrored versions of the previous ones. Right: the Log-Euclidean case; all tensors are positive definite, and the null tensor is never reached.

The message of this experiment is that there is no universal metric for a given type of feature: there are different families of metrics with similar or different characteristics, and one may rely on one or the other depending on the specificities of the application.


Chapter 6

Computing with Manifold-valued images

The previous section showed how to derive from the atomic Exp and Log maps many important statistical notions, like the mean, covariance and Principal Component Analysis (PCA). We now turn to the generalization of some image processing algorithms like interpolation, diffusion and restoration of missing data (extrapolation) to manifold-valued images. We show that most interpolation and filtering methods can be reformulated using weighted means. The linear and non-linear diffusion schemes can be adapted to manifolds through PDEs, provided that we take into account the variations of the metric. For details, we refer the reader to [PFA06].

6.1 Interpolation and filtering as weighted means

One of the important operations in geometric data processing is to interpolate values between known measurements. In 3D image processing, (tri-)linear interpolation is often used thanks to its very low computational load and comparatively much better results than nearest-neighbor interpolation. Other popular methods include the cubic and, more generally, spline interpolations [TBU00, Mei02]. The standard way to interpolate on a regular lattice is to make a linear combination of samples $f_k$ at integer (lattice) coordinates $k\in\mathbb{Z}^d$: $f(x) = \sum_k w(x-k)\,f_k$. A typical example is the sinus cardinal interpolation where the convolution kernel has an infinite support. With the nearest-neighbor, linear (or tri-linear in 3D), and higher-order spline interpolations, the kernel is piecewise polynomial and has a compact support [TBU00, Mei02]. With normalized weights, this interpolation can be seen as a weighted mean. Thus, it can be generalized in the manifold framework as an optimization problem: the interpolated value p(x) on our feature manifold is the point that minimizes $C(p(x)) = \sum_{i=1}^n w_i(x)\,\mathrm{dist}^2(p_i, p(x))$. This can easily be solved using the iterative Gauss-Newton scheme proposed for the Karcher mean. The linear interpolation is interesting and can be written explicitly since it is a simple geodesic walking scheme: $p(t) = \mathrm{Exp}_{p_0}(t\,\overrightarrow{p_0 p_1}) = \mathrm{Exp}_{p_1}((1-t)\,\overrightarrow{p_1 p_0})$.

For our SPD matrices example, the interpolation between $\Sigma_1$ and $\Sigma_2$ with the standard Euclidean, Log-Euclidean and affine-invariant metrics gives:

$$\Sigma_{Eucl}(t) = (1-t)\,\Sigma_1 + t\,\Sigma_2$$

$$\Sigma_{LE}(t) = \exp\!\left((1-t)\log(\Sigma_1) + t\log(\Sigma_2)\right)$$

$$\Sigma_{Aff}(t) = \Sigma_1^{1/2}\exp\!\left(t\,\log\!\left(\Sigma_1^{-1/2}\,\Sigma_2\,\Sigma_1^{-1/2}\right)\right)\Sigma_1^{1/2}$$


One can show that the volume of the tensors (the determinant) is geometrically interpolated with the Log-Euclidean and affine-invariant metrics (their logarithm is linearly interpolated), while the trace of the tensor is linearly interpolated with the Euclidean metric [?].
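These three interpolation schemes are easy to compare numerically; a sketch (with our own helper names) follows.

import numpy as np

def _sym_fun(S, fun):
    d, U = np.linalg.eigh(S)
    return U @ np.diag(fun(d)) @ U.T

def interp_euclidean(S1, S2, t):
    return (1 - t) * S1 + t * S2

def interp_log_euclidean(S1, S2, t):
    return _sym_fun((1 - t) * _sym_fun(S1, np.log) + t * _sym_fun(S2, np.log), np.exp)

def interp_affine(S1, S2, t):
    h = _sym_fun(S1, np.sqrt)
    ih = _sym_fun(S1, lambda d: 1.0 / np.sqrt(d))
    M = _sym_fun(ih @ S2 @ ih, np.log)
    return h @ _sym_fun(t * M, np.exp) @ h

For instance, np.linalg.det(interp_log_euclidean(S1, S2, 0.5)) equals sqrt(det(S1) det(S2)) up to rounding, illustrating the geometric interpolation of the determinant mentioned above.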

6.1.1 Gaussian and kernel-based filtering

Many other operators can be rephrased as weighted means. For instance, approximations and convolutions like Gaussian filtering can be viewed as the average of the neighboring values weighted by a (Gaussian) function of their spatial distance. For instance, $\hat F(x) = \int_{\mathbb{R}^n} K(u)\,F(x+u)\,du$ is the minimizer of $C(\hat F) = \int_{\mathbb{R}^n} K(u)\,\mathrm{dist}^2(F(x+u),\hat F(x))\,du$. In this formulation the kernel can be a discrete measure, for instance if samples are defined on the points of a grid. In a Riemannian manifold, this minimization problem is still valid, but instead of a closed-form solution, we have once again a Gauss-Newton iterative gradient descent algorithm to reach the filtered value:

$$\hat p_{t+1}(x) = \mathrm{Exp}_{\hat p_t(x)}\!\left(\int_{\mathbb{R}^n} K(u)\,\mathrm{Log}_{\hat p_t(x)}(p(x+u))\,du\right).$$

An example of a comparative Gaussian filtering of a diffusion tensor image with the flat Euclidean metric and the affine-invariant one is provided in Fig. 6.1. The log-Euclidean result is very similar to the affine-invariant result (the difference has to be multiplied by 100 to be visible). One can see a more important blurring of the corpus callosum fiber tracts using the flat metric.

We can also use anisotropic and non-stationary kernels K(x, u). For instance, the kernel can be modulated by the norm of the derivative of the field in the direction u. We should notice that for a manifold-valued field p(x), the directional derivative $\partial_u p(x)$ is a tangent vector of $T_{p(x)}M$ which can be practically approximated using finite "differences" in the exponential chart: $\partial_u p(x) \simeq \mathrm{Log}_{p(x)}(p(x+u)) + O(\|u\|^2)$. However, to measure the norm of this vector, we have to use the Riemannian metric at that point: $\|\partial_u p\|_p$.

6.2 Harmonic diffusion

An alternative to kernel filtering is to consider a regularization criterion that penalizes the spatialvariations of the field. A measure of variation is the spatial gradient (the linear form that maps toany spatial direction u the directional derivative ∂up(x)). It is thus a tensor in the mathematicalsense. In a given coordinate system, it may be which can be robustly computed as the matrix thatbest approximates the directional derivatives in the neighborhood (e.g. 6, 18 or 26 connectivity in3D). The simplest criterion based on the gradient is the Harmonic energy

Reg(p) = (1/2) ∫_Ω ‖∇p(x)‖²_{p(x)} dx = (1/2) ∑_{i=1}^d ∫_Ω ‖∂_{x_i} p(x)‖²_{p(x)} dx.

The Euler-Lagrange equation of this harmonic regularization criterion with Neumann boundary conditions is as usual ∇Reg(p)(x) = −Δp(x). However, the Laplace-Beltrami operator Δp(x) on the manifold is the sum of the usual flat Euclidean second order directional derivatives ∂²_{x_i} p(x) in a locally orthogonal system and an additional term due to the curvature of the manifold that distorts the ortho-normality of this coordinate system in the neighborhood. To practically compute this operator, we proposed in [PFA06] an efficient and general scheme based on the observation that the Christoffel symbols and their derivatives along the geodesics vanish at the origin of the exponential chart. This means that the correction for the curvature is in fact already included: by


Figure 6.1: Results of the different filtering methods on a 3D DTI of the brain (close-up around the splenium of the corpus callosum). The color codes for the direction of the principal eigenvector (red: left-right, green: posterior-anterior, blue: inferior-superior). Top left: original image. Top right: Gaussian filtering using the flat metric (5x5 window, σ = 2.0). This metric gives too much weight to tensors with large eigenvalues, thus leading to clear outliers in the ventricles or in the middle of the splenium tract. Bottom left: Gaussian filtering using the Riemannian metric (5x5 window, σ = 2.0). Outliers disappeared, but the discontinuities are not well preserved, for instance in the ventricles at the level of the cortico-spinal tracts (upper-middle part of the images). Bottom right: anisotropic filtering in the Riemannian framework (time step 0.01, 50 iterations). The ventricle boundary is very well conserved with an anisotropic filter and both isotropic (ventricles) and anisotropic (splenium) regions are regularized. Note that the U-shaped tracts at the boundary of the grey/white matter (lower left and right corners of each image) are preserved with an anisotropic filter and not with a Gaussian filter.


computing the standard Laplacian in that specific map, one gets the directional Laplace-Beltrami operator for free: Δ_u p = Log_{p(x)}(p(x + u)) + Log_{p(x)}(p(x − u)) + O(‖u‖⁴). Averaging over all the directions in a spatial neighborhood V finally gives a robust and efficient estimation scheme:

Δp(x) ∝ ∑_{u∈V} (1/‖u‖²) Log_{p(x)}(p(x + u)).

A very simple scheme to perform harmonic diffusion is to use a first order geodesic gradient descent. At each iteration and at each point x, one walks a little bit along the geodesic that starts at the current point in the direction opposite to the gradient of the regularization criterion: p_{t+1}(x) = Exp_{p_t(x)}( −ε ∇Reg(p_t)(x) ) = Exp_{p_t(x)}( ε Δp_t(x) ).
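As a concrete illustration, in the log-Euclidean setting where the Exp and Log maps reduce to additions in the matrix-logarithm chart, one diffusion step can be sketched as follows (a minimal sketch with our own variable names; the field is stored as a cell array L of log-tensors and a 4-neighborhood with unit spacing is assumed):

function L = HarmonicStep(L, epsilon)
% Minimal sketch: one harmonic-diffusion step for an SPD-valued image handled
% through its matrix logarithms L{i,j} = logm(Sigma{i,j}) (log-Euclidean chart).
[nx, ny] = size(L);
Lnew = L;
for i = 2:nx-1
    for j = 2:ny-1
        % discrete Laplace(-Beltrami) estimate: sum of the Logs to the neighbors
        Lap = (L{i+1,j} - L{i,j}) + (L{i-1,j} - L{i,j}) ...
            + (L{i,j+1} - L{i,j}) + (L{i,j-1} - L{i,j});
        Lnew{i,j} = L{i,j} + epsilon * Lap;   % geodesic step = vector step in the log chart
    end
end
L = Lnew;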

6.3 Anisotropic diffusion

In order to filter within homogeneous regions but not across their boundaries, an idea is to penalize the smoothing in the directions where the derivatives are important [PM90, GKKJ92]. This can be realized directly in the discrete implementation of the Laplacian by weighting the directional Laplacian with a decreasing function of the norm ‖∂_u p‖_p of the gradient in that direction. For instance, we used the weighted sum ∑_u c(‖∂_u p‖_p) Δ_u p with c(x) = exp(−x²/κ²) in [PFA06]. As the convergence of this scheme is not guaranteed (anisotropic regularization "forces" may not derive from a well-posed energy), the problem may be reformulated as the optimization of a ϕ-function of the Riemannian norm of the spatial gradient (a kind of robust M-estimator):

Reg_ϕ(p) = (1/2) ∫_Ω ϕ( ‖∇p(x)‖_{p(x)} ) dx.

By choosing an adequate ϕ-function, one can give the regularization an isotropic or anisotropic behavior [AK01]. The main difference with a classical Euclidean calculation is that we have to take the curvature into account by using the Laplace-Beltrami operator, and by measuring the length of directional derivatives using the Riemannian metric at the right point [FAAP05]. Using Ψ(x) = ϕ′(x)/x, we get:

∇Reg_ϕ(p) = −Ψ(‖∇p‖_p) Δp − ∑_{i=1}^d ∂_{x_i}Ψ(‖∇p‖_p) ∂_{x_i}p.
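A minimal sketch of the anisotropic weighting of the directional contributions, again in the log-Euclidean chart and with our own function name and neighborhood choice, could be:

function Lap = AnisotropicLaplacian(L, i, j, kappa)
% Minimal sketch: anisotropically weighted Laplacian contribution at pixel (i,j)
% for an SPD-valued image handled through its matrix logarithms L{i,j}.
Lap = zeros(size(L{i,j}));
offsets = [1 0; -1 0; 0 1; 0 -1];            % 4-neighborhood, unit spacing
for k = 1:size(offsets,1)
    u  = offsets(k,:);
    dL = L{i+u(1), j+u(2)} - L{i,j};         % ~ Log_{p(x)}(p(x+u)) in the log chart
    c  = exp( -norm(dL,'fro')^2 / kappa^2 ); % decreasing weight on strong gradients
    Lap = Lap + c * dL;                      % weighted directional contribution
end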

6.4 Diffusion-based interpolation and extrapolation

The pure diffusion reduces the noise in the data but also the amount of information. Moreover, the total diffusion time that controls the amount of smoothing is difficult to estimate. At an infinite diffusion time, the field will be completely homogeneous. Thus, it is more interesting to consider the data as noisy observations and the regularization as a prior on the spatial regularity of the field. Usually, one assumes a Gaussian noise independent at each position, which leads to a least-squares criterion through a maximum likelihood approach. For a dense data field q(x), the similarity criterion that is added to the regularization criterion is simply

Sim(p) = ∫_Ω dist²( p(x), q(x) ) dx.

The only difference here is that it uses the Riemannian distance. It simply adds a linear (geodesic) spring ∇_p dist²(p, q) = −2 \overrightarrow{pq} to the global gradient to prevent the regularization from pulling too far away from the original data.

For sparse measures, using directly the maximum likelihood on the observed data leads to Dirac (mass) distributions in the derivatives, which is a problem for the numerical implementation. One solution is to consider the Dirac distribution as the limit of the Gaussian function G_σ when σ goes to zero, which leads to the regularized derivative [PFA06]:

∇Sim(x) = −2 ∑_{i=1}^n G_σ(x − x_i) \overrightarrow{p(x) p_i}.


Chapter 7

Applications in Computational Anatomy

Now that we have seen the generic statistical computing framework on Riemannian manifolds and families of Riemannian metrics that can be designed on the tensor manifold, let us turn to two important applications in medical image analysis and computational anatomy: diffusion tensor imaging, which provides unique in vivo information about the structure of the white matter fibers in the brain, and the estimation of the variability of the cortex among subjects.

7.1 Diffusion tensor imaging

Diffusion tensor imaging (DTI) is a unique tool to assess in vivo oriented structures within tissues via the directional measure of water diffusion. Fiber tracking is the application targeted by most researchers in order to investigate non-invasively the anatomical-functional architecture of the brain. Most of the current applications are in neuroscience, with high signal-to-noise ratio (SNR) images acquired on healthy subjects rather than on patients. DTI might also prove to be an interesting quantification tool for medical diagnosis [PMB06, RGB+05]. However, using such a modality in a clinical environment is difficult: data often have to be acquired quickly because the patient cannot stay in a static position for too long due to pathologies. As this prevents the acquisition of multiple images for averaging, it results in a limited number of encoding gradients and low SNR images. The estimation of the diffusion tensor field from diffusion weighted images (DWI) being noise-sensitive, clinical DTI is often not suitable for fiber tracking. Thus, one needs to regularize the tensor field without blurring the transitions between distinct fiber tracts, which delimit anatomical and functional brain regions. Smoothing independently each DWI before estimating the tensor results in a smoother tensor field, but it also blurs the transitions between homogeneous regions, as this information is not accessible by taking each DWI individually. Consequently, one would like to perform an anisotropic regularization of the tensor field itself.

Most of the methods developed so far actually estimate the tensor field in a first phase with a simple algebraic method (see below) and then spatially regularize some of the geometric features of the tensor field. We believe that a better idea is to consider a prior on the spatial regularity when estimating the tensor field itself, so that the estimation remains statistically optimal with respect to the DWI noise model and keeps the maximum amount of information from the original data. We designed in [FAPA06, FPAA07] a maximum likelihood (ML) criterion for the estimation of tensor fields from DWI with the MRI-specific Rician noise (the amplitude of a complex


Gaussian signal), and extended it to a maximum a posteriori (MAP) criterion by considering a spatial prior on the tensor field regularity. This results in an algorithm that jointly (rather than sequentially) performs the estimation and the regularization of the tensor field.

The Stejskal-Tanner diffusion equation [BMB94] relates the diffusion tensor D to each noise-free DWI:

S_i = S_0 exp( −b g_i^T D g_i )

where S_i is the original DWI corresponding to the encoding gradient g_i, S_0 the base image with a null gradient, and b the diffusion factor. By taking the logarithm of this equation, one obtains a linear system. Solving that system in a least-squares (LS) sense leads to the minimization of a quadratic criterion, which is easily performed using algebraic methods (see e.g. [WMM+02]). Doing this implicitly assumes a log-Gaussian noise on the images, which is justified only for high SNRs. Very few works have considered non log-Gaussian noise because it requires optimization techniques on tensors, which are very difficult to control with the standard Euclidean framework. With the log-Euclidean framework, such an optimization is not difficult (one could also restate everything within the affine-invariant framework, but calculations are slightly more complex). For instance, in the case of a Gaussian noise on the DWIs, the tensor D = exp(L) is parameterized by its logarithm L, an unconstrained symmetric matrix. The criterion to optimize is Sim_G(L) = ∑_i (S_i − Ŝ_i(exp(L)))², and the gradient is

∇Sim_G(L) = 2b ∑_i (S_i − Ŝ_i) ∂_L Ŝ_i   with   ∂_L Ŝ_i = Ŝ_i ∂_{g_i g_i^T} exp(L)
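For reference, the classical algebraic (log-Gaussian least-squares) estimation mentioned above can be sketched in a few lines of MATLAB: taking the logarithm of the Stejskal-Tanner equation gives one linear equation per DWI in the six independent coefficients of D (the function name and variable layout are ours):

function D = EstimateTensorLS(S, S0, b, g)
% Minimal sketch of the classical least-squares tensor estimate from DWIs.
% S : n x 1 vector of DWI values, S0 : base image value, b : diffusion factor
% g : n x 3 matrix of unit encoding gradients (one per row)
n = size(g,1);
A = zeros(n,6);  y = zeros(n,1);
for i = 1:n
    gi = g(i,:)';
    % log(S_i/S_0) = -b * gi' * D * gi, linear in the 6 coefficients of D
    A(i,:) = -b * [gi(1)^2, gi(2)^2, gi(3)^2, 2*gi(1)*gi(2), 2*gi(1)*gi(3), 2*gi(2)*gi(3)];
    y(i)   = log(S(i)/S0);
end
d = A \ y;                            % least-squares solution
D = [d(1) d(4) d(5); d(4) d(2) d(6); d(5) d(6) d(3)];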

For low SNRs, we have to take into account the real nature of the noise in MRI, which is Gaussian in the complex domain (the k-space). [WVCM04] proposed an estimation criterion on the complex DWI signal that is adapted to that noise, with a computationally grounded optimization framework based on the Cholesky decomposition. However, one usually only has access to the amplitude of the complex signal in clinical images: in that case, the noise is Rician. One can show that such a noise induces a signal-dependent bias of the order of σ²/2S on the DWI signal [SdDSD98]. The signal being systematically larger than what it ought to be, the tensors will be under-estimated. To take the nature of this noise explicitly into account, we should optimize the log-likelihood of the signal corrupted by a Rician noise. This leads to a more complex criterion than above, but its gradient is very similar to the Gaussian case: ∇Sim_R(L) = −(1/σ²) ∑_i (S_i − α Ŝ_i) ∂_L Ŝ_i, except that we have a correcting factor α = (I′_0/I_0)(Ŝ_i S_i/σ²) depending on the signal and the noise variance (I_0 and I′_0 are computable Bessel functions). The noise variance can easily be estimated on the background of the image (outside the head) where there is no signal.
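Since I′_0 = I_1 for the modified Bessel functions, the correction factor α is easy to evaluate numerically; a possible MATLAB fragment (our own illustration, where Shat, S and sigma denote the predicted signal, the measured DWI and the noise standard deviation) is:

% Rician correction factor alpha = (I0'/I0)(Shat*S/sigma^2), using I0' = I1.
% Scaled Bessel functions (third argument 1) avoid overflow for large arguments.
t     = Shat .* S / sigma^2;
alpha = besseli(1, t, 1) ./ besseli(0, t, 1);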

For the spatial regularity, we proposed in [FAPA06, FPAA07] to use a Markovian prior p(Σ(x+dx) | Σ(x)) ∝ exp( −‖∇Σ(x)^T dx‖_{Σ(x)}/λ ), and to account for discontinuities using a redescending M-estimator (a so-called ϕ-functional). In the log-Euclidean framework, the tensor field Σ(x) is parameterized by its logarithm L(x), and the log of the prior is simply Reg(L) = ∫_Ω ϕ(‖∇L‖). In our experiments, we use ϕ(s) = 2√(1 + s²/κ²) − 2. The ϕ-function preserves the edges of the tensor field while smoothing homogeneous regions. To include this regularization as a prior in the ML optimization process, we simply need to compute its gradient ∇Reg(L) = −ψ(‖∇L‖) ΔL − ∑_i ∂_i( ψ(‖∇L‖) ) ∂_i L, where ψ(s) = ϕ′(s)/s. Directional derivatives, gradient and Laplacian were estimated with a finite differences scheme as with scalar images (see [FAPA06, FPAA07] for details).
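For completeness, the ϕ-function above and the associated ψ(s) = ϕ′(s)/s have simple closed forms; a small MATLAB illustration (the symbolic simplification of ψ is ours) is:

% phi-function used for edge-preserving regularization and its psi = phi'/s.
phi = @(s, kappa) 2*sqrt(1 + s.^2/kappa^2) - 2;
psi = @(s, kappa) 2 ./ (kappa^2 * sqrt(1 + s.^2/kappa^2));   % = phi'(s)./s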

Experiments on synthetic data with contours and a Rician noise showed that the gradient descent technique was correctly removing the negative eigenvalues that did appear in the standard (Euclidean log-Gaussian) estimation technique. ML and MAP (with regularization) methods with


a Gaussian noise model were underestimating the volume of tensors even more than the standard log-Gaussian method (30% instead of 20%), while Rician ML and MAP methods were estimating it within 5%. More interestingly, the methods were tested on two clinical datasets of low and medium quality: a brain image with a very low SNR (Fig. 7.1), and an experimental acquisition of a tumor in the spinal cord, both with 7 gradient directions (Fig. 7.2). This last type of acquisition is currently actively investigated in clinical research (e.g. [FOF+05]). It is difficult to perform because the position is uncomfortable due to the tumor and the coil cannot be perfectly adapted to the body as it is for the head. The images are consequently much noisier than for the brain.

As for the synthetic data, using gradient descent techniques removed all the negative eigenvalues of the standard method. To evaluate the impact of the noise model on the tensor reconstruction in the brain, we computed the mean apparent diffusion coefficient (ADC), fractional anisotropy (FA) and volume of the diffusion tensors in the ventricles (high but isotropic diffusion), and in the corpus callosum (lower diffusion with high anisotropy) [FPAA07]. Using the Rician noise model increases the tensor volume and the ADC by about 10% in isotropic regions and by 1 to 2% in anisotropic regions without modifying the FA. In the spinal cord, using the Rician noise model also leads to an increase of the tensors of about 30% in volume. This corresponds to the correction of the shrinking effect with Gaussian and log-Gaussian noises. Adding some spatial regularization (MAP methods) systematically decreases the FA. However, this effect is much lower for anisotropic regions and minimized with the Rician noise model: 3% only in the corpus callosum (versus 11% with log-Gaussian), and 15% in the ventricles (versus 30% with log-Gaussian). Thus, it seems that these measurements are more reproducible with the MAP Rician reconstruction.

The tractography results in much smoother and longer fibers with less dispersion for the MAP Rician model. The overall number of reconstructed fibers is also much larger. The smoothness of the tensor field indeed leads to more regular and longer fibers: tracts that were stopped due to the noise are now fully reconstructed. A careful quantitative evaluation and validation of the whole framework however remains to be done. In particular, it would be necessary to evaluate the reproducibility across acquisitions and scanners, for instance using repeated scans of the same subject, as well as evaluations on physical phantoms.

7.2 Learning Brain Variability from Sulcal Lines

A second interesting application is the statistical modeling of the brain variability in a given population of 3D images [FAP+05, FAP+07]. In such a process, the identification of corresponding points among each individual anatomy (structural homologies) allows us to encode the brain variability by covariance matrices. The reason why we should not simplify these tensors into simpler scalar values is that there is evidence that structural variations are larger along certain preferred directions [TMW+01]. Thus the metrics on covariance matrices presented in Section ?? and the statistical computing framework developed in the earlier sections are once again needed.

In the brain, a certain number of sulcal landmarks consistently appear in all normal individuals and allow a consistent subdivision of the cortex into major lobes and gyri [MRC+04b]. Moreover, sulcal lines are low dimensional structures easily identified by neuroscientists. In the framework of the associated team program between Epidaure/Asclepios at INRIA and LONI at UCLA (1), we use a data-set of sulcal lines manually delineated in 98 subjects by expert neuroanatomists according to a precise protocol (2). We used the 72 sulcal curves that consistently appear in all normal subjects (abusively called sulci in the sequel).

(1) http://www-sop.inria.fr/epidaure/Collaborations/UCLA/atlas.html
(2) http://www.loni.ucla.edu/~khayashi/Public/medial_surface/


Figure 7.1: Tensor field estimation of a brain (top row) and improvement of the fiber tracking (bottom row). Top left: a slice of the b0 image. Top middle: the classic log-Gaussian estimation on the ROI. The color codes for the principal direction of tensors: red: left-right, green: anterior-posterior, blue: inferior-superior. Missing tensors in the splenium region are non-positive. Top right: the MAP estimation of the same region. Bottom left: ROI where the tracking is initiated. Bottom middle: the cortico-spinal tract reconstructed after a classic estimation. Bottom right: same tract reconstructed after our MAP estimation. Rendering is obtained using the MedINRIA software developed by P. Fillard and N. Toussaint.

Figure 7.2: Tensor field estimation of the spinal cord. Left: a slice of the b0 image with the ROI squared in green. Middle: classic log-Gaussian ML tensor estimation. There are many missing (non-positive) tensors around and in the spinal cord. Right: Rician MAP tensor estimation: tensors are all positive and the field is much more regular while preserving discontinuities. Original DWI are courtesy of D. Ducreux, MD. Rendering is obtained using the MedINRIA software (http://www.inria.fr/sophia/asclepios/software/MedINRIA/).


To find the corresponding points between the 98 instances of each of the 72 sulci, we alternately computed the matches that minimize the distance between the mean curve and the instances, and re-estimated the mean curve from the updated matches. As a result, we obtain for each point of each mean sulcal curve the set of corresponding anatomical positions in each subject. The number of tensors needed to represent the variability information along each sulcus was then adjusted by picking only a few tensors along the mean line and linearly interpolating in-between them. The optimal subset of tensors is determined by optimizing the distance between interpolated and measured tensors along the line so that the error does not exceed a prescribed value. In this process, the distance and interpolation between covariance matrices were performed using the affine-invariant metric. Interestingly, selecting only 366 variability tensors was sufficient to encode the variability of the 72 sulci without a significant loss of accuracy. The result is a sparse field of tensors that can naturally be extrapolated to the whole space using the framework described in Section 6.4 (Fig. 7.3). This dense map of tensors was shown to be in good agreement with previously published results: the highly specialized and lateralized areas such as the planum parietale and the temporo-parietal areas consistently show the highest amount of variability. The lowest amount of variability is consistently found in phylogenetically older areas (e.g. orbitofrontal cortex) and primary cortices that myelinate earliest during development (e.g., primary somatosensory and auditory cortex). However, our variability map gives more than the amount of variability, since we can extract from the tensors the spatial directions where the variability is the greatest at every single anatomical position. We refer the reader to [FAP+05, FAP+07] for a more detailed explanation of the method and for the neuroscience interpretation of these results.

Figure 7.3: Variability tensor extrapolation. Left: the 366 tensors retained for our model. Right: result of the extrapolation. Each point of this average brain shape contains a variability tensor.

7.3 Challenges

We have shown in this chapter that the choice of a Riemannian metric and the implementation of a few tools derived from it, namely the Exp and Log maps, provide the basis for building a consistent algorithmic framework to compute on manifolds. In particular, we can compute consistent statistics, perform interpolation, filtering, isotropic and anisotropic regularization and restoration of missing data. Last but not least, powerful computational models of the anatomy could be built thanks to this Riemannian computing framework.

There are however many open challenges, both from the theoretical and the application points of view. For instance, the Riemannian approach that we presented here is not perfectly consistent with the structure of Lie groups as soon as they are neither compact nor Abelian, which is already the case for rigid body transformations. In that case, there is generally no metric that is simultaneously left- and right-invariant, and most of the operations that we defined (e.g. the mean) with either the left- or the right-invariant metric are not consistent with inversion. To find an alternative to the Riemannian structure for Lie groups, we investigate with V. Arsigny the idea of relying on one-parameter subgroups instead of geodesics. Preliminary results indicate that this may provide an interesting structure [ACPA06b, ACPA06a]. For instance, one can design bi-invariant means that are fully compatible with the group structure [APA06]. They are defined through fixed-point equations which are very similar to the Riemannian ones. However, these equations do not derive from a well-posed metric. It would be interesting to see what part of the statistical computing framework still holds if we replace the distance by a simple positive or negative energy. This probably amounts to considering the connection as the basic structure of the manifold instead of the Riemannian metric.

Another key problem is to extend our statistical computing framework to infinite dimensional manifolds such as surfaces and diffeomorphism groups. From a theoretical point of view, we know how to provide the diffeomorphism group with left- or right-invariant Riemannian metrics that are sufficiently smooth to compute the geodesics by optimization [BMTY05, MTY03, MY01, JM00]. Through the so-called EPDiff equation (Euler-Poincaré equation for diffeomorphisms), this optimization framework has recently been rephrased in an exponential/logarithm framework similar to the one developed here [MTY06]. Thus, the basic algorithmic tools are the same, except that optimizing each time to compute the exponential and the logarithm has a deep impact on the computational times. However, one difficulty is that the infinite number of dimensions forbids the use of many tools like probability density functions. Thus, even if simple statistics like the mean and the principal component analysis of a finite set of samples may still be computed [VMYT04, DPT+08], one should be very careful about ML-like statistical estimation in these spaces: there is always a finite number of data for an infinite number of parameters. In particular, there are infinitely many left- or right-invariant metrics on diffeomorphisms, and learning the optimal metric is an ill-posed problem. Estimations need to be regularized with prior models or performed within finite dimensional families of metrics whose assumptions are suited to the problem at hand. An interesting track for that is to establish specific models of the Green's function based on a mixture of smoothly varying local and long-distance interaction convolution kernels. If we only consider the local kernel, the Riemannian elasticity [PSA+05, Pen06b] could be an interesting family of metrics allowing to measure statistically the properties of the virtual underlying material. Moreover, it was recently shown that such a criterion was consistently separating monozygotic twins from others, which suggests that such deformation-based measures could be anatomically meaningful [LBC+08].

Last but not least, surfaces are an important source of anatomical structures in computational anatomy, and one needs to design efficient methods and metrics to capture their statistical properties. It would also be useful to fuse the information coming from image deformations and from surfaces in a single framework. Currents (a generalization of distributions) provide consistent mathematical tools for discrete and continuous surfaces [CSM03]. A diffeomorphic registration algorithm for surfaces based on that notion was proposed for instance in [VG05]. The tools were then drastically improved in [DPT+08, DPTA08b, DPTA08a] to provide the basis of a computationally efficient statistical computing framework on curves and surfaces. We expect very interesting advances in this direction in the coming years.

From a computational anatomy standpoint, the huge number of degrees of freedom involved in the estimation of the anatomical variability will require aggregating information coming from many different sources in order to improve the statistical power. As there is no gold standard, we should also be careful that many biases may be hidden in the results. Thus, methods to compare and fuse statistical information coming from many different anatomical features will need to be developed in order to confirm anatomical findings. For the brain variability, one could for instance add to the sulci other cortical landmarks like sulcal ribbons and gyri, the surfaces of internal structures like the ventricles, the hippocampus or the corpus callosum, or fiber pathways mapped from DTI. These sources of information individually provide a partial and biased view of the whole variability. Thus, we expect to observe a good agreement in some areas, and complementary measures in other areas. This will most probably lead in the near future to new anatomical findings and more robust medical image analysis applications.


Appendix A

Appendices

A.1 Jacobians of usual scalar and vector functions

Let x = [x_1, . . . , x_n]^T and y = [y_1, . . . , y_n]^T be two vectors of R^n, λ be a scalar, f(x) a scalar function of the vector x (f : R^n → R) and g(y) = [g_1(y), . . . , g_m(y)]^T a vector function (g : R^n → R^m). The Jacobian matrices of these functions are defined by the matrix functions:

J_f = ∂f/∂x = [ ∂f/∂x_1, . . . , ∂f/∂x_n ]

J_g = ∂g/∂x =
    [ ∂g_1/∂x_1  . . .  ∂g_1/∂x_n ]
    [     ...    . . .      ...    ]
    [ ∂g_m/∂x_1  . . .  ∂g_m/∂x_n ]

in such a way that the element (i, j) of J_g is [J_g]_{ij} = ∂g_i/∂x_j. Then, we have:

f(x + dx) = f(x) + J_f(x)·dx   and   g(x + dx) = g(x) + J_g(x)·dx

where the dot (·) represents the matrix multiplication.

A.1.1 Composition and multiplication of functions

Let f and g be two vector functions with compatible dimensions for the composition (f : R^n → R^m, g : R^m → R^p) and let h(x) = g(f(x)) be the composed function (h = g ∘ f : R^n → R^p). The composition of Jacobians is given by the chain rule (beware that this is not commutative):

J_h(x) = J_g(f(x))·J_f(x)

Let now f be a scalar function and h = f·g the multiplicative function (h(x) = f(x)·g(x)). The Jacobian of h is:

J_h(x) = f(x)·J_g(x) + g(x)·J_f(x)

A.1.2 Function of two variables

Let α(x, y) be a vector function of two vector variables and J_{α1}, J_{α2} be its two Jacobians:

J_{α1}(x, y) = ∂α(x, y)/∂x        J_{α2}(x, y) = ∂α(x, y)/∂y


This function can be for instance a scalar or a cross product. Let f and g be two vector functions and h(x) = α(f(x), g(x)) the composed function. The Jacobian of h is:

J_h(x) = J_{α1}(f(x), g(x))·J_f(x) + J_{α2}(f(x), g(x))·J_g(x)

A.1.3 Jacobian of standard functions

Scalar product: ⟨ x | y ⟩ = ⟨ y | x ⟩ = ∑ x_i y_i = x^T y = Tr(x y^T)

J_{⟨·|·⟩,1}(x, y) = ∂⟨ x | y ⟩/∂x = y^T        J_{⟨·|·⟩,2}(x, y) = ∂⟨ x | y ⟩/∂y = x^T

Norm: ‖x‖ = √(∑ x_i²) = √⟨ x | x ⟩

J_{‖·‖}(x) = ∂‖x‖/∂x = x^T/‖x‖ = (Normalization(x))^T

Normalization: J_N(x) = ∂/∂x ( x/‖x‖ ) = ( ‖x‖² I_n − x x^T ) / ‖x‖³

Square norm: ∂‖x‖²/∂x = 2 x^T

Multiplication by a (constant) matrix M: ∂(M x)/∂x = M
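These closed forms are easy to check against finite differences; a minimal MATLAB verification sketch for the normalization Jacobian (our own code, not part of the original appendix) is:

% Check the normalization Jacobian J_N(x) = (||x||^2 I - x x') / ||x||^3
% against a central finite-difference approximation.
x = randn(3,1);  h = 1e-6;
JN = (norm(x)^2*eye(3) - x*x') / norm(x)^3;      % closed form
JN_fd = zeros(3);
for j = 1:3
    e = zeros(3,1);  e(j) = h;
    JN_fd(:,j) = ( (x+e)/norm(x+e) - (x-e)/norm(x-e) ) / (2*h);
end
disp(norm(JN - JN_fd));                          % should be very small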

A.1.4 Operators and functions particular to R3

In R³, the exterior product only involves two vectors. It is usually called the cross product:

x × y = [ x_2 y_3 − x_3 y_2,  x_3 y_1 − x_1 y_3,  x_1 y_2 − x_2 y_1 ]^T

This product is anti-commutative (x × y = −y × x) and is null if and only if the two vectors are collinear: x × y = 0 ⇔ x = λ y.

Skew-symmetric matrix associated to the cross product: this is the skew-symmetric matrix S_x such that, for any y, we have S_x y = x × y. This matrix is

S_x = [   0   −x_3   x_2 ]
      [  x_3    0   −x_1 ]
      [ −x_2   x_1    0  ]

Jacobian of the cross product: x × y = S_x y

J_{×1}(x, y) = ∂(x × y)/∂x = −S_y        J_{×2}(x, y) = ∂(x × y)/∂y = S_x


Lagrange identity: ‖x× y‖2 = ‖x‖2‖y‖2 − 〈 x | y 〉2

Gibbs formula: x× (y × z) = 〈 x | z 〉 .y − 〈 x | y 〉 .z

Sx.Sy = y.xT − 〈 x | y 〉 .I3

Jacobi identity: x× (y × z) + y × (z × x) + z × (x× y) = 0

S(x×y) = Sx.Sy − Sy.Sx

Metric property: 〈 x× y | z × t〉 = 〈 x | z 〉 . 〈 y | t〉 − 〈 x | t〉 . 〈 y | z 〉

Jacobian of the normalization:

J_N(x) = ∂/∂x ( x/‖x‖ ) = −S_x² / ‖x‖³

Example of Jacobian (useful for differentiating the rotation vector action):

∂(S_r² x)/∂r = ∂( r × (r × x) )/∂r = −S_{(r×x)} + S_r ∂(r × x)/∂r = S_x S_r − 2 S_r S_x


Appendix B

Solution to exercises

B.1 Exercises on 3D rotation matrices of chapter 3

Answer to Exercise 0.1 Orthonormalization of matrices

• Implement the SVD orthogonalization procedure Mat2RotMat(M) to compute the proper rotation which is closest to a given unconstrained matrix M.

function R = Mat2RotMat(M)
%% The matrix can be orthogonalized using an SVD:
%  compute the SVD: M = U*D*V'  (U, V orthogonal matrices, D diagonal)
%  if det(U*V') > 0, set Ortho(M) = U*V'
%  else set Ortho(M) = U*S*V' with S = diag(1,1,-1), where the -1 is
%  at the location of the minimal singular value of D.
[U, S, V] = svd(M);
R = U*V';
if det(R) < 0
    S(1,1) = 1;  S(2,2) = 1;  S(3,3) = -1;
    R = U*S*V';
end

Answer to Exercise 0.2 Rotation matrix to rotation vector

• Implement the operator Sn (SkewSymMat(r)).

function [ SS ] = SkewSymMat( x )
SS = [ [ 0,    -x(3),  x(2) ];
       [ x(3),  0,    -x(1) ];
       [-x(2),  x(1),  0    ] ];

• Verify that it corresponds to the left cross product.

%% Test SkewSymMat
errmax = 0;
Nmax = 10;
for i = 1:Nmax
    v1 = randn(3,1);
    v2 = randn(3,1);
    err = cross(v1, v2) - SkewSymMat(v1)*v2;
    if trace(err'*err) > errmax
        errmax = trace(err'*err);
    end
end
errmax

• Implement a function RotVect2RotMat(r) converting a rotation vector into a rotation matrix, and its inverse RotMat2RotVect(R).

function [R] = RotVect2RotMat( r )
theta2 = 0;
for i = 1:3
    theta2 = theta2 + r(i)*r(i);
end
theta = sqrt(theta2);
if theta > 1e-5
    c = cos(theta);
    s = sin(theta);
    k1 = s / theta;
    k2 = (1 - c) / theta2;
else  %% Taylor expansion around theta = 0
    k2 = 1.0/2.0 - theta2/24.0;
    %% c = 1.0 - theta2*k2;
    k1 = 1.0 - theta2/6;
end
%% Formula 1:
%% R = c*eye(3) + k2 * r'*r + k1 * SkewSymMat(r);
%% Formula 2:
R = eye(3) + k1 * SkewSymMat(r) + k2 * SkewSymMat(r)*SkewSymMat(r);

function r = RotMat2RotVect( R )

%% Verify that ||R*R' - Id||^2 < 1e-20 with ||M||^2 = tr(M*M')
M = R*R' - eye(3);
if trace(M*M') > 1e-20
    %% the matrix can be orthogonalized using an SVD:
    R = Mat2RotMat(R);
end

%% Compute the angle theta
c = ( trace(R) - 1.0 ) / 2;   %% cos(theta)
if c > 1.0
    c = 1.0;
end  %% Accounts for potential numerical instabilities
if c < -1.0
    c = -1.0;
end  %% Accounts for potential numerical instabilities
theta = acos(c);

if theta < 1e-5   %% R close to identity: Taylor expansion around theta = 0
    fact = 0.5 + theta*theta/12.0;
    r(1) = ( R(3,2) - R(2,3) ) * fact;
    r(2) = ( R(1,3) - R(3,1) ) * fact;
    r(3) = ( R(2,1) - R(1,2) ) * fact;
elseif abs(theta - pi) < 1e-5
    % R is close to a symmetry: Taylor expansion around theta = pi
    % set values with the diagonal terms of n*n'
    for i = 1:3
        sq = 1.0 + (R(i,i) - 1.0)/(1 - c);
        %% sq should be between 0 and 1
        if sq < 0.0, sq = 0.; end
        if sq > 1.,  sq = 1.; end
        r(i) = sqrt(sq);
    end
    %% Normalize the rotation vector
    Norm_r = sqrt( r(1)*r(1) + r(2)*r(2) + r(3)*r(3) );
    r = r * theta / Norm_r;
    % set the signs with the off-diagonal terms of n*n'
    if (R(1,2) + R(2,1)) < 0.0, r(2) = -r(2); end
    if (R(1,3) + R(3,1)) < 0.0, r(3) = -r(3); end
    % determine whether r = +/- theta*n
    sinr(1) = ( R(3,2) - R(2,3) );
    sinr(2) = ( R(1,3) - R(3,1) );
    sinr(3) = ( R(2,1) - R(1,2) );
    % determine the most significant term
    k = 1;
    if abs(sinr(2)) > abs(sinr(k)), k = 2; end
    if abs(sinr(3)) > abs(sinr(k)), k = 3; end
    % choose the sign
    if sinr(k) * r(k) < 0.0, r = -r; end
else   % At last, no numerical problem
    fact = 0.5 * theta / sin(theta);
    r(1) = ( R(3,2) - R(2,3) ) * fact;
    r(2) = ( R(1,3) - R(3,1) ) * fact;
    r(3) = ( R(2,1) - R(1,2) ) * fact;
end

Answer to Exercise 0.3 Random uniform rotations

• Implement a uniform random rotation matrix generator UnifRndRotMat().

function [R] = UnifRndRotMat( )
% We should theoretically verify that the last two singular values are
% non zero, but this event really has a null measure.
R = Mat2RotMat( randn(3,3) );

• Verify that the distribution of the rotation angle is proportional to sin²(θ/2).

N = 100000;
for k = 1:N
    %% angle of a random uniform rotation matrix
    theta(k) = norm( RotMat2RotVect( UnifRndRotMat ) );
end
%% Compare the histogram of theta with sin^2(theta/2)
[h, x] = hist(theta, 100);
y = power( sin(x/2), 2 );
plot(x, h/N, x, y)

figure
ratio = (h/N) ./ y;
plot(x, ratio)

% Improve the stability of the density estimation using more samples
Number = N;
for l = 1:50
    % improve the histogram
    for k = 1:N
        %% angle of a random uniform rotation matrix
        theta(k) = norm( RotMat2RotVect( UnifRndRotMat ) );
    end
    n = hist(theta, x);
    h = h + n;
    Number = Number + N;
end
figure
plot(x, h/Number, x, y)
figure
ratio = (h/Number) ./ y;
plot(x, ratio)

• Verify the orthogonality of the SVD orthogonalization and the consistency of the transformation between rotation matrices and rotation vectors on a large number of random rotations.

Nmax = 1e6;

%% Test the orthogonality of the SVD orthogonalization
errmax = 0;
for i = 1:Nmax
    R = UnifRndRotMat();
    err = trace( R*R' - eye(3) );
    if err > errmax
        errmax = err;
    end
end
errmax

%% Test the consistency of RotMat -> RotVect and back
errmax = 0;
for i = 1:Nmax
    R = UnifRndRotMat();
    r = RotMat2RotVect(R);
    RR = RotVect2RotMat(r);
    err = trace( (R-RR)*(R-RR)' );
    if err > errmax
        errmax = err;
    end
end
errmax

Answer to Exercise 0.4 Bases and coordinates in tangent spaces

• Implement functions Ei(i), Ei_l(R,i) and Ei_r(R,i) that give the above basis vectors of T_Id SO3, and their left and right translations which constitute bases of T_R SO3 at any rotation R.


function [ Ei ] = Ei( i )
E1 = [ [0,  0,  0]; [0, 0, -1]; [ 0, 1, 0] ];
E2 = [ [0,  0,  1]; [0, 0,  0]; [-1, 0, 0] ];
E3 = [ [0, -1,  0]; [1, 0,  0]; [ 0, 0, 0] ];
if i==1
    Ei = E1;
elseif i==2
    Ei = E2;
elseif i==3
    Ei = E3;
else
    error('index out of range');
end

function eil_i = Ei_l(R, i)
eil_i = R * Ei(i);

function eir_i = Ei_r(R, i)
eir_i = Ei(i) * R;

• Implement a projection TgCoordRotId(dR) : dR ∈ T_Id SO3 ↦ dr ∈ R³ which computes the coordinates in the above basis. Remark: compute the projection error because dR might not numerically be exactly skew-symmetric.

function [ dr ] = TgCoordRotId( dR )
%% Projection from T_Id SO3 to R^3 using the basis
%  E1 = [[0,0,0]; [0,0,-1]; [0,1,0]]
%  E2 = [[0,0,1]; [0,0,0]; [-1,0,0]]
%  E3 = [[0,-1,0]; [1,0,0]; [0,0,0]]
dr(1) = ( dR(3,2) - dR(2,3) ) / 2;
dr(2) = ( dR(1,3) - dR(3,1) ) / 2;
dr(3) = ( dR(2,1) - dR(1,2) ) / 2;

% Projection error
Err = dR + dR';
err = trace(Err*Err');
if err > 1e-12
    %% display a warning?
    err
end

• Extend that projection by left and right translations to obtain coordinates in the left and right translated bases: TgCoordRot_l(R, dR) and TgCoordRot_r(R, dR).

function [ dr ] = TgCoordRot_l(R, dR)
dr = TgCoordRotId( R' * dR );

function [ dr ] = TgCoordRot_r(R, dR)
dr = TgCoordRotId( dR * R' );

• Implement the reverse functions InvTgCoordRotId(dr) : dr ∈ R³ ↦ dR ∈ T_Id SO3, InvTgCoordRot_l(R, dr) and InvTgCoordRot_r(R, dr).


function [ dR ] = InvTgCoordRotId( dr )
dR = SkewSymMat( dr );

function [ dR ] = InvTgCoordRot_l(R, dr)
dR = R * InvTgCoordRotId( dr );

function [ dR ] = InvTgCoordRot_r(R, dr)
dR = InvTgCoordRotId( dr ) * R;

• Verify that dR = InvTgCoordRot(TgCoordRot(R, dR)) for random tangent vectors at random rotation matrices.

%% test the left coordinates
Nmax = 1e5;
errmax = 0;
for i = 1:Nmax
    % random rotation matrix location
    R = UnifRndRotMat();
    % random Gaussian coordinates in T_R SO3
    dR = randn()*Ei_r(R,1) + randn()*Ei_r(R,2) + randn()*Ei_r(R,3);
    dr = TgCoordRot_l(R, dR);
    Err = dR - InvTgCoordRot_l(R, dr);
    errmax = max( sqrt(trace(Err*Err')), errmax );
end
errmax

%% test the right coordinates
errmax = 0;
for i = 1:Nmax
    % random rotation matrix location
    R = UnifRndRotMat();
    % random Gaussian coordinates in T_R SO3
    dR = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    dr = TgCoordRot_r(R, dR);
    Err = dR - InvTgCoordRot_r(R, dr);
    errmax = max( sqrt(trace(Err*Err')), errmax );
end
errmax

Answer to Exercise 0.5 Bi-invariant scalar product in tangent spaces

• Implement the scalar product ScalRotId(X, Y) = Tr(X·Y^T)/2.

function scal = ScalRotId(X, Y)
scal = trace(X*Y')/2;

• Verify that ScalRotId(Ei(i),Ei(j)) = δij

for i = 1:3, for j = 1:3, G(i,j) = ScalRotId( Ei(i), Ei(j) ); end, end, G

• Verify that dR = ⟨ dR | E1 ⟩·E1 + ⟨ dR | E2 ⟩·E2 + ⟨ dR | E3 ⟩·E3 for random skew-symmetric matrices.


%% Pseudo-code to be verified
errmax = 0;
for k = 0:100
    %% Random element of T_Id SO3
    dR = randn()*Ei(1) + randn()*Ei(2) + randn()*Ei(3);
    Err = dR - ScalRotId(dR,Ei(1))*Ei(1) - ScalRotId(dR,Ei(2))*Ei(2) - ScalRotId(dR,Ei(3))*Ei(3);
    errmax = max( sqrt(trace(Err*Err')), errmax );
end
errmax

• Implement the scalar product ScalRot(R, X, Y) at any rotation R using left translation.

function scal = ScalRot(R, X, Y)
scal = ScalRotId( R'*X, R'*Y );

• Generate Gaussian random vectors in the tangent plane at a random rotation R and verify numerically that their scalar product corresponds to the scalar product of the embedding space.

errmax = 0;  Nmax = 1e5;
for i = 1:Nmax
    % select a random rotation
    R = UnifRndRotMat();
    %% Random Gaussian elements of T_R SO3
    X = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    Y = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    Err = abs( trace(X*Y')/2 - ScalRot(R,X,Y) );
    errmax = max( Err, errmax );
end
errmax

Answer to Exercise 0.6 Exp and log maps

• Compute Exp map at Id: ExpRotId(dR)

function [R] = ExpRotId( dR )
dr = TgCoordRotId( dR );
%% dr is the rotation vector that parameterizes
%% the tangent vector dR, such that dR = SkewSymMat(dr)
R = RotVect2RotMat( dr );

• Verify that the exponential of any tangent vector at identity corresponds to the matrix exponential.

Nmax = 1e5;
errmax = 0;
for i = 1:Nmax
    %% Random Gaussian element of T_Id SO3
    X = randn()*Ei(1) + randn()*Ei(2) + randn()*Ei(3);
    R1 = expm(X);       %% using the matlab matrix exponential
    R2 = ExpRotId(X);
    Err = R1 - R2;
    errmax = max( trace(Err*Err'), errmax );
end
errmax


• Compute the Log map at Id: LogRotId(R)

function [ dR ] = LogRotId( R )
%% dr is the rotation vector
dr = RotMat2RotVect(R);
dR = SkewSymMat(dr);

• Verify that the log at identity of a random rotation corresponds to the matrix logarithm.

Nmax = 1e5;
errmax = 0;
for i = 1:Nmax
    %% Random rotation
    R = UnifRndRotMat();
    X1 = logm(R);        %% using the matlab matrix logarithm
    X2 = LogRotId(R);
    Err = X1 - X2;
    errmax = max( trace(Err*Err'), errmax );
end
errmax

• Compute Exp and Log maps at any rotation R using the left invariance

function [RdR] = ExpRot( R, dR )
RdR = R * ExpRotId( R'*dR );

function [ dR ] = LogRot( R, U )
dR = R * LogRotId( R'*U );

• Verify that the construction using the right invariance gives the same result, both for Exp and for Log.

Nmax = 1e5;
errmax = 0;
for i = 1:Nmax
    % select a random rotation
    R = UnifRndRotMat();
    %% Random Gaussian element of T_R SO3
    X = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    Err = ExpRot(R,X) - ExpRotId(X*R') * R;
    errmax = max( trace(Err*Err'), errmax );
end
errmax

errmax = 0;
for i = 1:Nmax
    % select two random rotations
    R = UnifRndRotMat();
    U = UnifRndRotMat();
    Err = LogRot(R,U) - LogRotId(U*R') * R;
    errmax = max( trace(Err*Err'), errmax );
end
errmax

• Verify the consistency of Exp(R, Log(R,U)) for random rotations and that of Log(R, Exp(R, X)) for tangent vectors X that are within the cut locus.


Nmax = 1e5;
errmax = 0;
for i = 1:Nmax
    % select two random rotations
    R = UnifRndRotMat();
    U = UnifRndRotMat();
    RU = LogRot(R,U);
    Err = U - ExpRot(R, RU);
    errmax = max( trace(Err*Err'), errmax );
end
errmax

errmax = 0;
for i = 1:Nmax
    % select a random rotation
    R = UnifRndRotMat();
    %% Random Gaussian element of T_R SO3
    X = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    %% if we reach the cut locus at a distance pi, throw a new random tangent vector
    while ScalRot(R,X,X) >= pi*pi
        X = randn()*Ei_l(R,1) + randn()*Ei_l(R,2) + randn()*Ei_l(R,3);
    end
    Err = X - LogRot(R, ExpRot(R,X));
    errmax = max( trace(Err*Err'), errmax );
end
errmax

Answer to Exercise 0.7 Exercises to develop

• Compute the Riemannian distance DistRot(R,U). Compare it to the extrinsic distance in the embedding space of matrices (a minimal sketch is given after this list).

• Sample a geodesic between two random rotations using an increasing number of points and plot the error on the length of the approximated trajectory in the embedding space of matrices. Observe that the Riemannian distance is the upper limit of the length of the approximated trajectory in the embedding space of matrices.

• Visualize the evolution of the frame (R(t).e1, R(t).e2, R(t).e3) along a geodesic between two rotations. Extend the geodesic to a length of 4π.

• Play with the tangent vectors at the mid-point (the Log of the two extremities taken at the mid-point). Observe that \overrightarrow{RU} = Log_R(U) is different from −\overrightarrow{UR} = −Log_U(R). Compute X^t_0 = \overrightarrow{R(t)R(0)} and X^t_1 = \overrightarrow{R(t)R(1)}. Observe that they are related by (1 − t)·X^t_0 = −t·X^t_1.
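No worked answers are given for these exercises; as a starting point for the first item, a minimal sketch of DistRot reusing the functions of the previous answers (the comparison with the extrinsic Frobenius distance is shown in the comments) could be:

function d = DistRot(R, U)
% Riemannian distance for the bi-invariant metric: norm of Log_R(U),
% i.e. the rotation angle of R'*U (minimal sketch).
d = norm( RotMat2RotVect( R'*U ) );

% Example comparison with the extrinsic distance in the embedding space:
%   R = UnifRndRotMat(); U = UnifRndRotMat();
%   [ DistRot(R,U), norm(R-U,'fro') ]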

Answer to Exercise 0.8 Applications for the future

• Throw N random Gaussian rotations around a uniform random rotation.

• Compute the mean of a set of N random Gaussian Rotations.

• Compare with the extrinsic mean distance.

• Verify that N times the Mahalanobis distance between the exact and the computed mean is approximately χ²_3 distributed, i.e. that it has a mean of 3 and a variance of 6 (twice the number of degrees of freedom).

• Random rotation, random unit tangent vector.

• Sample randomly the geodesic starting from R with tangent vectors ranging from −X/4 to X/4. These will constitute the elements of the subspace that we want to retrieve.


%%% Geodesics

% Throw N random rotations along the geodesic between R0 and R1
t = rand(1,N);
for i = 1:N
    R{i} = ExpRot( R0, t(i) * LogRot(R0, R1) );
end

• "Add" a Gaussian noise of σ = 0.2 rad to each of these rotations to have noisy observations of the subspace.

• Compute the mean and the principal modes of these noisy observations. Compare the mean with R and the first mode with X with respect to the number of points N used to sample the subspace.

B.2 Exercises on tensors of chapter 5

Answer to Exercise 0.9 Powers, exponential and logarithm of symmetric matrices

• Implement the function ExpSym(M) that computes the exponential of a symmetric matrix using diagonalization.

function [ Sigma ] = ExpSym(M)
% Exponential of a symmetric matrix
if norm(M-M') > 1e-12, warning('Input matrix is not symmetric'), end;
[V,D] = eig( (M+M')/2 );
% get the diagonal elements
d = diag(D);
Sigma = V*diag(exp(d))*V';

• Implement a random SPD matrix generator RndSym() thanks to ExpSym.

function [ Result ] = RndSym( )
M = 2*rand([3,3]) - 1;
Result = ExpSym( M+M' );

• Implement the function LogSym(M) that computes the logarithm of an SPD matrix using diagonalization.

function [ Sigma ] = LogSym(M)
% Logarithm of a positive definite symmetric matrix
if norm(M-M') > 1e-12, warning('Input matrix is not symmetric'), end;
[V,D] = eig( (M+M')/2 );
% get the diagonal elements
d = diag(D);
% Throw an error if any of the diagonal values is negative.
for i = 1:length(d)
    if d(i) <= 0, error('Input matrix is not SPD'), end
end
Sigma = V*diag(log(d))*V';

• Implement the function SqrtSym(M) that computes the unique SPD square root of an SPD matrix.


function [ Sigma ] = SqrtSym(M)
% Square root of an SPD matrix
if norm(M-M') > 1e-12, warning('Input matrix is not symmetric'), end;
[V,D] = eig( (M+M')/2 );
% get the diagonal elements
d = diag(D);
% Throw an error if any of the diagonal values is negative.
for i = 1:length(d)
    if d(i) <= 0, error('Input matrix is not SPD'), end
end
Sigma = V*diag(sqrt(d))*V';

• Implement the function InvSqrtSym(M) that computes the inverse of the SPD square root of an SPD matrix.

function [ Sigma ] = InvSqrtSym(M)
% Inverse of the square root of an SPD matrix
if norm(M-M') > 1e-12, warning('Input matrix is not symmetric'), end;
[V,D] = eig( (M+M')/2 );
% get the diagonal elements
d = diag(D);
% Throw an error if any of the diagonal values is negative.
for i = 1:length(d)
    if d(i) <= 0, error('Input matrix is not SPD'), end
end
Sigma = V*diag(1./sqrt(d))*V';

• Verify on random SPD matrices that SqrtSym(M) = ExpSym(0.5*LogSym(M)) and InvSqrtSym(M) = ExpSym(-0.5*LogSym(M)).

Nmax = 1e5;
err_sqrt = 0;
err_invsqrt = 0;
for k = 0:Nmax
    %% Random SPD matrices
    M = RndSym();
    err_sqrt    = max( err_sqrt,    norm( SqrtSym(M)    - ExpSym( 0.5*LogSym(M)) ) );
    err_invsqrt = max( err_invsqrt, norm( InvSqrtSym(M) - ExpSym(-0.5*LogSym(M)) ) );
end
err_sqrt
err_invsqrt


Bibliography

[ACPA06a] Vincent Arsigny, Olivier Commowick, Xavier Pennec, and Nicholas Ayache. A log-euclidean framework for statistics on diffeomorphisms. In Proc. of the 9th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI'06), Part I, number 4190 in LNCS, pages 924-931, 2-4 October 2006. PMID: 17354979.

[ACPA06b] Vincent Arsigny, Olivier Commowick, Xavier Pennec, and Nicholas Ayache. A log-euclidean polyaffine framework for locally rigid or affine registration. In J.P.W. Pluim, B. Likar, and F.A. Gerritsen, editors, Proceedings of the Third International Workshop on Biomedical Image Registration (WBIR'06), volume 4057 of LNCS, pages 120-127, Utrecht, The Netherlands, 9-11 July 2006. Springer.

[AF00] J. Ashburner and K. J. Friston. Voxel-based morphometry - the methods. NeuroImage, 11(6):805-821, 2000.

[AFPA05] Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Fast and simple calculus on tensors in the log-euclidean framework. In J. Duncan and G. Gerig, editors, Proceedings of the 8th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005, Part I, volume 3749 of LNCS, pages 115-122, Palm Springs, CA, USA, October 26-29, 2005. Springer. PMID: 16685836.

[AFPA06] Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Log-euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 56(2):411-421, August 2006. PMID: 16788917.

[AFPA07] Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 29(1):328-347, 2007.

[AHB87] K.S. Arun, T.S. Huang, and S.D. Blostein. Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(5):698-700, September 1987.

[AK01] G. Aubert and P. Kornprobst. Mathematical problems in image processing - Partial differential equations and the calculus of variations, volume 147 of Applied Mathematical Sciences. Springer, 2001.

[AKM+01] A. Andrade, F. Kherif, J.-F. Mangin, K. Worsley, A.-L. Paradis, O. Simon, S. Dehaene, and J.-B. Poline. Detection of fMRI activation using cortical surface mapping. Human Brain Mapping, 12:79-93, 2001.


[Alt86] S. L. Altmann. Rotations, Quaternions, and Double Groups. Clarendon Press - Oxford, 1986.

[Ama90] Shun-ichi Amari. Differential-geometric methods in Statistics, volume 28 of Lecture Notes in Statistics. Springer, 2nd corr. print edition, 1990.

[APA06] Vincent Arsigny, Xavier Pennec, and Nicholas Ayache. Bi-invariant means in Lie groups. Application to left-invariant polyaffine transformations. Research report RR-5885, INRIA Sophia-Antipolis, April 2006.

[Arn95] M. Arnaudon. Barycentres convexes et approximations des martingales continues dans les variétés. In M. Yor, J. Azema, P.A. Meyer, editors, Séminaire de probabilités XXIX, volume 1613 of Lect. Notes in Math., pages 70-85. Springer-Verlag, 1995.

[ATY05] S. Allassonniere, A. Trouve, and L. Younes. Geodesic shooting and diffeomorphic matching via textured models. In A. Rangarajan et al., editor, Proc. of EMMCVPR 2005, LNCS 3757, pages 365-381, 2005.

[Aya89] N. Ayache. Vision Stereoscopique et Perception Multisensorielle. Inter-Editions, 1989.

[Aya91] N. Ayache. Artificial Vision for Mobile Robots - Stereo-vision and Multisensor Perception. MIT Press, 1991. 342 pages.

[Bha03] R. Bhatia. On the exponential metric increasing property. Linear Algebra and its Applications, 375:211-220, 2003.

[Bie04] G.N.J.C. Bierkens. Geometric methods in diffusion tensor regularization. Master's thesis, Technische Universiteit Eindhoven, Dept. of Math and Comp. Sci., 2004.

[Bin74] C. Bingham. An antipodally symmetric distribution on the sphere. Annals of Statistics, 2(6):1201-1225, 1974.

[BMA+05] P. Batchelor, M. Moakher, D. Atkinson, F. Calamante, and A. Connelly. A rigorous framework for diffusion tensor calculus. Magnetic Resonance in Medicine, 53:221-225, 2005.

[BMB94] P.J. Basser, J. Mattiello, and D. Le Bihan. MR diffusion tensor spectroscopy and imaging. Biophysical Journal, 66:259-267, 1994.

[BMTY05] M.F. Beg, M.I. Miller, A. Trouve, and L. Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. Journal of Computer Vision, 61(2):139-157, 2005.

[Boo78] F.L. Bookstein. The Measurement of Biological Shape and Shape Change, volume 24 of Lecture Notes in Biomathematics. Springer-Verlag, 1978.

[BP02] R. Bhattacharya and V. Patrangenaru. Nonparametric estimation of location and dispersion on Riemannian manifolds. Journal of Statistical Planning and Inference, 108:23-36, 2002.

[BP03] R. Bhattacharya and V. Patrangenaru. Large sample theory of intrinsic and extrinsic sample means on manifolds, I. Annals of Statistics, 31(1):1-29, 2003.

[BP05] R. Bhattacharya and V. Patrangenaru. Large sample theory of intrinsic and extrinsic sample means on manifolds, II. Annals of Statistics, 33(3):1225–1259, 2005.

[BR82] J. Burbea and C.R. Rao. Entropy differential metric, distance and divergence measures in probability spaces: a unified approach. Journal of Multivariate Analysis, 12:575–596, 1982.

[Bru07] Anders Brun. Manifolds in Image Science and Visualization. PhD thesis, Linkoping University, 2007. Linkoping Studies in Science and Technology, Dissertations No. 1157.

[BWBM06] T. Brox, J. Weickert, B. Burgeth, and P. Mrazek. Nonlinear structure tensors. Image and Vision Computing, 24(1):41–55, 2006.

[CO91] M. Calvo and J.M. Oller. An explicit solution of information geodesic equations for the multivariate normal model. Statistics and Decisions, 9:119–138, 1991.

[CSM03] D. Cohen-Steiner and J.M. Morvan. Restricted Delaunay triangulations and normal cycle. In Proceedings of the Nineteenth Annual Symposium on Computational Geometry, pages 312–321, 2003.

[CTDF02] C. Chefd'hotel, D. Tschumperle, R. Deriche, and O. Faugeras. Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, editors, Proc. of ECCV 2002, volume 2350 of LNCS, pages 251–265. Springer Verlag, 2002.

[CTDF04] C. Chefd'hotel, D. Tschumperle, R. Deriche, and O. Faugeras. Regularizing flows for constrained matrix-valued images. J. Math. Imaging and Vision, 20(1-2):147–162, January-March 2004.

[CZK+98] D.L. Collins, A.P. Zijdenbos, V. Kollokian, J.G. Sled, N.J. Kabani, C.J. Holmes, and A.C. Evans. Design and construction of a realistic digital brain phantom. IEEE Transactions on Medical Imaging, 17(3):463–468, June 1998.

[Dar96] R.W.R. Darling. Martingales on non-compact manifolds: maximal inequalities and prescribed limits. Annales de l'institut Poincare - Probabilites et Statistiques, 32(4):431–454, 1996.

[dC92] M. do Carmo. Riemannian Geometry. Mathematics. Birkhauser, Boston, Basel, Berlin, 1992.

[DE97] D.W. Eggert, A. Lorusso, and R.B. Fisher. Estimating 3-D rigid body transformations: A comparison of four major algorithms. Machine Vision and Applications, Special Issue on Performance Characteristics of Vision Algorithms, 9(5/6):272–290, 1997.

[DFBJ07] B. Davis, P.Th. Fletcher, E. Bullitt, and S. Joshi. Population shape regression from random design data. In Proc. of ICCV'07, 2007.

[Dia88] D. Diamond. A note on the rotational superposition problem. Acta Cryst., A44:211–216, 1988.

[DKZ08] I.L. Dryden, A. Koloydenko, and D. Zhou. Non-Euclidean statistics for covariance matrices with application to diffusion tensor imaging. Submitted, 2008.

[DM91] I.L. Dryden and K.V. Mardia. Theoretical and distributional aspects of shape analysis. In Probability Measures on Groups, X (Oberwolfach, 1990), pages 95–116, New York, 1991. Plenum.

[DM98] I.L. Dryden and K.V. Mardia. Statistical Shape Analysis. John Wiley, Chichester, 1998.

[DMP03] J.-P. Dedieu, G. Malajovich, and P. Priouret. Newton method on Riemannian manifolds: Covariant alpha-theory. IMA Journal of Numerical Analysis, 23:395–419, 2003.

[DPT+08] Stanley Durrleman, Xavier Pennec, Alain Trouve, Paul Thompson, and Nicholas Ayache. Inferring brain variability from diffeomorphic deformations of currents: an integrative approach. Medical Image Analysis, 12(5):626–637, 2008. PMID: 18658005.

[DPTA08a] Stanley Durrleman, Xavier Pennec, Alain Trouve, and Nicholas Ayache. A forward model to build unbiased atlases from curves and surfaces. In X. Pennec and S. Joshi, editors, Proc. of the International Workshop on the Mathematical Foundations of Computational Anatomy (MFCA-2008), September 2008.

[DPTA08b] Stanley Durrleman, Xavier Pennec, Alain Trouve, and Nicholas Ayache. Sparse approximation of currents for statistics on curves and surfaces. In Dimitris Metaxas, Leon Axel, Gabor Szekely, and Gabor Fichtinger, editors, Proc. Medical Image Computing and Computer Assisted Intervention (MICCAI), Part II, volume 5242 of LNCS, pages 390–398, New York, USA, September 2008. Springer. PMID: 18982629.

[DS87] Persi Diaconis and Mehrdad Shahshahani. The subgroup algorithm for generating uniform random variables. Prob. in Eng. and Info. Sci., 1:15–32, 1987.

[DT17] W. D'Arcy Thompson. On Growth and Form. Cambridge University Press, England, 1917.

[EAS98] A. Edelman, T. Arias, and S.T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

[ECM+93] A. C. Evans, D. L. Collins, S. R. Mills, E. D. Brown, R. L. Kelly, and T. M. Peters. 3D statistical neuroanatomical models from 305 MRI volumes. In Proc. IEEE Nuclear Science Symposium and Medical Imaging Conference, pages 1813–1817, 1993.

[EM91] M. Emery and G. Mokobodzki. Sur le barycentre d'une probabilite dans une variete. In J. Azema, P.A. Meyer, and M. Yor, editors, Seminaire de probabilites XXV, volume 1485 of Lect. Notes in Math., pages 220–233. Springer-Verlag, 1991.

[Eme89] M. Emery. Stochastic Calculus in Manifolds. Springer, Berlin, 1989.

[FAAP05] Pierre Fillard, Vincent Arsigny, Nicholas Ayache, and Xavier Pennec. A Riemannian framework for the processing of tensor-valued images. In Ole Fogh Olsen, Luc Florack, and Arjan Kuijper, editors, Deep Structure, Singularities, and Computer Vision (DSSCV), LNCS, pages 112–123. Springer, June 2005.

[FAP+05] Pierre Fillard, Vincent Arsigny, Xavier Pennec, Paul M. Thompson, and Nicholas Ayache. Extrapolation of sparse tensor fields: Application to the modeling of brain variability. In Gary Christensen and Milan Sonka, editors, Proc. of Information Processing in Medical Imaging 2005 (IPMI'05), volume 3565 of LNCS, pages 27–38, Glenwood Springs, Colorado, USA, July 2005. Springer. PMID: 17354682.

[FAP+07] Pierre Fillard, Vincent Arsigny, Xavier Pennec, Kiralee M. Hayashi, Paul M. Thompson, and Nicholas Ayache. Measuring brain variability by extrapolating sparse tensor fields measured on sulcal lines. NeuroImage, 34(2):639–650, January 2007. Also as INRIA Research Report 5887, April 2006. PMID: 17113311.

[FAPA06] Pierre Fillard, Vincent Arsigny, Xavier Pennec, and Nicholas Ayache. Clinical DT-MRI estimation, smoothing and fiber tracking with log-Euclidean metrics. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI 2006), pages 786–789, Crystal Gateway Marriott, Arlington, Virginia, USA, April 2006.

[Fau93] O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.

[FH84] J. Fang and T.S. Huang. Some experiments on estimating the 3-D motion parameters of a rigid body from two consecutive image frames. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(5):545–554, September 1984.

[FJ04] P. Thomas Fletcher and Sarang C. Joshi. Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. In Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, ECCV 2004 Workshops CVAMIA and MMBIA, Prague, Czech Republic, May 15, 2004, volume 3117 of LNCS, pages 87–98. Springer, 2004.

[FJLP03] P.T. Fletcher, S. Joshi, C. Lu, and S. Pizer. Gaussian distributions on Lie groups and their application to statistical shape analysis. In Chris Taylor and Alison Noble, editors, Proc. of Information Processing in Medical Imaging (IPMI'2003), volume 2732 of LNCS, pages 450–462. Springer, 2003.

[FL98] M. Fleute and S. Lavallee. Building a complete surface model from sparse data using statistical shape models: Application to computer assisted knee surgery. In Proc. of Medical Image Computing and Computer-Assisted Intervention (MICCAI'98), volume 1496 of LNCS, pages 879–887. Springer, 1998.

[FM99] W. Forstner and B. Moonen. A metric for covariance matrices. In F. Krumm and V. S. Schwarze, editors, Quo vadis geodesia...? Festschrift for Erik W. Grafarend on the occasion of his 60th birthday, number 1999.6 in Tech. Report of the Dpt. of Geodesy and Geoinformatics, pages 113–128. Stuttgart University, 1999.

[FOF+05] D. Facon, A. Ozanne, P. Fillard, J.-F. Lepeintre, C. Tournoux-Facon, and D. Ducreux. MR diffusion tensor imaging and fiber tracking in spinal cord compression. American Journal of Neuroradiology (AJNR), 26:1587–1594, 2005.

[FPAA07] Pierre Fillard, Xavier Pennec, Vincent Arsigny, and Nicholas Ayache. Clinical DT-MRI estimation, smoothing and fiber tracking with log-Euclidean metrics. IEEE Transactions on Medical Imaging, 26(11):1472–1482, November 2007. PMID: 18041263.

[FPTA08] Pierre Fillard, Xavier Pennec, Paul M. Thompson, and Nicholas Ayache. Evaluating brain anatomical correlations via canonical correlation analysis of sulcal lines. NeuroImage, 2008. Accepted for publication.

[Fre44] M. Frechet. L'integrale abstraite d'une fonction abstraite d'une variable abstraite et son application a la moyenne d'un element aleatoire de nature quelconque. Revue Scientifique, pages 483–512, 1944.

[Fre48] M. Frechet. Les elements aleatoires de nature quelconque dans un espace distancie. Annales de l'Institut Henri Poincare, 10:215–310, 1948.

[Gam91] R.V. Gamkrelidze, editor. Geometry I, volume 28 of Encyclopaedia of Mathematical Sciences. Springer Verlag, 1991.

[GHL93] S. Gallot, D. Hulin, and J. Lafontaine. Riemannian Geometry. Springer Verlag, 2nd edition, 1993.

[GKKJ92] G. Gerig, R. Kikinis, O. Kubler, and F.A. Jolesz. Nonlinear anisotropic filtering of MRI data. IEEE Transactions on Medical Imaging, 11(2):221–232, June 1992.

[GMS98] U. Grenander, M.I. Miller, and A. Srivastava. Hilbert-Schmidt lower bounds for estimators on matrix Lie groups for ATR. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(8):790–802, 1998.

[GP02] Sebastien Granger and Xavier Pennec. Statistiques exactes et approchees sur les normales aleatoires. Research Report RR-4533, INRIA, 2002.

[Gra01] C. Gramkow. On averaging rotations. International Journal of Computer Vision, 42(1-2):7–16, April/May 2001.

[Gre63] U. Grenander. Probabilities on Algebraic Structures. Wiley, 1963.

[Gre93] U. Grenander. General Pattern Theory: A Mathematical Study of Regular Structures. Oxford University Press Inc., New York, NY, 1993.

[Gri06] Alexander Grigor'yan. Heat kernels on weighted manifolds and applications. In J. Jorgenson and L. Walling, editors, The Ubiquitous Heat Kernel, volume 398 of Contemporary Mathematics, pages 91–190. AMS, 2006.

[Hel78] S. Helgason. Differential Geometry, Lie Groups, and Symmetric Spaces. Academic Press, 1978.

[Hen79] W. Hendrickson. Transformations to optimize the superposition of similar structures. Acta Cryst., A35:158–163, 1979.

[Hen91] H. Hendricks. A Cramer-Rao type lower bound for estimators with values in a manifold. Journal of Multivariate Analysis, 38:245–261, 1991.

[HHN88] B.K.P. Horn, H.M. Hilden, and S. Negahdaripour. Closed form solutions of absolute orientation using orthonormal matrices. Journal of the Optical Society of America, A-5(7):1127–1135, 1988.

[HLW02] Ernst Hairer, Ch. Lubich, and Gerhard Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, volume 31 of Springer Series in Computational Mathematics. Springer, 2002.

[HM93] R. Horaud and O. Monga. Vision par ordinateur : outils fondamentaux. Hermes, 1993.

[HM94] Uwe Helmke and J. B. Moore. Optimization and Dynamical Systems. Communication and Control Engineering Series. Springer, 1994.

[Hor87] B.K.P. Horn. Closed form solutions of absolute orientation using unit quaternions. Journal of the Optical Society of America, A-4(4):629–642, April 1987.

[iAN00] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. Number 191 in Translations of Mathematical Monographs. American Mathematical Society, 2000.

[JKSJ07a] Shantanu H. Joshi, Eric Klassen, Anuj Srivastava, and Ian Jermyn. A novel representation for Riemannian analysis of elastic curves in R^n. In Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–7, 2007.

[JKSJ07b] Shantanu H. Joshi, Eric Klassen, Anuj Srivastava, and Ian Jermyn. Removing shape-preserving transformations in square-root elastic (SRE) framework for shape analysis of curves. In EMMCVPR'07: Proceedings of the 6th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 387–398, Berlin, Heidelberg, 2007. Springer-Verlag.

[JKSM06] Shantanu Joshi, David Kaziska, Anuj Srivastava, and Washington Mio. Riemannian structures on shape spaces: A framework for statistical inferences. In Hamid Krim and Anthony Yezzi, editors, Statistics and Analysis of Shapes, Modeling and Simulation in Science, Engineering and Technology, pages 313–333. Birkhauser Boston, 2006.

[JM89] P.E. Jupp and K.V. Mardia. A unified view of the theory of directional statistics, 1975-1988. International Statistical Review, 57(3):261–294, 1989.

[JM00] Sarang C. Joshi and Michael I. Miller. Landmark matching via large deformation diffeomorphisms. IEEE Trans. Image Processing, 9(8):1357–1370, 2000.

[Joh66] R.M. Johnson. The minimal transformation to orthonormality. Psychometrika, 31:61–66, 1966.

[Kab76] W. Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Cryst., A32:922–923, 1976.

[Kab78] W. Kabsch. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Cryst., A34:827–828, 1978.

[Kan90] K. Kanatani. Group-Theoretical Methods in Image Understanding. Number 20 in Springer Series in Information Science. Springer Verlag, 1990.

[Kan93] K. Kanatani. Geometric Computation for Machine Vision. Number 37 in The Oxford Engineering Science Series. Clarendon Press - Oxford, 1993.

[Kar77] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, 30:509–541, 1977.

[Kea89] S.K. Kearsley. On the orthogonal transformation used for structural comparison. Acta Cryst., A45:208–210, 1989.

[Ken89] D.G. Kendall. A survey of the statistical theory of shape (with discussion). Statistical Science, 4:87–120, 1989.

[Ken90] W.S. Kendall. Probability, convexity, and harmonic maps with small image I: uniqueness and fine existence. Proc. London Math. Soc., 61(2):371–406, 1990.

[Ken91] W.S. Kendall. Convexity and the hemisphere. Journal of the London Mathematical Society, 43(2):567–576, 1991.

[Ken92] J.T. Kent. New directions in shape analysis. In K.V. Mardia, editor, The Art of Statistical Science, chapter 10, pages 115–127. John Wiley & Sons, 1992.

[Kli82] W. Klingenberg. Riemannian Geometry. Walter de Gruyter, Berlin, New York, 1982.

[KM63] M.G. Kendall and P.A.P. Moran. Geometrical Probability. Number 10 in Griffin's Statistical Monographs and Courses. Charles Griffin & Co. Ltd., 1963.

[KN69] S. Kobayashi and K. Nomizu. Foundations of Differential Geometry, vol. II. Number 15 in Interscience Tracts in Pure and Applied Mathematics. John Wiley & Sons, 1969.

[KN97] C. Y. Kaya and J. L. Noakes. Geodesics and an optimal control algorithm. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, USA, pages 4918–4919, 1997.

[KSMJ04] E. Klassen, A. Srivastava, W. Mio, and S. Joshi. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Trans. on PAMI, 26(3):372–383, 2004.

[KV97] Robert E. Kass and Paul W. Vos. Geometric Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. John Wiley & Sons, 1997.

[Lat91] J.C. Latombe. Robot Motion Planning, chapter 2: Configuration Space of a Rigid Object, pages 58–89. Kluwer Academic Publishers, 1991.

[LBC+08] Natasha Lepore, Caroline Brun, Yi-Yu Chou, Agatha D. Lee, Marina Barysheva, Xavier Pennec, Katie McMahon, Matthew Meredith, Greig I. de Zubicaray, Margaret J. Wright, Arthur W. Toga, and Paul M. Thompson. Best individual template selection from deformation tensor minimization. In Proc. of the 2008 IEEE Int. Symp. on Biomedical Imaging: From Nano to Macro (ISBI'08), Paris, France, May 14-17, pages 460–463, 2008.

[LBMP+01] D. Le Bihan, J.-F. Mangin, C. Poupon, C.A. Clark, S. Pappata, N. Molko, and H. Chabriat. Diffusion tensor imaging: Concepts and applications. Journal of Magnetic Resonance Imaging, 13(4):534–546, 2001.

[LGPC+99] G. Le Goualher, E. Procyk, D. Collins, R. Venugopal, C. Barillot, and A. Evans. Automated extraction and variability analysis of sulcal neuroanatomy. IEEE Transactions on Medical Imaging, 18(3):206–217, 1999.

[LK93] H. Le and D.G. Kendall. The Riemannian structure of Euclidean shape space: a novel environment for statistics. Annals of Statistics, 21:1225–1271, 1993.

[LMO00] Miroslav Lovric and Maung Min-Oo. Multivariate normal distributions parametrized as a Riemannian symmetric space. Journal of Multivariate Analysis, 74(1):36–48, 2000.

[LRDF06] Ch. Lenglet, M. Rousson, R. Deriche, and O. Faugeras. Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing. Journal of Mathematical Imaging and Vision, 25(3):423–444, October 2006.

[Mac84] A.L. Mackay. Quaternion transformation of molecular orientation. Acta Cryst., A40:165–166, 1984.

[Mar99] K.V. Mardia. Directional statistics and shape analysis. Journal of Applied Statistics, 26:949–957, 1999.

[MBG97] B. Mohammadi, H. Borouchaki, and P.L. George. Delaunay mesh generation governed by metric specifications. Part II: applications. Finite Elements in Analysis and Design, pages 85–109, 1997.

[McL72] A.D. McLachlan. A mathematical procedure for superimposing atomic coordinates of proteins. Acta Cryst., A28:656–657, 1972.

[Mei02] E. Meijering. A chronology of interpolation: From ancient astronomy to modern signal and image processing. Proceedings of the IEEE, 90(3):319–342, March 2002.

[MJ00] K.V. Mardia and P.E. Jupp. Directional Statistics. Wiley, Chichester, 2000.

[MLT00] G. Medioni, M.-S. Lee, and Ch.-K. Tang. A Computational Framework for Segmentation and Grouping. Elsevier, 2000.

[MM02] R. Mahony and R. Manton. The geometry of the Newton method on non-compact Lie groups. Journal of Global Optimization, 23:309–327, 2002.

[MM06] Peter W. Michor and David Mumford. Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc. (JEMS), 8:1–48, 2006. ESI Preprint 1425, arXiv:math.DG/0312384.

[Moa02] M. Moakher. Means and averaging in the group of rotations. SIAM Journal on Matrix Analysis and Applications, 24(1):1–16, 2002.

[Moa05] M. Moakher. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 26(3):735–747, 2005.

[Mos39] G.I. Mosier. Determining a simple structure when loadings for certain tests are known. Psychometrika, pages 149–162, 1939. First solution to rigid LSQ.

[MRC+04a] J.-F. Mangin, D. Riviere, A. Cachia, E. Duchesnay, Y. Cointepas, D. Papadopoulos-Orfanos, D. L. Collins, A. C. Evans, and J. Regis. Object-based morphometry of the cerebral cortex. IEEE Transactions on Medical Imaging, 23(8):968–982, August 2004.

[MRC+04b] J.-F. Mangin, D. Riviere, A. Cachia, E. Duchesnay, Y. Cointepas, D. Papadopoulos-Orfanos, P. Scifo, T. Ochiai, F. Brunelle, and J. Regis. A framework to study the cortical folding patterns. NeuroImage, 23(Supplement 1):S129–S138, 2004.

[MSH05] Bill Moran, Sofia Suvorova, and Stephen Howard. Sensor management for radar: a tutorial. In Advances in Sensing with Security Applications, 17–30 July 2005, Il Ciocco, Italy. NATO Advanced Study Institute, 2005.

[MSJ07] Washington Mio, Anuj Srivastava, and Shantanu Joshi. On shape of plane elastic curves. Int. J. Comput. Vision, 73(3):307–324, 2007.

[MTE+01] J. Mazziotta, A. Toga, A. Evans, P. Fox, J. Lancaster, K. Zilles, R. Woods, T. Paus, G. Simpson, B. Pike, C. Holmes, L. Collins, P. Thompson, D. MacDonald, M. Iacoboni, T. Schormann, K. Amunts, N. Palomero-Gallagher, S. Geyer, L. Parsons, K. Narr, N. Kabani, G. Le Goualher, D. Boomsma, T. Cannon, R. Kawashima, and B. Mazoyer. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci, 356:1293–1322, 2001.

[MTY03] M.I. Miller, A. Trouve, and L. Younes. On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering, pages 375–405, 2003.

[MTY06] Michael I. Miller, Alain Trouve, and Laurent Younes. Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 2006.

[MY01] M.I. Miller and L. Younes. Group actions, homeomorphisms, and matching: A general framework. International Journal of Computer Vision, 41(1/2):61–84, 2001.

[Nom54] K. Nomizu. Invariant affine connections on homogeneous spaces. American J. of Math., 76:33–65, 1954.

[OC95] J.M. Oller and J.M. Corcuera. Intrinsic analysis of statistical estimation. Annals of Statistics, 23(5):1562–1581, 1995.

[OW00] B. Owren and B. Welfert. The Newton iteration on Lie groups. BIT Numerical Mathematics, 40(1):121–145, 2000.

[PA98] Xavier Pennec and Nicholas Ayache. Uniform distribution, distance and expectation problems for geometric features processing. Journal of Mathematical Imaging and Vision, 9(1):49–67, July 1998. A preliminary version appeared as INRIA Research Report 2820, March 1996.

[Pau82] R.P. Paul. Robot Manipulators. MIT Press, 1982.

[Pen96] Xavier Pennec. L'incertitude dans les problemes de reconnaissance et de recalage – Applications en imagerie medicale et biologie moleculaire. These de sciences (PhD thesis), Ecole Polytechnique, Palaiseau (France), December 1996.

[Pen98] Xavier Pennec. Computing the mean of geometric features - application to the mean rotation. Research Report RR-3371, INRIA, March 1998.

[Pen99] Xavier Pennec. Probabilities and statistics on Riemannian manifolds: Basic tools for geometric measurements. In A.E. Cetin, L. Akarun, A. Ertuzun, M.N. Gurcan, and Y. Yardimci, editors, Proc. of Nonlinear Signal and Image Processing (NSIP'99), volume 1, pages 194–198, Antalya, Turkey, June 20-23, 1999. IEEE-EURASIP.

[Pen04] Xavier Pennec. Probabilities and statistics on Riemannian manifolds: A geometric approach. Research Report 5093, INRIA, January 2004. An extended version will appear in the Int. Journal of Mathematical Imaging and Vision.

[Pen06a] Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, July 2006. A preliminary version appeared as INRIA RR-5093, January 2004.

[Pen06b] Xavier Pennec. Left-invariant Riemannian elasticity: a distance on shape diffeomorphisms? In X. Pennec and S. Joshi, editors, Proc. of the International Workshop on the Mathematical Foundations of Computational Anatomy (MFCA-2006), pages 1–13, 2006.

[Pen06c] Xavier Pennec. Statistical Computing on Manifolds for Computational Anatomy. Habilitation a diriger des recherches, Universite Nice Sophia-Antipolis, December 2006.

[PFA04] Xavier Pennec, Pierre Fillard, and Nicholas Ayache. A Riemannian framework for tensor computing. Research Report 5255, INRIA, July 2004. Published in Int. Journal of Computer Vision 65(1), October 2005.

[PFA06] Xavier Pennec, Pierre Fillard, and Nicholas Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, January 2006. A preliminary version appeared as INRIA Research Report 5255, July 2004.

[PGT98] Xavier Pennec, Charles R.G. Guttmann, and Jean-Philippe Thirion. Feature-based registration of medical images: Estimation and validation of the pose accuracy. In Proc. of First Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI'98), volume 1496 of LNCS, pages 1107–1114, Cambridge, USA, October 1998. Springer.

[Pic94] J. Picard. Barycentres et martingales sur une variete. Annales de l'institut Poincare - Probabilites et Statistiques, 30(4):647–702, 1994.

[PM90] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 12(7):629–639, 1990.

[PMB06] J. M. Provenzale, S. Mukundan, and D. P. Barboriak. Diffusion-weighted and perfusion MR imaging for brain tumor characterization and assessment of treatment response. Radiology, 239(3):632–649, 2006.

[Poi12] H. Poincare. Calcul des probabilites. 2nd edition, Paris, 1912.

[PSA+05] Xavier Pennec, Radu Stefanescu, Vincent Arsigny, Pierre Fillard, and Nicholas Ayache. Riemannian elasticity: A statistical regularization framework for non-linear registration. In J. Duncan and G. Gerig, editors, Proceedings of the 8th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005, Part II, volume 3750 of LNCS, pages 943–950, Palm Springs, CA, USA, October 26-29, 2005. Springer. PMID: 16686051.

[PT97] Xavier Pennec and Jean-Philippe Thirion. A framework for uncertainty and validation of 3D registration methods based on points and frames. Int. Journal of Computer Vision, 25(3):203–229, December 1997.

[RGB+05] M. Rovaris, A. Gass, R. Bammer, S.J. Hickman, O. Ciccarelli, D.H. Miller, and M. Filippi. Diffusion MRI in multiple sclerosis. Neurology, 65:1526–1532, 2005.

[RJS04] K.T. Rajamani, S.C. Joshi, and M.A. Styner. Bone model morphing for enhanced surgical visualization. In Proc. of the IEEE Symp. on Biomedical Imaging: Nano to Macro (ISBI 2004), volume 2, pages 1255–1258, April 2004.

[RL87] P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. J. Wiley and Sons, 1987.

[Sch66] P.H. Schonemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31:1–10, 1966.

[SdDSD98] J. Sijbers, A.J. den Dekker, P. Scheunders, and D. Van Dyck. Maximum likelihood estimation of Rician distribution parameters. IEEE Transactions on Medical Imaging, 17(3), June 1998.

[Sko84] L.T. Skovgaard. A Riemannian geometry of the multivariate normal model. Scand. J. Statistics, 11:211–223, 1984.

[Sma96] C.G. Small. The Statistical Theory of Shapes. Springer Series in Statistics. Springer, 1996.

[Spi79] M. Spivak. Differential Geometry, volume 1. Publish or Perish, Inc., 2nd edition, 1979.

[STA98] G. Subsol, J.-Ph. Thirion, and N. Ayache. A scheme for automatically building 3D morphometric anatomical atlases: application to a skull atlas. Medical Image Analysis, 2(1):37–60, 1998.

[Ste80] G. Stewart. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal., 17(3):403–409, 1980.

[Stu64] J. Stuelpnagel. On the parametrization of the three-dimensional rotation group. SIAM Review, 6(4):422–430, 1964.

[TBU00] P. Thevenaz, T. Blu, and M. Unser. Interpolation revisited. IEEE Transactions on Medical Imaging, 19(7):739–758, July 2000.

[TMM+77] P.M. Thompson, D. MacDonald, M.S. Mega, C.J. Holmes, A.C. Evans, and A.W. Toga. Detection and mapping of abnormal brain structure with a probabilistic atlas of cortical surfaces. Journal of Computer Assisted Tomography, 21(4):567–581, 1997.

[TMW+01] P.M. Thompson, M.S. Mega, R.P. Woods, C.I. Zoumalan, C.J. Lindshield, R.E. Blanton, J. Moussai, C.J. Holmes, J.L. Cummings, and A.W. Toga. Cortical change in Alzheimer's disease detected with a disease-specific population-based brain atlas. Cerebral Cortex, 11(1):1–16, January 2001.

[Tro98] Alain Trouve. Diffeomorphisms groups and pattern matching in image analysis. International Journal of Computer Vision, 28(3):213–221, 1998.

[Tsc02] D. Tschumperle. PDE-Based Regularization of Multivalued Images and Applications. PhD thesis, University of Nice-Sophia Antipolis, December 2002.

[TT88] J. Talairach and P. Tournoux. Co-Planar Stereotaxic Atlas of the Human Brain: 3-Dimensional Proportional System: an Approach to Cerebral Imaging. Thieme Medical Publishers, New York, 1988.

[Ume91] S. Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4):376–380, April 1991.

[VG05] M. Vaillant and J. Glaunes. Surface matching via currents. In Proc. of IPMI'05, pages 381–392, 2005.

[VMYT04] M. Vaillant, M.I. Miller, L. Younes, and A. Trouve. Statistics on diffeomorphisms via tangent space representations. NeuroImage, 23(Supp. 1):S161–S169, 2004.

[VQGM07] M. Vaillant, A. Qiu, J. Glaunes, and M. Miller. Diffeomorphic metric surface mapping in subregion of the superior temporal gyrus. NeuroImage, 34(3):1149–1159, 2007.

[WB02] J. Weickert and T. Brox. Diffusion and regularization of vector- and matrix-valued images. In M.Z. Nashed and O. Scherzer, editors, Inverse Problems, Image Analysis, and Medical Imaging, volume 313 of Contemporary Mathematics, pages 251–268, Providence, 2002. AMS.

[WH06] J. Weickert and H. Hagen, editors. Visualization and Processing of Tensor Fields. Mathematics and Visualization. Springer, 2006.

[WMM+02] C.F. Westin, S.E. Maier, H. Mamata, A. Nabavi, F.A. Jolesz, and R. Kikinis. Processing and visualization for diffusion tensor MRI. Medical Image Analysis, 6(2):93–108, June 2002.

[WS91] M.W. Walker and L. Shao. Estimating 3-D location parameters using dual number quaternions. CVGIP: Image Understanding, 54(3):358–367, November 1991.

[WV05] Z. Wang and B.C. Vemuri. DTI segmentation using an information theoretic tensor dissimilarity measure. IEEE Trans. on Medical Imaging, 24(10):1267–1277, October 2005.

[WVCM04] Z. Wang, B.C. Vemuri, Y. Chen, and T. Mareci. A constrained variational principle for direct estimation and smoothing of the diffusion tensor field from complex DWI. IEEE Trans. on Medical Imaging, 2004.

[You98] Laurent Younes. Computable elastic distances between shapes. SIAM Journal on Applied Mathematics, 58(2):565–586, 1998.