This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright


On advances in differential-geometric approaches for 2D and 3D shape analyses and activity recognition☆

Anuj Srivastava a,⁎, Pavan Turaga b, Sebastian Kurtek a

a Department of Statistics, Florida State University, Tallahassee, FL, United States
b Schools of Arts, Media, Engineering and Electrical Engineering, Arizona State University, Tempe, AZ, United States

Article info

Keywords: Analytic manifolds; Riemannian shape metrics; Elastic shape analysis; Video analysis; Activity recognition; Static and video image data

Abstract

In this paper we summarize recent advances in shape analysis and shape-based activity recognition problems with a focus on techniques that use tools from differential geometry and statistics. We start with general goals and challenges faced in shape analysis, followed by a summary of the basic ideas, strengths and limitations, and applications of different mathematical representations used in shape analyses of 2D and 3D objects. These representations include point sets, curves, surfaces, level sets, deformable templates, medial representations, and other feature-based methods. We discuss some common choices of Riemannian metrics and computational tools used for evaluating geodesic paths and geodesic distances for several of these shape representations. Then, we study the use of Riemannian frameworks in statistical modeling of variability within shape classes. Next, we turn to models and algorithms for activity analysis from various perspectives. We discuss how mathematical representations for human shape and its temporal evolutions in videos lead to analyses over certain special manifolds. We discuss the various choices of shape features, and parametric and non-parametric models for shape evolution, and how these choices lead to appropriate manifold-valued constraints. We discuss applications of these methods in gait-based biometrics, action recognition, and video summarization and indexing. For reader convenience, we also provide a short overview of the relevant tools from geometry and statistics on manifolds in the Appendix.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Nonlinear manifolds have a special place in problem solutions where constraints of the problems restrict the domains to some interesting, structured sets. As a motivation, consider the problem of optimizing an objective function over a Euclidean space, with the solution constrained to have a unit norm. Although one can solve this, and similar problems, using Lagrange multipliers, it seems more natural to restrict the search directly to the constrained set. This constrained set will be a unit sphere in this example. Oftentimes, the differential geometry of these constrained spaces, or manifolds, guides us to reach more efficient solutions. Besides being mathematically appealing, the solutions based on the geometry of the underlying manifolds are often faster and more stable than their Lagrangian counterparts. This fact has been exploited in many branches of science and engineering, in developing methodologies, algorithms, and systems. For example, in the area of control of systems, Brockett [15] developed a system theory for Lie groups and their quotient spaces in the 70s, followed by many control theoretic solutions on general manifolds [38,37,36,79,139].
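To make the unit-norm example concrete, the following is a minimal sketch of searching directly on the sphere rather than using Lagrange multipliers. It assumes a quadratic objective f(x) = x^T A x with a symmetric matrix A; the objective, the function names, the step size, and the iteration count are all illustrative choices and do not come from the paper. Each iteration projects the Euclidean gradient onto the tangent space of the sphere and then renormalizes the iterate.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def sphere_descent(A, x0, step=0.1, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere (A assumed symmetric)."""
    x = normalize(x0)
    for _ in range(iters):
        g = [2.0 * gi for gi in matvec(A, x)]                  # Euclidean gradient
        radial = dot(g, x)
        tangent = [gi - radial * xi for gi, xi in zip(g, x)]   # projection onto the tangent space at x
        x = normalize([xi - step * ti for xi, ti in zip(x, tangent)])  # step, then map back to the sphere
    return x

# Hypothetical test problem: the minimum of x^T A x on the sphere is the
# smallest eigenvalue of A, attained at the corresponding eigenvector.
A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
x_min = sphere_descent(A, [1.0, 1.0, 1.0])
f_min = dot(x_min, matvec(A, x_min))
```

Every iterate stays on the constraint set by construction, so feasibility never has to be enforced separately.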

The explicit use of manifolds, Lie groups, and Lie group actions on manifolds in the pattern recognition area was pioneered by Ulf Grenander. His formulation of pattern theory for representation of complex systems is based on using combinations of simple elementary units, called generators, which under the actions of small groups capture the variability exhibited by the observed systems. This theory was presented in a series of monographs [47,48], culminating in the textbook [49]. In terms of problems in image understanding and computer vision, some applications of Grenander's pattern theory appeared in the joint works with Amit [3], Keenan [50], and Miller [51,52,92]. This laid the foundation for a principled approach to the development of deformable template theory that has been used extensively in a variety of applications. We will describe its contributions in shape analysis and object characterization in a later section.

Focusing on the area of computer vision and image understanding, there are many problems that can naturally be posed as problems in optimization or statistical inference on nonlinear manifolds. This is because there are some intrinsic constraints that force features of interest to take values on nonlinear spaces. While there are many such features, the most frequent types of nonlinear manifolds encountered in computer vision applications are: spheres, matrix Lie groups, and their quotient spaces. Kanatani [65] provides a nice discourse on the use of groups for understanding structures in images. To cite an example, consider the problem of tracking and recognizing objects in video data. The relative pose of objects, with respect to the camera, is conveniently represented as an element of the special orthogonal group [65,53,96]. There has been a great interest recently in statistical analysis of symmetric, positive-definite (SPD) matrices. One motivation of this interest comes from the fact that the basic unit in Diffusion Tensor-Magnetic Resonance Imaging (DT-MRI) is a 3×3 SPD matrix that represents the diffusion tensor at each point of an imaged volume. The goal in this problem is to estimate, interpolate, and denoise a uniform field of diffusion tensors and these tasks are performed using the Riemannian geometry of the underlying tensor space [100,97,106,40]. Several problems in computer vision, signal processing, and pattern recognition utilize linear orthogonal projections of high-dimensional data before statistical analysis. Examples of these projections are principal component analysis, Fisher's linear discriminant analysis ([9]) and optimal component analysis ([83]). The set of all orthogonal bases forms a Stiefel manifold and the set of all subspaces forms a Grassmann manifold [118].

Image and Vision Computing 30 (2012) 398–416

☆ This paper has been recommended for acceptance by special issue Guest Editor Rama Chellappa.
⁎ Corresponding author.
E-mail addresses: [email protected] (A. Srivastava), [email protected] (P. Turaga), [email protected] (S. Kurtek).

0262-8856/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2012.03.006

In this paper we will focus on the use of differential geometric tools in two specific areas of computer vision: shape analysis and video-based activity recognition. We will consider both 2D and 3D objects for shape analysis and some examples of the objects that we consider are shown in Fig. 1.

The rest of this paper is organized as follows:

• In Section 2, we present a general discussion on shape analysis, including its goals, challenges, and some relevant tools from differential geometry.

• In Section 3, we summarize some common approaches to 2D shape analysis, including landmark-based methods, shape analysis of parameterized curves, deformable templates, and others.

• In Section 4, we consider their extensions and some novel ideas for shape analysis of 3D objects.

• In Section 5, we present various approaches in activity analysis that involve manifold-valued parametric and non-parametric models for shape and feature evolution.

• In Section 6, we present some concluding remarks.

2. Geometric shape analysis

Shape analysis is a broad problem area with a tremendous scope and potential influence in all areas of science and engineering. This makes it almost impossible for us to provide a complete picture of the different efforts in shape analysis. Earlier surveys have covered more general aspects related to shape descriptors [84,154]. In contrast, we choose to focus on those works where differential-geometric tools have played a central role. In particular, we will shortlist techniques that start with some mathematical representations of objects, remove appropriate shape-preserving transformations, and lead to precise definitions of shape spaces with proper shape metrics. The importance of having well-defined shape spaces with metrics and geodesics is that they are essential for developing statistical tools for shape analysis, including probabilistic shape models.

Perhaps the earliest known efforts in formalizing shape analysis came from D'Arcy Thompson, who tried to relate the shapes of functionally similar objects. He explored the possibility of making shapes visually similar by applying transformations and making them closer than they originally appeared. His treatment of shapes appears in the form of a 1917 book titled “On Growth and Form”. Fig. 2 shows two examples of using non-rigid transformations for matching two seemingly different but functionally similar objects. The left example shows three crocodilian skulls and a way of comparing their shapes. Here the transformation is applied to the coordinate system in which the object is represented and not to the object itself; the appearance of the object changes accordingly. As another example, he compared the shapes of two fishes – Argyropelecus olfersi and Sternoptyx diaphana – using a similar transformation of the coordinates, shown on the right. Since D'Arcy Thompson's work, shape analysis has emerged as a primary scientific problem of our time with many powerful tools and techniques for analysis.

2.1. Shape analysis goals

In order to survey different geometric techniques for shape analysis, one needs to specify the general goals in this area. We start by listing the most important goals in shape analysis:

(1) Quantification of shape differences: firstly, given any two objects, one would like to be able to quantify the similarities and dissimilarities between their shapes. For this we need to specify a shape space and a proper metric on that space. Although there may be many techniques available for quantifying shape differences, there are some benefits in quantification using a proper distance function, i.e. the quantity satisfies all three properties of a shape distance, including the triangle inequality. Several popular approaches are based on extracting certain geometric features from images of objects and comparing their histograms, for instance using Kullback–Leibler divergence. However, such comparisons do not lead to a proper distance.

Fig. 1. Examples of 2D and 3D objects for shape analysis. The middle row shows some anatomical parts taken from sub-cortical regions of a human brain. Panel labels: Caudate, Putamen, Thalamus, Pallidus, Frontal Lobe, Parietal Lobe.

(2) Building a template: for a sample taken from a shape class, one would like to develop a template for the class. This template can be very useful in efficient class comparisons, shape retrieval, hierarchical models, and so on. In statistical terms, any estimate of the population mean, e.g. the sample mean, can serve as a template for the class, and we would like efficient tools for defining and computing the sample mean of a given set of shapes.

(3) Modeling shape variations: perhaps the most important goal in shape analysis is to develop statistical models for variations of shapes within and across shape classes. If one can develop probabilistic models for shapes, not only does it become relatively simple to perform classification of shapes, but the models also provide probability distributions associated with estimates.

(4) Shape clustering, classification and estimation: lastly, given shape metrics, sample statistics, and probability models on shape spaces, it will become possible to extend tools from pattern recognition and machine learning from Euclidean random variables to shape spaces. For instance, one can develop techniques for shape estimation, shape-based clustering of objects, and hypothesis testing for shape classification.
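The remark in goal (1) that histogram comparisons via Kullback–Leibler divergence do not yield a proper distance can be checked numerically. The sketch below uses two made-up three-bin histograms (purely illustrative, not data from the paper) and shows that the divergence is not symmetric, which already rules it out as a metric.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i) for discrete
    distributions with strictly positive q_i."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical normalized feature histograms.
p = [0.5, 0.4, 0.1]
q = [0.1, 0.3, 0.6]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
asymmetry = abs(d_pq - d_qp)   # nonzero: the two orderings disagree
```

A geodesic distance on a shape space, by contrast, is symmetric and satisfies the triangle inequality by construction.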

All of these goals can be naturally accomplished in a Riemannian framework. Geodesics and geodesic distances provide shape comparisons, while statistical means of shapes can provide templates. Similarly, the use of statistical modeling on shape manifolds under Riemannian frameworks can lead to probabilistic shape analysis. Therefore, we will focus primarily on advances made in differential-geometric approaches to shape analyses of 2D and 3D objects, and their use in activity recognition.

2.2. Desired invariance

What are the important challenges in accomplishing these goals? The most important challenge is to discard variables that do not affect the shape of an object. To understand this issue, consider Fig. 3 which shows 16 objects at different rotations, translations, and scales. Despite these variabilities, the shapes of all these objects are identical, and one wants a framework that ignores these variables and treats their shapes as equal. In other words, any shape metric should return a value of zero for any two of these objects. Kendall [67] described shape as a property of an object once its rotation, translation, and scale are removed from the representation.

In case one is interested in shapes of parameterized objects, such as curves, surfaces, etc., then there is an additional issue. We explain this point using a parameterized curve shown in the leftmost panel in Fig. 4; denote it as $\beta(t) = (\beta_x(t), \beta_y(t)) \in \mathbb{R}^2$ with $t \in [0,1]$ being the parameter. Now, if we re-parameterize the curve with a re-parameterization function $\gamma : [0,1] \to [0,1]$, we obtain the curve $\tilde{\beta}(t) = \beta(\gamma(t))$. The curve $\tilde{\beta}(t)$ has different coordinate functions $[\tilde{\beta}_x(t), \tilde{\beta}_y(t)]$ but exactly the same shape as the original curve. This is depicted in Fig. 4 which shows, from left: (1) 2D curves $(\beta_x(t), \beta_y(t))$ and $(\tilde{\beta}_x(t), \tilde{\beta}_y(t))$ plotted on top of each other, (2) 3D curves $(t, \beta_x(t), \beta_y(t))$ and $(t, \tilde{\beta}_x(t), \tilde{\beta}_y(t))$; (3) $\beta_x(t)$ and $\tilde{\beta}_x(t)$ versus $t$, (4) $\beta_y(t)$ and $\tilde{\beta}_y(t)$ versus $t$, and (5) the re-parameterization function $\gamma(t)$ versus $t$ (broken line; the solid line is the identity function). This example suggests that any re-parameterization of $\beta$ changes its coordinate functions but not its shape. Thus, one would like a framework for comparing shapes of curves that is invariant to how the curves are parameterized, and puts a zero distance between shapes of $\beta$ and $\tilde{\beta}$.
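The effect of re-parameterization can be reproduced with a small numerical sketch. The curve (a half circle) and the warping function $\gamma(t) = t^2$ below are hypothetical choices made only for illustration; they are not the curve or warp of Fig. 4. The coordinate functions of the two parameterizations differ at a fixed $t$, yet both trace the same set of points and have the same length.

```python
import math

def beta(t):
    """Example curve: upper half of the unit circle, t in [0, 1]."""
    return (math.cos(math.pi * t), math.sin(math.pi * t))

def gamma(t):
    """Example re-parameterization of [0, 1]: increasing, gamma(0)=0, gamma(1)=1."""
    return t * t

def beta_tilde(t):
    """Re-parameterized curve beta(gamma(t))."""
    return beta(gamma(t))

def polyline_length(curve, n=2000):
    """Length of the polygon through n+1 samples of the curve."""
    pts = [curve(i / n) for i in range(n + 1)]
    return sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:]))

same_time_gap = math.hypot(beta(0.5)[0] - beta_tilde(0.5)[0],
                           beta(0.5)[1] - beta_tilde(0.5)[1])
length_gap = abs(polyline_length(beta) - polyline_length(beta_tilde))
```

Here `same_time_gap` is clearly nonzero while `length_gap` is at the level of discretization error, which is the behavior a parameterization-invariant shape metric has to respect.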

The transformations associated with rotation, translation, scaling, and re-parameterization are termed shape preserving and, from the perspective of shape analysis, are nuisance variables. As described next, the most common tool for removing the nuisance variables in shape analysis is an algebraic one.

2.3. Invariance using orbits

There are two important points to this approach. Firstly, in order to identify different instances of the same shape but with different nuisance variables, one uses the notion of equivalence relations. Secondly, and more importantly, an elegant and principled framework is obtained using the group structure of the sets of nuisance variables. Since rotation, translation, scaling, and re-parameterization are all Lie groups, one uses their group actions on shape representation spaces to form orbits and defines an equivalence relation based on the membership of these orbits.

Fig. 2. Examples from D'Arcy Thompson's work on measuring differences in shapes of related objects by means of simple mathematical transformations. The left example studies variations in shapes of crocodilian skulls, while the right example compares the shape of an Argyropelecus olfersi with that of a Sternoptyx diaphana. Data courtesy of Wikipedia Commons.

Fig. 3. Similarity transformation: sixteen curves with different orientations, scale, and locations, but with identical shapes.


We start with a general description. Assume that a group $G$ acts on a manifold $M$. For any $p \in M$, the orbit of $p$ under the action of $G$ is defined as the set $[p] = \{(g,p) : g \in G\}$, where $(g,p)$ denotes the result of acting on $p$ with $g$. The orbit of a point in $M$ refers to all possible points one can reach in $M$ using the action of $G$ on that point. Depending upon the action of $G$ on $M$, the quotient space $M/G$ inherits a Riemannian metric from $M$ as follows.

Definition 1. If a Lie group $G$ acts on a Riemannian manifold $M$ by isometries (see the definition of isometry in Appendix A), and the orbits of $G$ in $M$ are closed, then we may define a distance function $d_{M/G}$ on the quotient space $M/G$, using the original distance function $d_M$ on the manifold $M$, as follows:

$$d_{M/G}([p_1],[p_2]) = \min_{g \in G} d_M((g,p_1),p_2) = \min_{g \in G} d_M(p_1,(g,p_2)). \qquad (1)$$

In other words, fix a point on one orbit and search over the orbit of the second point to minimize the distance between them. The shortest path between the two orbits in $M$ can also sometimes (not always) be identified with a geodesic path on the quotient space $M/G$.
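As a rough illustration of Eq. (1), the sketch below takes $G$ to be the planar rotation group acting on configurations of landmarks stored as complex numbers, and approximates the minimum over $G$ by a brute-force grid search. The grid search, the function names, and the example triangle are assumptions made for illustration; in practice the minimization is usually done in closed form or with a proper optimizer.

```python
import cmath
import math

def euclidean_distance(p, q):
    """Euclidean distance between two configurations of complex landmarks."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(p, q)))

def rotate(phi, p):
    """Action of a planar rotation by angle phi on a configuration."""
    rot = cmath.exp(1j * phi)
    return [rot * a for a in p]

def quotient_distance(p1, p2, n_angles=3600):
    """Approximate Eq. (1) for the rotation group by searching a grid of angles."""
    angles = (2.0 * math.pi * i / n_angles for i in range(n_angles))
    return min(euclidean_distance(rotate(phi, p1), p2) for phi in angles)

triangle = [0 + 0j, 2 + 0j, 0 + 1j]
turned = rotate(math.pi / 4, triangle)    # same orbit, different representative

d_plain = euclidean_distance(triangle, turned)   # distance in M
d_orbit = quotient_distance(triangle, turned)    # distance between orbits
```

The two configurations are far apart in $M$ but lie on the same orbit, so the orbit distance is (numerically) zero.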

Let us consider some examples to understand this point. Let $M = \mathbb{R}^{n \times k}$ be the set of all $k$-tuples in $\mathbb{R}^n$. Each element $x \in M$ represents a shape made of $k$ ordered points in $\mathbb{R}^n$ ($n=2$ for 2D shapes and $n=3$ for 3D shapes).

1. The translation group $\mathbb{R}^n$ acts on the vector space $M$ by the action $(y,x) = x + y 1_k^T$ for any $y \in \mathbb{R}^n$, $x \in M$. The orbits of this action are all possible translations of an object $x$. This group action is isometric under the Euclidean distance on $M$, since $\|x_1 - x_2\| = \|(x_1 + y 1_k^T) - (x_2 + y 1_k^T)\|$. Here $\|\cdot\|$ denotes the Frobenius norm of a matrix.

2. The scale group $\mathbb{R}^{\times}$ of positive scalars acts on $M - \{0\}$ by $(a,x) = ax$ for any $a \in \mathbb{R}^{\times}$ and $x \in M$. The orbit of a nonzero vector is a straight line spanned by positive scalings of this vector. If we assume the Euclidean distance on $M$ to make it a Riemannian manifold, then the action of $\mathbb{R}^{\times}$ on $M$ is not isometric because $\|x_1 - x_2\| \neq \|a x_1 - a x_2\|$ for any $a \neq 1$. But it becomes isometric under another Riemannian metric. For any two vectors $v_1, v_2 \in M$, define the Riemannian metric at a point $x \in M$ by

$$\langle\langle v_1, v_2 \rangle\rangle_x = \frac{\langle v_1, v_2 \rangle}{\langle x, x \rangle}.$$

Under this metric on $M$, we have the required isometry under the scaling group action: $\langle\langle a v_1, a v_2 \rangle\rangle_{ax} = \langle\langle v_1, v_2 \rangle\rangle_x$. Note that if $\|x\| = 1$, then $\langle\langle v_1, v_2 \rangle\rangle_x = \langle v_1, v_2 \rangle$. That is, if the shape configurations are rescaled to have norm one, then the new metric becomes the standard Euclidean inner product.

3. The rotation group $SO(n)$ acts on $M$ by the action $(O,x) = Ox$, the matrix–matrix multiplication, for all $O \in SO(n)$ and $x \in M$. This group action is isometric, under the Euclidean distance on $\mathbb{R}^n$, because $\|x_1 - x_2\| = \|O x_1 - O x_2\|$, for all $O \in SO(n)$, $x_1, x_2 \in M$.

4. We will introduce the re-parameterization group, its action on the space of parameterized curves, and the proper metrics later.
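The isometry claimed for the scale group in example 2 follows in one line from the bilinearity of the Euclidean inner product. The short derivation below spells it out, using only the scaled metric $\langle\langle v_1, v_2 \rangle\rangle_x = \langle v_1, v_2 \rangle / \langle x, x \rangle$ defined in that example and a scale factor $a > 0$:

```latex
\begin{align*}
\langle\langle a v_1, a v_2 \rangle\rangle_{a x}
  = \frac{\langle a v_1, a v_2 \rangle}{\langle a x, a x \rangle}
  = \frac{a^2 \, \langle v_1, v_2 \rangle}{a^2 \, \langle x, x \rangle}
  = \frac{\langle v_1, v_2 \rangle}{\langle x, x \rangle}
  = \langle\langle v_1, v_2 \rangle\rangle_{x} .
\end{align*}
```

The factor $a^2$ picked up by the tangent vectors is cancelled by the same factor picked up by the base point, which is exactly what the plain Euclidean metric lacks.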

In shape analysis, one uses a direct product or a semi-direct product of these and other groups to form joint orbits on object manifolds and defines equivalence relations under the joint actions. Each orbit is associated with a unique shape and shapes are compared by computing a distance between the orbits. Hence, the set of all the orbits, i.e. the quotient space $M/G$, and the metric on that space become important.

In some cases it is possible to identify the quotient space $M/G$ with a subset of $M$, and that greatly simplifies the analysis. Instead of computing geodesics and distances in $M/G$, by solving minimization problems of the type given in Eq. (1), one can simply compute geodesics and distances directly between any elements of this subset and the results are equivalent. Of course, one requires that this subset, called an orthogonal section, satisfies certain properties to achieve this simplification. A subset $S \subset M$ is an orthogonal section of $M$ if: (1) $S$ meets each orbit of $M$ (under $G$) in at most one point, (2) at any point in $S$ the tangent space to the orbit is perpendicular to the tangent space to $S$, and (3) the sum of these two tangent spaces is the full tangent space to $M$ at that point. If $S$ intersects every orbit of $M$ then it is called a global orthogonal section. In case $M = \mathbb{R}^{n \times k}$, under the Euclidean Riemannian metric:

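(1) The global orthogonal section of $M$ under the translation group is given by: $\left\{ X \in \mathbb{R}^{n \times k} \;\middle|\; \sum_{i=1}^{k} x_i = 0 \right\}$, with $x_i$ the $i$th column of $X$.

(2) The global orthogonal section of $M$ under the scaling group is given by: $\left\{ X \in \mathbb{R}^{n \times k} \;\middle|\; \sum_{i=1}^{k} \sum_{j=1}^{n} x_{j,i}^2 = 1 \right\}$.

(3) There is no orthogonal section under the rotation group.

So, we conclude that the set $\left\{ X \in \mathbb{R}^{n \times k} \;\middle|\; \sum_{i=1}^{k} x_i = 0,\ \sum_{i=1}^{k} \sum_{j=1}^{n} x_{j,i}^2 = 1 \right\}$ is an orthogonal section of $M = \mathbb{R}^{n \times k}$ under the joint action of the translation and the scale groups, but not the rotation group.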
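In computational terms, moving a configuration to this orthogonal section amounts to centering it and rescaling it to unit norm. The sketch below does this for planar landmarks stored as (x, y) tuples; the function name and the example square are illustrative assumptions. Two configurations that differ only by translation and positive scaling should map to the same representative.

```python
import math

def to_section(points):
    """Center a list of (x, y) landmarks and rescale to unit Frobenius norm."""
    k = len(points)
    cx = sum(p[0] for p in points) / k
    cy = sum(p[1] for p in points) / k
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0.0:
        raise ValueError("all landmarks coincide; the shape is undefined")
    return [(x / norm, y / norm) for x, y in centered]

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
moved = [(3.0 * x + 5.0, 3.0 * y - 1.0) for x, y in square]   # scaled and translated copy

rep_a = to_section(square)
rep_b = to_section(moved)
gap = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(rep_a, rep_b))
```

Rotation is deliberately left untouched here, since no orthogonal section exists for it; it has to be handled by optimization over the orbit instead.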

3. Common approaches for 2D shape analysis

With this mathematical background, we will survey the main ideas in the literature on shape analysis of planar objects. While it seems natural to study their shapes by treating them as parameterized curves, there are several other possibilities. Figs. 5–6 illustrate some of these representations. The top row (second panel) shows points sampled along these curves that form an ordered set of points in $\mathbb{R}^2$. This representation is used in two methods: landmark-based shape analysis and active shape analysis. A related possibility is to discard the ordering of points and simply treat them as a set of points, as shown in the top row (third panel). While we lose some information when we discard the ordering of points, we can now use some set-theoretic methods for shape analysis.

3.1. Landmark-based shape analysis

Fig. 4. Re-parameterization variability in curves and surfaces.

In the case of ordered, or labeled, points, an important approach, first laid out by Kendall [67] and advanced by several others [78,68,113,35], was amongst the first formal treatments of shapes. The basic idea is to sample the object at a number of points, called landmarks, and form polygonal shapes by connecting those points with straight lines. Of course, the number and locations of these points on the objects can drastically change the resulting polygonal shapes but we will disregard that issue for the moment. One can organize these landmarks in the form of a vector of coordinates and perform standard vector calculus. Let $x \in \mathbb{R}^{k \times 2}$ represent $k$ ordered points selected from the boundary of an object. It is often convenient to identify points in $\mathbb{R}^2$ with elements of $\mathbb{C}$, i.e. $x_i \equiv z_i = x_{i,1} + j x_{i,2}$, where $j = \sqrt{-1}$. Thus, in this complex representation, a configuration of $k$ points $x$ is now $z \in \mathbb{C}^k$. As described earlier, one considers the joint action of the translation and the scaling group on the set of such configurations and forms an orthogonal section of that set. Each $z$ is then represented by the corresponding element of the orthogonal section, often also called the pre-shape space:

$$\mathcal{D} = \left\{ z \in \mathbb{C}^k \;\middle|\; \frac{1}{k} \sum_{i=1}^{k} z_i = 0,\ \|z\| = 1 \right\}.$$

The quotient space of $\mathbb{C}^k$ under the translation and scaling groups is identified with $\mathcal{D}$ and one works with $\mathcal{D}$ instead of the quotient space. $\mathcal{D}$ is not a vector space because $a_1 z_1 + a_2 z_2$, for $a_1, a_2 \in \mathbb{R}$ and $z_1, z_2 \in \mathcal{D}$, is typically not in $\mathcal{D}$, due to the unit norm constraint. However, $\mathcal{D}$ is a unit sphere and one can utilize the geometry of a sphere to analyze points on it. Under the Euclidean metric, the shortest path between any two elements $z_1, z_2 \in \mathcal{D}$, also called a geodesic, is given by the great circle $\alpha_{ksa} : [0,1] \to \mathcal{D}$, where

$$\alpha_{ksa}(\tau) = \frac{1}{\sin(\theta)} \left[ \sin(\theta(1-\tau))\, z_1 + \sin(\tau\theta)\, z_2 \right], \quad \text{and} \quad \theta = \cos^{-1}\left( \Re(\langle z_1, z_2 \rangle) \right). \qquad (2)$$
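The great-circle formula of Eq. (2) translates almost directly into code once landmarks are stored as complex numbers. The following is a minimal sketch under that assumption; the helper names and the two rectangles are hypothetical, and the antipodal case $\theta = \pi$ (where the geodesic is not unique) is not handled.

```python
import math

def to_preshape(z):
    """Center a list of complex landmarks and rescale to unit norm."""
    mean = sum(z) / len(z)
    centered = [a - mean for a in z]
    norm = math.sqrt(sum(abs(a) ** 2 for a in centered))
    return [a / norm for a in centered]

def inner(z1, z2):
    """Hermitian inner product <z1, z2> = sum_i z1_i * conj(z2_i)."""
    return sum(a * b.conjugate() for a, b in zip(z1, z2))

def preshape_geodesic(z1, z2, tau):
    """Point at time tau in [0, 1] on the great circle of Eq. (2)."""
    cos_theta = max(-1.0, min(1.0, inner(z1, z2).real))   # clamp against round-off
    theta = math.acos(cos_theta)
    if theta < 1e-12:                                     # identical pre-shapes
        return list(z1)
    w1 = math.sin(theta * (1.0 - tau)) / math.sin(theta)
    w2 = math.sin(theta * tau) / math.sin(theta)
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]

z1 = to_preshape([0 + 0j, 4 + 0j, 4 + 1j, 0 + 1j])   # wide rectangle
z2 = to_preshape([0 + 0j, 1 + 0j, 1 + 3j, 0 + 3j])   # tall rectangle
mid = preshape_geodesic(z1, z2, 0.5)
mid_norm = math.sqrt(sum(abs(a) ** 2 for a in mid))
```

Unlike a straight-line average, the midpoint stays centered and has unit norm, so it is again a valid pre-shape.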

In order to reach the shape analysis of these objects, we need to remove the action of the rotation group also. Let $[z]$ be the set of all rotations of a configuration $z$ according to: $[z] = \{ e^{j\phi} z \mid \phi \in \mathbb{S}^1 \}$. One defines an equivalence relation on $\mathcal{D}$ by setting all elements of this set as equivalent, i.e. $z_1 \sim z_2$ if there exists an angle $\phi$ such that $z_1 = e^{j\phi} z_2$, i.e. $z_1 \in [z_2]$ and vice-versa. The set of all such equivalence classes is the quotient space $\mathcal{D}/U(1)$, where $U(1) = SO(2) = \mathbb{S}^1$ is the set of all rotations in $\mathbb{R}^2$. This space is called the complex projective space and is denoted by $\mathbb{CP}^{k-2}$. A geodesic between two elements $[z_1], [z_2] \in \mathbb{CP}^{k-2}$ is given by computing $\alpha_{ksa}$ between $z_1$ and $e^{j\phi^*} z_2$, where $\phi^*$ is the optimal rotational alignment of $z_2$ to $z_1$. This rotational alignment is found using:

$$\phi^* = \operatorname*{argmin}_{\phi \in \mathbb{S}^1} \|z_1 - e^{j\phi} z_2\|^2 = \operatorname*{argmax}_{\phi \in \mathbb{S}^1} \Re\left( e^{-j\phi} \langle z_1, z_2 \rangle \right) = \theta, \quad \text{where } \langle z_1, z_2 \rangle = r e^{j\theta}.$$

This minimization is an example of optimization over orbits to derive a proper distance function in a quotient space, according to Eq. (1). The length of the geodesic, given by $\theta$ in Eq. (2) evaluated for $z_1$ and the aligned configuration $e^{j\phi^*} z_2$, is a proper distance in the quotient space $\mathbb{CP}^{k-2}$ and it quantifies the differences in shapes of the boundaries represented by $z_1$ and $z_2$. For shape analysis of closed curves, one remaining issue is which point should be selected as the first point or the seed. If there are $k$ points sampled on a curve, then there are $k$ candidates for the seed. The solution is to select the best seed during a pairwise comparison of configurations. That is, select an arbitrary point on the first configuration as the seed for the first shape and try all $k$ points in the second configuration as the seed for the second shape. Of those, select the one that results in the smallest distance from the first configuration. Fig. 9 shows several examples of these geodesics, one between a pair of human silhouettes, one between a pair of hands, and so on. These geodesics have been computed using $k = 100$ points on each configuration so that the resulting polygons look like smooth curves.
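The rotational alignment and the seed search described above can be sketched together. The code below assumes landmarks stored as complex numbers, takes the optimal rotation as the argument of the Hermitian inner product, and tries every cyclic shift of the second configuration as a candidate seed. The function names and the example outline are hypothetical; this is an illustrative sketch rather than the authors' implementation.

```python
import cmath
import math

def to_preshape(z):
    """Center a list of complex landmarks and rescale to unit norm."""
    mean = sum(z) / len(z)
    centered = [a - mean for a in z]
    norm = math.sqrt(sum(abs(a) ** 2 for a in centered))
    return [a / norm for a in centered]

def inner(z1, z2):
    """Hermitian inner product <z1, z2> = sum_i z1_i * conj(z2_i)."""
    return sum(a * b.conjugate() for a, b in zip(z1, z2))

def align_rotation(z1, z2):
    """Return (phi_star, distance): with <z1, z2> = r e^{j theta}, the optimal
    rotation is theta and the aligned geodesic length is acos(r)."""
    ip = inner(z1, z2)
    r = min(1.0, abs(ip))              # clamp against round-off
    return cmath.phase(ip), math.acos(r)

def shape_distance(z1, z2):
    """Smallest aligned distance over all k choices of seed on z2."""
    best = None
    for shift in range(len(z2)):
        shifted = z2[shift:] + z2[:shift]
        phi_star, dist = align_rotation(z1, shifted)
        if best is None or dist < best[0]:
            best = (dist, shift, phi_star)
    return best

outline = [0 + 0j, 4 + 0j, 5 + 2j, 2 + 5j, -1 + 1j]
z1 = to_preshape(outline)
# Same outline, rotated by 0.7 rad, scaled, translated, and started two points later.
rot = cmath.exp(0.7j)
copy = [2.5 * rot * outline[(i + 2) % 5] + (3 - 1j) for i in range(5)]
z2 = to_preshape(copy)

dist, shift, phi_star = shape_distance(z1, z2)
```

For this pair the distance is numerically zero, the recovered seed undoes the two-point offset, and the recovered angle undoes the applied rotation.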

One simplification of this framework results in the active shape models (ASM) approach, introduced by Cootes and Taylor in [28]. Here one discards the geometry of $\mathcal{D}$ and simply treats it as a vector space. Now, the distance between the two configurations is $\|z_1 - e^{j\phi^*} z_2\| = \sqrt{2(1-r)}$. The corresponding optimal deformation from one shape to another is simply a straight line between $z_1$ and $e^{j\phi^*} z_2$, i.e. $\alpha_{asm}(\tau) = (1-\tau) z_1 + \tau e^{j\phi^*} z_2$ for $\tau \in [0,1]$.
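The closed form of the ASM distance can be verified by expanding the squared norm. The derivation below assumes, as in the text, that $z_1$ and $z_2$ have unit norm, that $\langle z_1, z_2 \rangle = r e^{j\theta}$ with the inner product conjugate-linear in its second argument, and that $\phi^* = \theta$:

```latex
\begin{align*}
\| z_1 - e^{j\phi^*} z_2 \|^2
  &= \|z_1\|^2 + \|z_2\|^2 - 2\,\Re\!\left( \langle z_1, e^{j\phi^*} z_2 \rangle \right) \\
  &= 2 - 2\,\Re\!\left( e^{-j\phi^*} \, r e^{j\theta} \right) \\
  &= 2 - 2r ,
\end{align*}
% so that \| z_1 - e^{j\phi^*} z_2 \| = \sqrt{2(1-r)} .
```

This is the chord length between the aligned configurations on the unit sphere, whereas the geodesic length $\cos^{-1}(r)$ is the corresponding arc length.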

3.1.1. Issue of landmark selection

Although Kendall's approach succeeds in preserving the unit-length constraints on the landmark configurations, it does not address a very important practical issue: how to systematically select points on objects, say curves, to form representative point sets? This process is difficult to standardize and different selections can lead to drastically differing solutions. This issue is present in any point-based approach, including ASM. Some may suggest sampling a curve uniformly along its length, i.e. parameterize a curve $\beta$ using arc-length and sample $\{\beta(t_i) \mid i = 0,1,2,\ldots,k\}$ where $t_i = i/k$. Although this provides a standardized way of sampling curves, the results are not always good since this forces a particular registration of points, i.e. the point $\beta_1(t_i)$ on the first curve is matched to the point $\beta_2(t_i)$ on the second curve, irrespective of the shapes involved. Fig. 7 illustrates this point using an example. Shown in the left two panels are two curves, $\beta_1$ and $\beta_2$, sampled uniformly along their lengths. For $t_i = i/4$, $i = 1,2,3,4$, the corresponding four points on each curve, $\{\beta_1(t_i)\}$ and $\{\beta_2(t_i)\}$, are shown in the same color. While two of the four pairs seem to match well, the pairs shown in red and green fall on different parts of the body, resulting in a mismatch of features. This example shows the pitfall of using uniform sampling of curves. In fact, any pre-determined sampling and pre-registration of points will, in general, be problematic. A more natural solution is to treat the boundaries of objects as continuous curves, rather than discretize them into point sets at the outset, and find an optimal (perhaps non-uniform) sampling, such as the one shown in the rightmost panel, that better matches features across curves. This way one can develop a more comprehensive solution, including theory and algorithms, assuming continuous objects and discretizing them only at the implementation stage.
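For reference, the uniform arc-length sampling discussed above can be sketched as follows for a curve given as an open polyline. The function name and the L-shaped test polyline are illustrative assumptions. The sketch only standardizes the sampling; it does nothing to fix the registration problem described in the text.

```python
import math

def resample_by_arclength(points, k):
    """Return k+1 points beta(t_i), t_i = i/k, equally spaced in arc length
    along an open polyline given as a list of (x, y) vertices."""
    seg = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]
    cum = [0.0]
    for s in seg:
        cum.append(cum[-1] + s)          # cumulative length at each vertex
    total = cum[-1]
    out = []
    j = 0                                # index of the segment containing the target
    for i in range(k + 1):
        target = total * i / k
        while j < len(seg) - 1 and cum[j + 1] < target:
            j += 1
        frac = 0.0 if seg[j] == 0.0 else (target - cum[j]) / seg[j]
        a, b = points[j], points[j + 1]
        out.append((a[0] + frac * (b[0] - a[0]), a[1] + frac * (b[1] - a[1])))
    return out

# L-shaped polyline of total length 4, sampled at t_i = i/4.
samples = resample_by_arclength([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)], 4)
```

Two curves resampled this way are matched index by index, which is precisely the pre-determined registration that can pair up unrelated features.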

3.2. Planar curve-based shape analysis

In this section we consider methods that utilize the full curves, and not just a set of landmarks, for analyzing shapes of objects. These curves are typically boundaries or silhouettes of objects in images and it is natural to consider the shapes of full curves even if the data is sometimes noisy or corrupted. Let a parameterized planar curve be denoted as $\beta(t) \in \mathbb{R}^2$, where $t \in [0,1]$ is a parameter. In case the curve is closed, it is more natural to parameterize it using the domain $\mathbb{S}^1$ instead of $[0,1]$ since $\beta(0) = \beta(1)$ and there is no natural point on the curve to call $\beta(0)$. Assuming that $\dot{\beta}(t) \neq 0$, we can decompose it in a polar form $\dot{\beta}(t) = s(t)\Theta(t)$, where $s(t) = \|\dot{\beta}(t)\| \in \mathbb{R}_+$ is the speed function and $\Theta(t) = \dot{\beta}(t)/s(t)$ denotes the unit vector in the direction of $\dot{\beta}(t)$. As in the previous section, one can use the complex notation to write $\Theta(t) = e^{j\theta(t)}$. Thus, $\theta(t)$ is the angle made by the velocity vector $\dot{\beta}(t)$ with respect to the x axis. If $s(t) = 1$ for all $t$, i.e. the curve is arc-length parameterized, then $\theta$ is a sufficient representation of the curve $\beta$ for shape analysis. Another possibility is to use $\kappa(t) = \dot{\theta}(t)$, the curvature function along the curve, in shape analysis. Fig. 8 shows these representations, coordinate functions, angle function, or curvature function, for a closed planar curve.
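These representations can be approximated from a sampled curve with finite differences. The sketch below assumes samples taken at uniform parameter spacing `h` and uses forward differences; the function name and the unit-circle test curve (sampled at unit speed over $[0, 2\pi]$) are illustrative assumptions. The angle function is unwrapped so that it varies continuously along the curve.

```python
import math

def angle_representation(points, h):
    """Forward-difference estimates of the speed s, the unwrapped angle
    theta, and its derivative kappa for samples at parameter spacing h."""
    vel = [((b[0] - a[0]) / h, (b[1] - a[1]) / h) for a, b in zip(points, points[1:])]
    speed = [math.hypot(vx, vy) for vx, vy in vel]
    theta = [math.atan2(vel[0][1], vel[0][0])]
    for vx, vy in vel[1:]:
        raw = math.atan2(vy, vx)
        # unwrap: take the representative of the angle closest to the previous value
        step = (raw - theta[-1] + math.pi) % (2.0 * math.pi) - math.pi
        theta.append(theta[-1] + step)
    kappa = [(b - a) / h for a, b in zip(theta, theta[1:])]
    return speed, theta, kappa

# Unit circle traversed at unit speed: s = 1 and kappa = 1 everywhere.
n = 1000
h = 2.0 * math.pi / n
circle = [(math.cos(i * h), math.sin(i * h)) for i in range(n + 1)]
speed, theta, kappa = angle_representation(circle, h)
```

For the unit circle the estimated speed and curvature are both close to one, and the unwrapped angle increases by nearly $2\pi$ over one traversal.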

3.2.1. Curves with fixed parameterizations

One of the earliest ideas in shape analysis of curves came from Zahn and Roskies [152], who studied arc-length parameterized curves using Fourier coefficients of the angle functions. Although angle functions had been used previously for shape analysis, see e.g. [103], Zahn and Roskies provided a formal treatment by focusing on the shapes of planar closed curves. Note that in representing a curve $\beta$ by its angle function $\theta$, one has already removed the effect of translations of $\beta$. Furthermore, the effect of scaling has been removed by using an arc-length parameterization with the parameter taking values in $\mathbb{S}^1$, i.e. all the curves have been scaled to be of length $2\pi$. One can even remove the effect of the rotation of $\beta$ by insisting that the integral of $\theta$ be a constant, say $\pi$. With this representation, Zahn and Roskies used Fourier coefficients of such functions for comparing and analyzing shapes of curves. Since the main goal here was to compare shapes, i.e. provide a shape metric, they simply compared the Fourier coefficients using vector norms rather than using the actual geometry of the underlying space.

402 A. Srivastava et al. / Image and Vision Computing 30 (2012) 398–416


Klassen et al. [72,116] studied the differential geometry of the space of angle functions for closed curves and used a simple Riemannian metric to compute geodesic paths between shapes of closed curves. Define the set of relevant angle functions to be:

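$$\mathcal{C} = \left\{ \theta : \mathbb{S}^1 \to \mathbb{R} \;\middle|\; \int_{\mathbb{S}^1} \theta(t)\,dt = \pi,\ \int_{\mathbb{S}^1} \cos(\theta(t))\,dt = 0,\ \int_{\mathbb{S}^1} \sin(\theta(t))\,dt = 0 \right\}.$$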

The last two conditions make $\mathcal{C}$ a nonlinear manifold. The issue of placement of the origin on the curve is handled by the action of $\mathbb{S}^1$ on $\mathcal{C}$, and the quotient space $\mathcal{S} \equiv \mathcal{C}/\mathbb{S}^1$ becomes the shape space for this representation. Since the underlying manifold $\mathcal{S}$ is too complicated to allow analytical expressions for geodesics, they used a numerical approach called the shooting method to compute geodesic paths. The shooting method finds a geodesic between any two points on a Riemannian manifold. The basic idea is to designate one point as the source and the other as the target, and then find a tangent vector at the source so that the geodesic with that tangent as its initial velocity will reach the target in unit time. This tangent vector, or the shooting direction, is found by selecting an initial vector and iteratively updating it using the gradient of a miss function that quantifies the separation between the endpoint reached and the target. This tool, when implemented for $\mathcal{C}$ under the $\mathbb{L}^2$ metric, is termed a bending-only shape analysis, as the curves here are restricted to arc-length parameterizations and the geodesics consist only of changes in the angle functions in going from one shape to another. Klassen et al. [72] also demonstrated the use of the curvature function $\kappa(t) = \dot{\theta}(t)$ for a similar geodesic-based comparison of shapes. To our knowledge, this work was the first to present the use of geodesics on shape spaces of closed curves in shape analysis. Later, several others, including Shah [107], developed a similar framework using the angle functions, albeit more formally. Shah [108] also developed similar solutions for the curvature function representation. A number of Sobolev-type Riemannian metrics for comparing shapes of closed curves are discussed in [91,90].
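The shooting idea can be illustrated with a minimal Python sketch. It runs on the unit sphere in $\mathbb{R}^3$ rather than on $\mathcal{C}$; the sphere is an assumption made only because its exponential map has a closed form, whereas on $\mathcal{C}$ the geodesic flow itself has to be integrated numerically. The names `exp_map`, `tangent_basis`, `miss` and `shoot`, the squared-distance miss function, the finite-difference gradient and the step size are all illustrative choices and not the implementation of [72].

```python
import numpy as np

def exp_map(p, v):
    """Point reached at unit time by the sphere geodesic leaving p with velocity v."""
    speed = np.linalg.norm(v)
    if speed < 1e-12:
        return p.copy()
    return np.cos(speed) * p + np.sin(speed) * v / speed

def tangent_basis(p):
    """Orthonormal basis (e1, e2) of the tangent plane of the unit sphere at p."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(p[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = helper - np.dot(helper, p) * p
    e1 = e1 / np.linalg.norm(e1)
    e2 = np.cross(p, e1)
    return e1, e2

def miss(source, target, coeffs, basis):
    """Half the squared separation between the endpoint reached and the target."""
    v = coeffs[0] * basis[0] + coeffs[1] * basis[1]
    return 0.5 * np.sum((exp_map(source, v) - target) ** 2)

def shoot(source, target, step=1.0, tol=1e-9, max_iter=500, h=1e-6):
    """Find the shooting direction at `source` whose geodesic reaches `target`."""
    basis = tangent_basis(source)
    coeffs = np.zeros(2)                      # initial guess: zero tangent vector
    for _ in range(max_iter):
        if np.sqrt(2.0 * miss(source, target, coeffs, basis)) < tol:
            break
        grad = np.zeros(2)
        for k in range(2):                    # central finite-difference gradient
            e = np.zeros(2)
            e[k] = h
            grad[k] = (miss(source, target, coeffs + e, basis)
                       - miss(source, target, coeffs - e, basis)) / (2.0 * h)
        coeffs = coeffs - step * grad         # gradient step on the miss function
    return coeffs[0] * basis[0] + coeffs[1] * basis[1]

# Hypothetical source and target points on the sphere.
source = np.array([0.0, 0.0, 1.0])
target = np.array([1.0, 1.0, 0.2])
target = target / np.linalg.norm(target)

v = shoot(source, target)
print("miss after shooting:", np.linalg.norm(exp_map(source, v) - target))
print("geodesic length    :", np.linalg.norm(v))
```

The norm of the returned tangent vector is the geodesic distance between the two points, which is how a shooting direction doubles as a shape distance once the same procedure is carried out on a shape space.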

Fig. 5. Different representations of closed curves for the purpose of analyzing shapes. Panels, top row: Curves, Ordered Samples, Point Cloud, Deformable Grid; bottom row: Binary Image, Medial Axis, Signed-Distance, Level Set.

Fig. 6. Same as Fig. 5.

3.2.2. Planar curves with arbitrary parameterizations

Although the arc-length parameterization removes the parameterization variability of curves, it suffers from the same problem as a fixed uniform sampling discussed in Fig. 7. It is more natural to include arbitrary parameterizations of curves in the analysis, and to seek optimal re-parameterizations during pairwise matching of curves. This allows for a nonlinear registration of points across curves, as desired in Fig. 7. In this sense, the problem of optimal re-parameterization is the same as that of registration of points across curves. Here one considers the group of re-parameterization functions:

$$\Gamma = \left\{ \gamma : \mathbb{S}^1 \to \mathbb{S}^1 \;\middle|\; \gamma \text{ is an orientation-preserving diffeomorphism} \right\}.$$

The action of $\Gamma$ on the space of curves is given by composition, $(\beta, \gamma) = \beta \circ \gamma$. One of the earliest works on shape analysis of curves with general parameterizations was by Younes [146,147], but to explain its main contribution we use the notation introduced later by Mio et al. [94,95]. As earlier, let $s$ and $\theta$ denote the speed function and angle function, respectively, along a curve $\beta$. To compare curves represented by this pair of functions, Mio et al. [94,95] proposed the following Riemannian metric. Let $u_1, u_2$ be two tangents to the space of speed functions at $s$, and $v_1, v_2$ be two tangents to the space of angle functions at $\theta$. In other words, $u_1, u_2$ and $v_1, v_2$ are perturbations of $s$ and $\theta$, respectively. Define a Riemannian metric as:

$$\left\langle (u_1, v_1), (u_2, v_2) \right\rangle_{(s,\theta)} = a^2 \int_0^1 u_1(t)\,u_2(t)\,s(t)\,dt + b^2 \int_0^1 v_1(t)\,v_2(t)\,s(t)\,dt, \qquad (3)$$

where $a$, $b$ are arbitrary positive constants. The first term measures the perturbations in the speed function (or the level of stretch/compression) and the second term measures the perturbations in the angle function (or the bending) along a path on the manifold of curves. The constants $a$ and $b$ provide the weights for the two terms, and this Riemannian metric is called the elastic metric. An important property of this metric is that the action of the re-parameterization group is by isometries (see Appendix A for the definition of isometry). This isometry condition is needed to define distances on the shape (quotient) spaces using Definition 1 (Eq. (1)). Since this metric changes from point to point on the manifold of curves ($s$ appears in the expression of the metric), it is difficult to compute geodesic paths and geodesic distances under this metric. Mio et al. [94,95] provided a shooting method for computing geodesic paths under this elastic metric, for different combinations of $a$ and $b$, but the resulting algorithm was cumbersome.

However, a tremendous simplification results if a different mathematical representation of curves is used in the analysis. Younes [146] showed that if one uses the function $q(t) = \sqrt{s(t)}\,e^{j\theta(t)/2}$ to represent $\beta$, then the elastic Riemannian metric in Eq. (3) reduces to the simple $\mathbb{L}^2$ metric in the space of such functions. In terms of complex analysis, $q(t)$ denotes the complex square root of $\dot{\beta}(t) = s(t)e^{j\theta(t)}$, $j = \sqrt{-1}$. The equivalence of the elastic metric with $\mathbb{L}^2$ is based on a certain fixed choice of the weights $(a, b)$ and thus represents one element of the family of elastic metrics. This equivalence is also termed a flattening of the elastic Riemannian metric. The use of $\dot{\beta}$ in defining $q$ makes it invariant to any translation of the curve $\beta$. Conversely, one can reconstruct the curve $\beta$ from $q$ up to a translation. Since the length of $\beta$, given by $\int_{\mathbb{S}^1} \|\dot{\beta}(t)\|\,dt$, equals 1, we have $\int_{\mathbb{S}^1} \|q(t)\|^2\,dt = \int_{\mathbb{S}^1} \|\dot{\beta}(t)\|\,dt = 1$. Consequently, the set of all such functions becomes a subset of the unit hypersphere in $\mathbb{L}^2$. A similar representation has also been used in the papers [149,122]. In fact, [149] showed that the closure condition, i.e. the constraint that a curve is closed, in this representation takes an interesting form. If we write $q(t) = q_r(t) + j q_i(t)$, then the condition that the unit length curve $\beta$ is closed is equivalent to: $\|q_r\| = 1$, $\|q_i\| = 1$, and $\langle q_r, q_i \rangle = 0$. Thus, one can view the pair $(q_r, q_i)$ as an element of a Stiefel manifold, the set of all pairs of orthonormal functions in $\mathbb{L}^2$. Furthermore, if one removes the rotation group $SO(2)$, to form rotational orbits of planar curves, the resulting pre-shape space is a Grassmann manifold with a well-known geometry.

Fig. 7. Registration of points across two curves using the uniform and convenient non-uniform samplings. Non-uniform sampling allows a better matching of features between $\beta_1$ and $\beta_2$. Panels: $\beta_1$ (uniform sampling), $\beta_2$ (uniform sampling), $\beta_2$ (non-uniform sampling).

Fig. 8. (a) A simple closed curve $\beta$, (b) the two coordinate functions $x$ and $y$ plotted against the arc length, (c) the angle function $\theta$, and (d) the curvature function $\kappa$.

One limitation of the complex square-root representation of the velocity is that it is defined only for planar curves, i.e. this definition does not extend to curves in $\mathbb{R}^3$ or higher. Another limitation is that $e^{j\theta(t)/2}$ can lead to a phase jump that can produce instability. For example, $\theta(t) = 2\pi - \epsilon$ and $\theta(t) = 2\pi + \epsilon$ will lead to very different complex square-roots despite having similar values. In order to solve these two issues, Srivastava et al. [64,63,117] utilized a different representation, called the square-root velocity function (SRVF), given by:

$$q(t) = \sqrt{s(t)}\,e^{j\theta(t)} = \frac{\dot{\beta}(t)}{\sqrt{\|\dot{\beta}(t)\|}}.$$

In this definition, one only takes the square-root of the speed while the direction (or the angle) function remains unchanged. This avoids the problem of phase jumps in computing halves of the angles and, also, the right side of the equation is applicable to curves in $\mathbb{R}^n$ for any $n$. Once again, the elastic metric in Eq. (3) reduces to the $\mathbb{L}^2$ metric for a certain value of the ratio $a/b$, i.e. the elastic metric is flattened under this representation. (It turns out that in $\mathbb{R}^2$ one can obtain flattening using any function of the type $\sqrt{s(t)}\,e^{j\theta(t)/k}$ for $k > 0$. In this terminology, Younes' choice was $k = 2$ and Srivastava et al. chose $k = 1$.)
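To make the SRVF concrete, here is a small numerical sketch in Python. It assumes a curve given as samples on a parameter grid, approximates $\dot{\beta}$ by finite differences (`np.gradient`), and inverts the map through $\beta(t) = \beta(0) + \int_0^t q(u)\|q(u)\|\,du$, which follows from $q\|q\| = \dot{\beta}$. The function names, the test curves and the small floor on the speed are illustrative assumptions, not part of [64,63,117].

```python
import numpy as np

def srvf(beta, t):
    """SRVF q = beta_dot / sqrt(|beta_dot|) of a sampled curve.

    beta : (n, d) array of curve samples, t : (n,) increasing parameter values.
    """
    beta_dot = np.gradient(beta, t, axis=0)
    speed = np.linalg.norm(beta_dot, axis=1)
    # Floor on the speed (an assumption) guards against division by zero.
    return beta_dot / np.sqrt(np.maximum(speed, 1e-12))[:, None]

def curve_from_srvf(q, t, start):
    """Rebuild the curve from q, up to the translation fixed by `start`."""
    integrand = q * np.linalg.norm(q, axis=1)[:, None]      # equals beta_dot
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)[:, None]
    path = np.vstack([np.zeros(q.shape[1]), np.cumsum(steps, axis=0)])
    return path + start

def length_from_srvf(q, t):
    """Curve length as the squared L2 norm of q (trapezoidal rule)."""
    sq = np.sum(q ** 2, axis=1)
    return np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t))

t = np.linspace(0.0, 1.0, 2001)

# Planar example: an ellipse.
ellipse = np.column_stack([2.0 * np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
q = srvf(ellipse, t)
rebuilt = curve_from_srvf(q, t, start=ellipse[0])
print("max reconstruction error:", np.abs(rebuilt - ellipse).max())
print("length from q           :", length_from_srvf(q, t))

# The same code applies unchanged to a curve in R^3.
helix = np.column_stack([np.cos(4.0 * t), np.sin(4.0 * t), t])
q3 = srvf(helix, t)
print("SRVF of a space curve has shape", q3.shape)
```

Dividing a curve by its length before calling `srvf` gives $\int \|q(t)\|^2\,dt = 1$, the unit-length normalization used for the preshape space.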

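Continuing the discussion for $k = 1$, we restrict to the curves of interest, represented by their SRVFs:

$$\mathcal{C} = \left\{ q : \mathbb{S}^1 \to \mathbb{R}^2 \;\middle|\; \int_{\mathbb{S}^1} q(t)\,\|q(t)\|\,dt = 0,\ \int_{\mathbb{S}^1} \|q(t)\|^2\,dt = 1 \right\}.$$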

$\mathcal{C}$ is called the preshape space and is the set of SRVFs of all unit-length, closed curves in $\mathbb{R}^2$. $\mathcal{C}$ is a nonlinear manifold because of the closure condition. Geodesics in $\mathcal{C}$ are computed using a path-straightening algorithm first introduced in [71] and later used for finding geodesics in $\mathcal{C}$ in [64,63,117]. We have mentioned four shape-preserving transformations: translation, scale, rotation, and re-parameterization. Of these, we have already eliminated the first two from the representations, but the other two remain. Curves that are within a rotation and/or a re-parameterization of each other result in different elements of $\mathcal{C}$ despite having the same shape. The unification of such curves is performed using group actions and their orbits. For a curve $\beta$, a rotation $O \in SO(2)$ and a re-parameterization $\gamma \in \Gamma$, the transformed curve is given by $O(\beta \circ \gamma)$. The SRVF of the transformed curve is given by $O(q \circ \gamma)\sqrt{\dot{\gamma}}$. In order to unify all elements in $\mathcal{C}$ that denote the same shape, we define equivalence classes of the type:

$$[q] = \mathrm{closure}\left\{ O(q \circ \gamma)\sqrt{\dot{\gamma}} \;\middle|\; O \in SO(2),\ \gamma \in \Gamma \right\}.$$

The closure is taken to ensure that the conditions of Definition 1 apply, i.e. the orbits form closed sets. Each such equivalence class $[q]$ is associated with a shape uniquely and vice versa. The set of all these equivalence classes is called the shape space $\mathcal{S} = \mathcal{C}/(SO(2) \times \Gamma)$. Geodesics and geodesic distances in $\mathcal{S}$ are computed by solving the optimization stated in Eq. (1) according to:

$$d_{\mathcal{S}}([q_1], [q_2]) = \inf_{\gamma \in \Gamma,\ O \in SO(2)} d_{\mathcal{C}}\left(q_1,\ O(q_2 \circ \gamma)\sqrt{\dot{\gamma}}\right).$$

The minimization over $\Gamma$ is performed using dynamic programming and over $SO(2)$ using the Procrustes method.
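The rotation part of this optimization has a closed-form solution, sketched below in Python. The sketch assumes two SRVFs sampled on a common uniform grid and solves the orthogonal Procrustes problem with a singular value decomposition, using the plain $\mathbb{L}^2$ residual as the cost. The function name `optimal_rotation` and the test data are hypothetical; the dynamic-programming search over $\Gamma$ and the geodesic distance $d_{\mathcal{C}}$ on the preshape space are not shown.

```python
import numpy as np

def optimal_rotation(q1, q2):
    """Rotation O (det = +1) minimizing sum_i |q1[i] - O q2[i]|^2.

    q1, q2 : (n, d) arrays of SRVF samples taken at the same parameter values.
    """
    A = q1.T @ q2                          # d x d matrix, sum_i q1_i q2_i^T
    U, _, Vt = np.linalg.svd(A)
    D = np.eye(A.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))   # exclude reflections
    return U @ D @ Vt

# Hypothetical data: q2 is q1 with every sample rotated by R.
t = np.linspace(0.0, 1.0, 200, endpoint=False)
q1 = np.column_stack([np.cos(2 * np.pi * t) + 0.3 * np.cos(6 * np.pi * t),
                      np.sin(2 * np.pi * t)])
phi = 0.7
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
q2 = q1 @ R.T

O = optimal_rotation(q1, q2)
aligned = q2 @ O.T                         # applies O to every sample of q2
dt = 1.0 / len(t)
print("L2 residual after alignment:", np.sqrt(np.sum((q1 - aligned) ** 2) * dt))
```

Because rotating a curve by $O$ rotates its SRVF by the same $O$, aligning the SRVFs also aligns the curves. In a joint search one would typically alternate this step with the re-parameterization step until the cost stops decreasing; that alternation is an assumption here, not a statement about the cited implementations.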

Fig. 9 shows some examples of geodesic paths between planar shapes using three different methods: active shape models (ASM), Kendall's shape analysis (KSA), and elastic shape analysis (ESA) of the previous paragraph. In each case, the first and the last shapes are the given curves and the intermediate shapes represent equally spaced points along the geodesic paths in the respective spaces. It is easy to see the improvement in the quality of the geodesic deformation between shapes when using ESA.

3.3. Deformable templates

A popular technique for analyzing shape variability of objects in images, especially in medical imaging applications [52], is deformable templates. Although the foundational work in development of its theory and applications came from Grenander and his colleagues [3,126,52,92], several other groups have applied this approach to computer vision problems [59,150].

We briefly describe the mathematical setup. Let $\mathcal{I} = \{ I : D \to \mathbb{R} \mid I \in \mathbb{L}^2(D, \mathbb{R}) \}$ be the set of images defined on the domain $D = [0,1]^2$. Let $\Gamma$ be the set of corner-preserving diffeomorphisms of $D$ to itself. Then, $\Gamma$ acts on $\mathcal{I}$ from the right by composition, i.e. $(I, \gamma) = I \circ \gamma$. Notice that the effect of this action on an image is to move pixels around in $D$; it does not create or destroy any pixel values in the image. Several papers have studied the problem of comparing shapes contained in images by solving the matching (registration) problem of the type:

$$\min_{\gamma \in \Gamma} \left( \int_D \|I_1(x) - I_2(\gamma(x))\|^2\,dx + \lambda R(\gamma) \right), \qquad (4)$$

where $R$ is some kind of penalty on the roughness of $\gamma$. The first term is called the data term and the second term is called the smoothness term. There are several choices for the roughness term, including $\int_D \|\nabla\gamma(x) - 1\|^2\,dx$ and $\int_D \|\nabla^2\gamma(x)\|^2\,dx$. Other choices for the data term include $\int_D \|\nabla I_1(x) - \nabla I_2(\gamma(x))\|^2\,dx$. The optimization over $\Gamma$ to find the optimal $\gamma^*$ is mostly performed using gradient-type techniques. Fig. 10 shows examples of deforming $I_2$ to match $I_1$ under the cost function given in Eq. (4).

A significant limitation of this framework is that it does not lead to a proper distance for comparing shapes in images. In fact, the objective function is not even symmetric [26]. Since one needs a proper metric for performing shape analysis and other related tasks, Younes et al. [93,126,148] utilized the group structure of $\Gamma$ to impose Riemannian metrics on $\Gamma$ and the product space $\Gamma \times \mathcal{I}$. Using these metrics, one can generate geodesic paths, compute geodesic distances, and compute simple shape statistics. An alternate solution that leads to proper metrics on the quotient space $\mathcal{I}/\Gamma$ is presented by Xie et al. [143].

The strength of the deformable template approach is that it takes into account the full image and not just the information associated with boundary contours. On the other hand, this becomes a liability in situations where one is interested in comparing shapes of curves. In those situations, one has to embed curves in 2D image domains, for example by forming binary images or level sets, and then apply this technique. Since one has to estimate a 2D diffeomorphism in the matching problem here (Eq. (4)), this method is computationally much more expensive than the methods discussed earlier for shape analysis of curves (e.g. the elastic shape analysis).

3.4. Other ideas

While several other methods have been used for shape analysis in computer vision, only a small subset of them use differential-geometric tools. So far we have focused on the differential-geometric methods, but now we briefly comment on some other common ideas.

One popular idea in computer vision has been to use level sets of closed contours [98,124]. Here the contours are embedded in a subset of $\mathbb{R}^2$ in an arbitrary way and are represented by certain functions on $[0,1]^2$ such that the level curves of those functions are the original contours. A common choice of function is the signed-distance function. Let $\beta$ denote a simple closed curve in $\mathbb{R}^2$. A signed-distance function $\phi : \mathbb{R}^2 \to \mathbb{R}$ is a function whose absolute value at a point is the Euclidean distance in $\mathbb{R}^2$ to the nearest point on the curve, and whose values for points inside the contour are negative:

$$\phi(x) = \begin{cases} -d(x, \beta), & x \text{ is inside the curve} \\ d(x, \beta), & \text{otherwise} \end{cases}, \qquad d(x, \beta) = \inf_{y \in \beta} \|x - y\|. \qquad (5)$$

It is easy to see that the set $\{x \in \mathbb{R}^2 \mid \phi(x) = 0\}$ is exactly the set of points on the curve $\beta$. In other words, $\beta$ is the zero level set of $\phi$.


Figs. 5–6 show two examples of this representation: the top-left curve is represented by its signed-distance function, shown in two different ways in the bottom right: as a gray-scale image and as a height function.

For closed contours $\beta_1$ and $\beta_2$ with given embeddings in $[0,1]^2$, let $\phi_1$ and $\phi_2$ denote the corresponding signed-distance functions on $[0,1]^2$. Then, the shapes of $\beta_1$ and $\beta_2$ can be compared using the $\mathbb{L}^2$ distance:

$$d(\beta_1, \beta_2) \equiv \|\phi_1 - \phi_2\| = \sqrt{\int_{[0,1]^2} \left(\phi_1(x) - \phi_2(x)\right)^2 dx}.$$
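A discrete version of Eq. (5) and of this distance can be sketched in Python as follows. The sketch assumes each contour is given as a dense closed polygon inside $[0,1]^2$, approximates $d(x, \beta)$ by the distance to the nearest polygon vertex, and decides inside versus outside with the even-odd ray-casting rule. The grid size, the helper names and the two test circles are illustrative assumptions.

```python
import numpy as np

def inside_polygon(points, poly):
    """Even-odd rule: True where a point lies inside the closed polygon `poly`."""
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    nxt = np.roll(poly, -1, axis=0)                 # end point of every edge
    for (ax, ay), (bx, by) in zip(poly, nxt):
        crosses = (ay > y) != (by > y)              # edge straddles the ray height
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = ax + (y - ay) * (bx - ax) / (by - ay)
        inside ^= crosses & (x < x_cross)           # toggle at each crossing to the right
    return inside

def signed_distance_grid(poly, n=64):
    """Signed distance to `poly` at the cell centers of an n x n grid on [0,1]^2."""
    xs = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(xs, xs)
    pts = np.column_stack([X.ravel(), Y.ravel()])
    d = np.min(np.linalg.norm(pts[:, None, :] - poly[None, :, :], axis=2), axis=1)
    phi = np.where(inside_polygon(pts, poly), -d, d)   # negative inside the contour
    return phi.reshape(n, n)

def l2_distance(phi1, phi2):
    """Discrete L2 distance; each grid cell has area 1 / (number of cells)."""
    return np.sqrt(np.sum((phi1 - phi2) ** 2) / phi1.size)

def circle(center, radius, m=400):
    ang = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack([center[0] + radius * np.cos(ang),
                            center[1] + radius * np.sin(ang)])

phi_a = signed_distance_grid(circle((0.5, 0.5), 0.25))
phi_b = signed_distance_grid(circle((0.6, 0.5), 0.25))   # same shape, shifted
print("distance to itself        :", l2_distance(phi_a, phi_a))
print("distance to a shifted copy:", l2_distance(phi_a, phi_b))
```

The second value is nonzero even though the two contours have the same shape, which is a direct numerical view of the dependence of this distance on the embedding.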

Any rigid motion of the $\beta$s will change the distance between the corresponding level functions. One can handle two of the transformations, translation and scaling, by pre-processing the curves $\beta_1$ and $\beta_2$ as in the previous sections. For example, we can translate them so that their mean is at the origin and scale them so that their norm is one:

$$\beta(t) \to \beta(t) - \int \beta(t)\,dt \quad \text{and} \quad \beta(t) \to \frac{\beta(t)}{\sqrt{\int \|\beta(t)\|^2\,dt}}.$$

The removal of the rotation requires solving an optimization problem, as was the case in the earlier frameworks. In some applications where the orientation is naturally given and important in shape discrimination, e.g. in tracking silhouettes of humans walking upright, one can simply ignore the orientation alignment [29,30].

Note that this approach lacks the use of tools from differential geometry. The set of valid signed-distance functions forms a nonlinear space but its geometry is seldom utilized to keep representations valid [45]. Since this method is not based on a Riemannian metric, it does not naturally provide geodesic paths on shape spaces of curves. There have been some improvisations of this approach into Riemannian frameworks but typically they tend to be computationally expensive. The main use of the level set approach has been in the segmentation process where the active contours have been derived to discover boundaries of objects in images [19].

Another common technique for shape analysis of objects is based on medial axis representations [111,42,41]. The basic idea for 2D objects is to find a set of points that represent the skeleton of the contour in a specific way. Together with a radius function along the contour, this defines an alternative to the boundary representation of curves. Two examples of this idea are shown in Figs. 5–6. Some papers [42] have used the geometry of spaces associated with the medial curves and the radii function to define geodesic paths between planar shapes.

Some papers have also used Hausdorff-type distances, i.e. distances between sets of points, to compare shapes. These computations are typically dependent on object pose and scale, but there have been efforts to interpret Hausdorff-type distances in a way such that they are invariant to shape-preserving transformations [23]. However, the resulting computational cost has been too large for use in large-scale shape analysis.

4. 3D shape analysis

In this section we summarize techniques that have been used to study shapes of 3D objects. While many of these techniques are natural extensions of their 2D counterparts, some of them are based on novel ideas applicable just to shape analysis of 3D objects. The main difficulty in 3D shape analysis is the registration/matching of points across objects. While the points on 2D shapes represented by curves have a fixed ordering and only one-dimensional diffeomorphisms are needed for registration, there is no canonical ordering of points on 3D objects and, thus, the search space for shape registration is much larger. Consequently, the registration methods have been the main focus of attention in such problems, more so than the eventual shape analysis. Nonetheless, we want to stress the importance of performing the registration of shapes and the subsequent analysis, such as computation of geodesics or statistics, in a unified framework. Many of the described methods provide a solution to only one of the two given problems: registration and shape analysis (deformation, comparison, averaging, etc.). If they do both, they typically do so in two disjoint steps. To our knowledge, there are very few works that perform registration and deformation simultaneously. Next, we summarize different mathematical representations that have been developed to analyze shapes of 3D objects; some of them are illustrated in Fig. 11. These various methods originated in different communities such as graphics or medical imaging, and consequently they approach the problem of shape analysis in very different ways.

Fig. 9. Examples of geodesic paths between the same shapes using ASM, KSA, and ESA. Rows in each of the two examples, top to bottom: ASM, KSA, ESA.


4.1. Point distribution models (PDMs)

The landmark-based shape analysis of objects, discussed earlier for 2D objects, can be easily extended to shape analysis of 3D objects. Although tools from complex analysis are no longer available, the basic ideas for representations and matching of shapes are still applicable. While registered point clouds can be easily compared using Procrustes methods, the problem of registering points across shapes is more complicated. In terms of registration of points, researchers have used a variety of cost functions for optimization-based matching. The iterated closest point method (ICP) [10], which is based on nearest neighbor registration, is still quite popular for comparing shapes of point clouds despite some fundamental limitations. Whitaker et al. have used mutual information of distributions of points to align them across objects [21,20]. Cootes and Taylor have used ideas based on the minimum description length principle [32]. Several earlier methods for matching 2D point distributions [27,60] have also been applied to 3D problems.

Although these point-based representations are efficient in registering points across objects, there are some important limitations in this work when it comes to shape analysis. These approaches lead to a two-step analysis: one for registration and one for shape analysis after registration. The criterion for registration of points, e.g. entropy or description length, is often different from the metric used for shape analysis, which is usually the Euclidean distance. So, even if the procedures are optimal under the individual cost functions, the overall result is bound to be sub-optimal. A better approach is to perform registration and analysis of shapes under a single cost function. One additional limitation of PDMs is that the registration mapping is not guaranteed to be a proper diffeomorphism. That is, the mapping function can result in crossings of points (space folding) that can cause problems in shape analysis.

4.2. Deformable templates

The deformable template approach mentioned for 2D images can be extended to the 3D case by considering image domains to be cubes rather than squares. Several groups have proposed methods for studying shapes of surfaces by embedding them in cubical volumes $[0,1]^3$ and deforming these volumes [52,62,31,130,7,137,136,138]. While these methods are both prominent and pioneering in medical image analysis, they are typically computationally expensive since they try to match not only the objects of interest but also some background space containing them. Furthermore, the energy function often used for registration in these frameworks has some fundamental issues. We explain in more detail next. Consider the following energy for registration:

$$E(f_1, f_2; \gamma) = \int_{[0,1]^3} \|f_1(s) - f_2(\gamma(s))\|^2\,ds + \lambda \Phi(\gamma), \qquad (6)$$

where $f_1$ and $f_2$ are two volumes, $\gamma$ is a diffeomorphism from $[0,1]^3$ to itself, $\Phi$ is a regularization term, and $\lambda$ is its corresponding weight. As an example, in [7], the regularization term is given by the length of the geodesic path in the diffeomorphism group from identity to the optimal deformation that matches the two images. The first term in $E$ is often referred to as the data term. This term is problematic when used for registration. First, using this energy, the registration of surfaces is not symmetric. Second, if we apply the same transformation $\gamma$ to two different volumes, the $\mathbb{L}^2$ distance between them changes although the registration remains the same. Thus, an energy function based on the $\mathbb{L}^2$ norm of the difference image is not a good measure for registration. Ideally, we would like an energy function $E$ such that $E(f_1, f_2; \gamma) = E(f_1 \circ \gamma_0, f_2 \circ \gamma_0; \gamma)$ for any $\gamma_0$. In fact, this issue is omnipresent in nearly all the current methods used for registering 2D and 3D images. For instance, the paper by Tagare et al. [123] solves the problem of asymmetry in the registration cost function but does not address this issue of the cost function changing values even when the registration remains the same. To our knowledge, Xie et al. [143] is the only paper to address this issue.
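The second objection can be checked numerically in one dimension. The Python sketch below is a hedged 1D analog of the data term in Eq. (6): it warps two hypothetical functions `f1` and `f2` on $[0,1]$ by the same diffeomorphism `gamma0` and compares the $\mathbb{L}^2$ distance before and after. For reference it also evaluates the warped difference weighted by $\sqrt{\dot{\gamma}_0}$, the factor that appears in the SRVF action for curves; by a change of variables this weighted distance equals the unwarped one. The functions, the warp and the grid are assumptions chosen for illustration, and the weighted variant is not claimed to be the construction of [143].

```python
import numpy as np

def l2_norm(values, t):
    """L2 norm of a sampled function on the grid t (trapezoidal rule)."""
    sq = values ** 2
    return np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t)))

def f1(s):
    return np.sin(2.0 * np.pi * s)

def f2(s):
    return s

t = np.linspace(0.0, 1.0, 20001)

# A diffeomorphism of [0,1]: gamma0(0) = 0, gamma0(1) = 1, positive derivative.
a = 1.5
gamma0 = np.expm1(a * t) / np.expm1(a)
gamma0_dot = a * np.exp(a * t) / np.expm1(a)

d_before = l2_norm(f1(t) - f2(t), t)
d_after = l2_norm(f1(gamma0) - f2(gamma0), t)
d_weighted = l2_norm((f1(gamma0) - f2(gamma0)) * np.sqrt(gamma0_dot), t)

print("L2 distance before the common warp :", d_before)
print("L2 distance after the common warp  :", d_after)
print("sqrt-derivative weighted, after warp:", d_weighted)
```

The first two numbers differ although both functions were warped identically, while the third matches the first up to discretization error.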

4.3. Geometric descriptors

There is a large family of works that focuses only on registration of points across objects using geometric features along their surface boundaries. Once again they focus on registration and do not provide any explicit deformation (geodesics) or metrics (geodesic distances) between the shapes of objects. In fact, the strategy is often to use a linear interpolation between the corresponding points to obtain a deformation, once the correspondence is established. In these works, correspondences are established by matching feature points using their geometric descriptors [22] and variants of non-rigid ICP. When large shape variations are tolerated, local shape descriptors can become unreliable. Previous works dealt with this problem by enforcing some global consistency, such as restricting the set of possible deformations to being isometric [58]. Others considered nearly isometric shapes [70,81] and elastic deformations [141,142]. Zhang et al. [155] use a deformation-driven approach guided by a set of manually or automatically selected landmarks. While this approach enables finding sparse correspondences under significant non-rigid deformations, it does not generate geodesics between the input shapes, and it uses different functions for correspondence and deformation.

4.4. Geodesic distance and related curve representations

Many approaches treat the surface of a 3D object as a Riemannian manifold and compute geodesic distances between points on this surface using Dijkstra's algorithm or marching front ideas. Once all the pairwise distances are obtained, one uses statistics of such distances, e.g. their histograms, for comparing shapes [89,17,18,16]. The distances between points can be modified to include diffusion maps [77], or eigenfunctions of the Laplace–Beltrami operator [82], and many others. These methods are limited in that they are often able to tackle only isometric deformations of shapes, i.e. deformations that preserve distances between points on a surface. In other words, the shape resulting from any local or global bending of an original shape is deemed exactly identical to the original. Furthermore, it is difficult to derive summary statistics of full shapes and develop probabilistic shape models, since these approaches are not Riemannian in the same sense as the other methods discussed in this paper. There are no known algorithms for computing optimal deformations (or geodesics) under these metrics at the moment.
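The pipeline described above can be sketched in a few lines of Python. The sketch assumes a triangulated surface, runs Dijkstra's algorithm on the graph of mesh edges (edge-path lengths, which only bound the true surface geodesics from above), and summarizes all pairwise distances in a normalized histogram. The octahedron, the number of bins and the histogram range `r_max` are hypothetical choices for illustration.

```python
import heapq
import numpy as np

def edge_graph(vertices, faces):
    """Adjacency map of the mesh edges, weighted by Euclidean edge length."""
    graph = {i: {} for i in range(len(vertices))}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            w = float(np.linalg.norm(vertices[u] - vertices[v]))
            graph[u][v] = w
            graph[v][u] = w
    return graph

def dijkstra(graph, source):
    """Shortest edge-path distance from `source` to every vertex."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def pairwise_graph_distances(vertices, faces):
    graph = edge_graph(vertices, faces)
    n = len(vertices)
    out = []
    for i in range(n):
        dist = dijkstra(graph, i)
        out.extend(dist[j] for j in range(i + 1, n))
    return np.array(out)

def distance_histogram(vertices, faces, bins=10, r_max=4.0):
    """Normalized histogram of all pairwise distances, used as a shape signature."""
    d = pairwise_graph_distances(vertices, faces)
    counts, _ = np.histogram(d, bins=bins, range=(0.0, r_max))
    return counts / counts.sum()

# Hypothetical test surface: a regular octahedron.
vertices = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
         (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
rotated = vertices @ Rz.T

print(distance_histogram(vertices, faces))
print(distance_histogram(rotated, faces))
```

The two histograms coincide because a rotation preserves all edge lengths. The same holds for any bending that keeps edge lengths fixed, which is the sense in which such signatures cannot distinguish isometric deformations.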

Several papers have studied shapes of 3D objects by approximating their surfaces using simpler geometries such as curves. The nice property of these methods is that they can provide both registration and deformation (geodesics) using shape analysis of curves. For example, Samir et al. [104] approximate the surface with level curves of the height function, while they use level curves of the surface distance function (from the tip of the nose in case of 3D faces) in [105,119]. While the choice of the height function is problematic since the resulting curves vary with the rotation of the face, the use of the distance function avoids that problem. Fig. 12 shows an example of an increasing set of curves representing a facial surface. Another possibility is to use the radial curves, i.e. streamlines under gradient vector fields of the surface distance function (from a reference point), to approximate the surface shape [54,33,34]. Hallinan et al. [55] use ridge curves for studying shapes of facial surfaces and some others use crest lines [145].

Fig. 10. An example of deformable images. $I_2$ is deformed smoothly to match $I_1$ and the optimal deformation is shown on the right.

4.5. Parameterized surfaces

Although the most natural representation for studying shapes of 3D objects seems to be parameterized surfaces, it has not been used as frequently. In case of parameterized surfaces, there is an additional issue of handling the parameterization variability. Some papers, e.g. using SPHARM [66,14] or SPHARM-PDM [120,44], tackle this problem by choosing a fixed parameterization that is analogous to the arc-length parameterization on curves. Kilian et al. [69] presented a technique for computing geodesics between triangulated meshes (discretized surfaces) but at their given parameterizations. Similar to elastic representations of curves discussed earlier, one would like to include the parameterization variable in the analysis. This inclusion results in an improved registration of features across surfaces. Of course, the question is: how can we include the parameterization variable in our shape analysis? A large set of papers in the literature treat parameterization (or registration) as a pre-processing step [131]. In other words, they take a set of surfaces and use some energy function, such as the entropy [21] or the minimum description length [32], to register points across surfaces. As mentioned earlier, these methods lack the formalism needed to define proper distances due to disjoint choices for registration and shape comparisons.

Kurtek et al. [75,74] describe a novel representation of surfaces, called q-maps, for analyzing shapes of surfaces in a manner that is invariant to their parameterizations. Let $f : S^2 \to \mathbb{R}^3$ denote a parameterized surface, viewed as an embedding of $S^2$ in $\mathbb{R}^3$. Let $\Gamma$ be the set of all diffeomorphisms of $S^2$ to itself; $\Gamma$ forms the re-parameterization group for parameterized surfaces. The q-map is defined as a mapping $q(s) = \sqrt{\|a(s)\|}\, f(s)$, where $\|a(s)\| = \|f_x(s) \times f_y(s)\|$ is the area multiplication factor of $f$ at $s = (x, y) \in S^2$. (This q-map is similar in spirit to the SRVF used earlier for elastic curves.) If a surface is re-parameterized according to $f \mapsto f \circ \gamma$, then its corresponding q-map is given by $(q \circ \gamma)\sqrt{J_\gamma}$, denoted $(q, \gamma)$ below, where $J_\gamma$ is the Jacobian of $\gamma$. The main reason for using q-maps in surface analysis is that the action of the re-parameterization group is by isometries, i.e. $\|q_1 - q_2\| = \|(q_1, \gamma) - (q_2, \gamma)\|$ for all $\gamma \in \Gamma$, and ideas from Definition 1 apply. The rotation and re-parameterization orbit of a surface, represented by $q$, is given by

\[ [q] = \mathrm{closure}\left\{ O\,(q \circ \gamma)\sqrt{J_\gamma} \;\middle|\; O \in SO(3),\ \gamma \in \Gamma \right\}. \]

They use these q-maps for optimal re-parameterization of surfaces (or registration of points across surfaces) and for computing geodesic paths between aligned surfaces [76,73]. The strength of this approach is that both the registration and computation of geodesics are performed under the same metric. Shown in Fig. 13 are some examples of geodesic paths between shapes of 3D closed surfaces, where the surface correspondence is displayed using a color map. If one or both surfaces are available under different parameterizations, i.e. different meshes on the same surface, the resulting geodesic distances and geodesic paths will not change. More recently, a paper by Bauer and Bruveris [5] also accomplishes these general goals, albeit under a different Riemannian metric.
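As a rough numerical illustration of the q-map definition, the sketch below samples a surface on a spherical-coordinate grid and forms $q = \sqrt{\|f_\theta \times f_\phi\|}\, f$ with finite differences. The $(\theta, \phi)$ chart, the grid resolution, and the function name `q_map` are assumptions made for this example; they are not the discretization used in [75,74].

```python
import numpy as np

def q_map(f, theta, phi):
    """Discrete q-map of a surface sampled on a (theta, phi) grid.

    f has shape (len(theta), len(phi), 3). The area multiplication factor is
    taken with respect to the coordinate measure d(theta) d(phi), which is an
    assumption of this sketch.
    """
    f_theta = np.gradient(f, theta, axis=0)   # partial derivative along theta
    f_phi = np.gradient(f, phi, axis=1)       # partial derivative along phi
    area = np.linalg.norm(np.cross(f_theta, f_phi), axis=2)
    return np.sqrt(area)[..., None] * f, area

theta = np.linspace(0.0, np.pi, 181)
phi = np.linspace(0.0, 2.0 * np.pi, 361)
T, P = np.meshgrid(theta, phi, indexing="ij")
unit_sphere = np.stack(
    [np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], axis=2
)

q, area = q_map(unit_sphere, theta, phi)

# Riemann-sum approximation of the squared L2 norm of q. For the unit sphere
# the area factor is close to sin(theta), so the result approximates the
# surface area 4*pi.
dtheta, dphi = theta[1] - theta[0], phi[1] - phi[0]
q_norm_sq = np.sum(q ** 2) * dtheta * dphi
```

The squared norm of a q-map equals the surface area weighted by $\|f\|^2$, which is why the unit sphere is a convenient sanity check here.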

4.6. Other methods

Here we summarize some other popular ideas in shape analysis of 3D objects.

1. Level sets: the level set methods can be easily extended to the 3D problem by considering functions on a volume domain such that their level sets are now surfaces of 3D objects. Once again, they have been more prominent in segmentation of volumes into homogeneous regions [99] or surface reconstruction [156] but face difficulties when used for statistical shape analysis [87,88].

2. Medial and skeletal representations: there has been remarkable success in the use of medial representations for 3D shape analysis [112,151], especially in medical image analysis, see e.g. [13], [46]. Although some of these methods provide deformations between objects, they often lack the registration abilities of the elastic methods.

3. Canonical domain embeddings: some approaches rely on embedding the surfaces to a canonical domain while preserving certain local features such as inner products, angles, or surface area [43] and find an optimal diffeomorphism between the two domains that are parameterizing the surfaces. The optimality is established using an energy function that reduces the registration to an optimization problem over the diffeomorphism group. In [109,81,153,70], the authors start by conformally embedding the surfaces on a sphere and then find the best Möbius transformation (the Möbius transformations form a proper subset of the diffeomorphism group) that aligns them. The key idea is that isometries are a subset of the conformal maps, which can be explored efficiently with Möbius transformations. The main limitation of these approaches is that they do not handle missing parts or elastic deformations that deviate significantly from isometry. Furthermore, although these methods can establish dense correspondences between surfaces, they are not able to provide an optimal deformation from one surface to another (geodesic), which is necessary for subsequent analysis such as computing statistics.

4. Reeb graphs: there is also a significant amount of work in analyzing 3D shapes using topological theory [57,110,4]. The central idea here is to define a certain smooth function, called a Morse function, on the surface of the objects and use its critical points to define nodes of a discrete graph, called a Reeb graph. While this Reeb graph captures the broad skeleton of a 3D shape, it does not incorporate any geometrical information about the intermediate homogeneous regions. Baloch et al. [4] combined techniques from both geometry and topology to provide more complete descriptions of shapes of 3D objects.

Fig. 11. Different representations of shapes of 3D objects. Panels: PDMs, Deformable Templates, Skeletons, Geodesic Distances, Morse Function, Geometric Descriptors, Domain Embedding, Parameterized Surfaces.

5. Manifolds in activity analysis

In vision-based human activity analysis, the goal is to identify what is happening in a scene given a video of the event. Johansson [61] showed that humans are able to recognize simple activities from moving light displays, which are attached to some key locations on the human body. This study provided impetus to the field of human activity analysis, and showed that it is possible to recognize human activities from shapes and their manner of temporal variation. To formally define these notions of shape variation, one needs definitions of shape spaces and methods to model and compare sequences on shape spaces.

As discussed in earlier sections, there has been substantial work in representations for static shapes. Thus, many of the shape representations discussed so far are naturally applicable to action recognition applications. However, additional modeling is needed for recognition applications due to the need to model time-series of shapes instead of one static shape. In typical activity recognition applications, where background subtraction can be reasonably performed, shape information in the form of the silhouette of the person is extracted. Fig. 14 illustrates some examples of how the action can be perceived from shape sequences.

Manifold constraints on shapes have been utilized in tracking applications such as [6,125]. Furthermore, manifold learning techniques, such as LLE, Isomaps, Laplacian eigenmaps, have found applications in human activity analyses [39,140]. In this paper, we shall primarily restrict our discussion to only those approaches that involve differential-geometric tools for recognition of human activities. Thus, we do not discuss tracking approaches and manifold learning applications. We do mention that the problem of estimating shape sequences from noisy observations of shapes at discrete times is itself a problem of great interest; this problem was studied using smoothing splines on shape manifolds by Su et al. [121].

First, we shall discuss how a class of common parametric models can be extended to model shape sequences. Then, we discuss how non-parametric sequence comparison can be extended to manifolds. Next, we discuss specific examples of general spatio-temporal models (linear dynamical models and tensors), which are applicable to a broad class of features other than just shapes, and discuss their associated manifold interpretation.

5.1. Parametric shape sequences for activity analysis

Parametric models compactly characterize time-sequences with a small number of parameters. Typical examples of parametric models include multivariate autoregressive processes and hidden Markov models. In [132], Kendall's shape representation was used to model group activities in an airport surveillance setting. In this application, the shape formed by a group of humans embarking and disembarking from an aircraft was used to detect anomalous activities. Since standard auto-regressive models are defined for vector spaces, one needs to project the shapes onto a tangent space with minimal distortions before these techniques can be applied. A common choice for this tangent-space projection is the tangent space at the ‘mean’ shape. This mean shape is usually computed by an intrinsic mean computation algorithm such as the Karcher mean algorithm.
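The tangent-space projection at an intrinsic mean can be sketched on a simple manifold. The example below uses the unit sphere as a stand-in for a shape space (an assumption of this illustration; Kendall's shape space has a different geometry), and the helper names `sphere_exp`, `sphere_log` and `karcher_mean` are hypothetical. It iterates the usual gradient-descent update for the Karcher mean and then maps the data to tangent coordinates at that mean.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: tangent vector v at p -> point."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_log(p, x):
    """Inverse exponential map: point x -> tangent vector at p (x != -p)."""
    cos_theta = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(p)
    u = x - cos_theta * p            # component of x orthogonal to p
    return theta * u / np.linalg.norm(u)

def karcher_mean(points, n_iter=100, step=1.0, tol=1e-10):
    """Gradient-descent estimate of the Karcher (intrinsic) mean."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        mean_tangent = np.mean([sphere_log(mu, x) for x in points], axis=0)
        if np.linalg.norm(mean_tangent) < tol:
            break
        mu = sphere_exp(mu, step * mean_tangent)
    return mu

# Four points placed symmetrically around the north pole.
angle = 0.3
points = np.array([
    [np.sin(angle), 0.0, np.cos(angle)],
    [-np.sin(angle), 0.0, np.cos(angle)],
    [0.0, np.sin(angle), np.cos(angle)],
    [0.0, -np.sin(angle), np.cos(angle)],
])
mu = karcher_mean(points)

# Tangent-space coordinates at the mean; these live in a vector space, so
# standard autoregressive models can be fitted to them.
tangent_coords = np.array([sphere_log(mu, x) for x in points])
```

For tightly clustered data the tangent coordinates distort pairwise distances only slightly, which is the motivation for projecting at the mean rather than at an arbitrary point.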

Although landmark-based shape space methods have proven successful in several vision problems, their performance usually relies on accurate detection and tracking of key landmark points. This turns out to be a challenging task, especially under rapid motion and self-occlusion, which are common in gesture recognition problems. For this reason, Abdelkader et al. [1] proposed the use of the shape of the contour at each frame using a closed curve representation. To compare shape sequences, they use both a non-parametric DTW-based approach and a hidden Markov model (HMM). Similarly, Sheng et al. [144] study the activity sequence as a stochastic process on a shape manifold defined earlier in [72].

A standard HMM is a statistical generative model [102] that represents the time series of observations in the form of a set of transitions between several abstract hidden states, where the state at time $t$ is denoted by $S_t$. The transition between these states is governed by a Markovian model parameterized by the state transition probability $P(S_t \mid S_{t-1})$.

Fig. 13. Examples of geodesic paths in shape spaces of closed surfaces.

Fig. 12. Representation of a facial surface using level curves of the surface distances from a reference point (tip of the nose).


The observation at each time instant $q_t$ is statistically dependent on the state $S_t$ according to an observation probability density function $f(q_t \mid S_t)$. A common choice for this density function is a mixture of Gaussians. The model parameters are learnt using a generalized expectation–maximization (GEM) technique known as the Baum–Welch algorithm. This ensures that the state transition and observation density functions are optimized in the maximum likelihood sense.

The HMM learning and decoding problems have been studied for time-series data evolving on Euclidean spaces. However, several challenges appear if we want to generalize these methods to time-series on manifolds. One of these challenges is that in order to solve for the optimal parameters in the maximum likelihood sense we need to provide an analytical form for the observation density functions and compute the gradient of the likelihood of a sequence in terms of the parameters of these functions. In [1], an approximate strategy is devised by decoupling the problems of learning the state transitions and the observation distributions. This is accomplished by introducing an intermediate layer of observations $X_t$ in the form illustrated in Fig. 15.

Learning the model for each gesture includes learning the state transition probabilities $P(S_t \mid S_{t-1})$, the exemplar cluster observation probabilities $P(x_t \mid S_t)$, and the shape observation density functions given each cluster, $f(q_t \mid x_t = x_k)$. The decoupling of estimating states and observation densities is achieved as follows. All available shapes are first clustered and a non-parametric statistical model for the data points is estimated in each cluster to represent the observation density function. The state transition and cluster observation probabilities are then learnt from the training sample cluster assignments using the standard forward–backward technique as in the standard discrete observation HMM.
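Once shapes have been replaced by cluster indices, the discrete part of such a model can be evaluated with the forward recursion. The sketch below is a generic scaled forward algorithm for a discrete-observation HMM, not the exemplar-based model of [1]; the two-state model, its probabilities, and the function name `forward_log_likelihood` are hypothetical values chosen for illustration.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi[i]   : initial probability of state i
    A[i, j] : transition probability P(S_t = j | S_{t-1} = i)
    B[i, k] : probability of observing cluster index k in state i
    obs     : sequence of cluster indices
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition model, then weight by the
            # probability of the current observation.
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()          # rescale to avoid numerical underflow
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical two-state model over three exemplar clusters.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
log_lik = forward_log_likelihood(pi, A, B, [0, 1, 2, 2])
```

In a classification setting one such model would be trained per gesture, and a test sequence assigned to the gesture whose model gives the largest log-likelihood.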

5.2. Non-parametric shape sequence analysis

As discussed above, parametric approaches often require making certain simplifying assumptions about the dynamical nature of the shape sequence. Non-parametric approaches aim to provide a metric on sequences without making assumptions about the dynamics of the action. To compare two shape sequences, dynamic time warping (DTW) is a common approach. In DTW, given two sequences $A(t)$ and $B(t)$, a non-linear time normalization between a template vector sequence and a test vector sequence is computed, such that the two sequences are brought into temporal alignment. Thus,

\[ \mathrm{DTW}(A(t), B(t)) = \frac{1}{T} \sum_{t=1}^{T} d\big(A(f(t)), B(g(t))\big) \tag{7} \]

where $f$ and $g$ are the optimal warping functions that bring the sequences into temporal alignment. Such a distance between sequences is symmetric. This approach can be readily extended to shape-sequence matching as follows.

We need to assume that a proper shape space has been chosen based on application constraints. During the DTW computation, we need to use the geodesic distance intrinsic to the underlying shape manifold in the local distance measure. This approach was used in [133] using Kendall's shape representation for modeling human gait patterns.
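A minimal sketch of DTW with a pluggable local distance is given below. It uses the arc-length distance on the unit sphere as a stand-in for the geodesic distance of a shape manifold, and it normalizes the accumulated cost by the length of the optimal warping path; both choices, along with the helper names, are assumptions of this illustration rather than the procedure of [133].

```python
import numpy as np

def sphere_distance(p, q):
    """Geodesic (arc-length) distance between unit vectors p and q."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def dtw_distance(seq_a, seq_b, local_dist):
    """Dynamic time warping cost, averaged over the optimal warping path."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    steps = np.zeros((n + 1, m + 1), dtype=int)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(seq_a[i - 1], seq_b[j - 1])
            # Predecessors: diagonal match, reuse of the current frame of B,
            # reuse of the current frame of A.
            candidates = [
                (cost[i - 1, j - 1], steps[i - 1, j - 1]),
                (cost[i - 1, j], steps[i - 1, j]),
                (cost[i, j - 1], steps[i, j - 1]),
            ]
            best_cost, best_steps = min(candidates, key=lambda c: c[0])
            cost[i, j] = best_cost + d
            steps[i, j] = best_steps + 1
    return cost[n, m] / steps[n, m]

def circle_points(angles):
    """Points on a great circle, used here as toy 'shape' sequences."""
    return [np.array([np.cos(a), np.sin(a), 0.0]) for a in angles]

seq = circle_points([0.0, 0.2, 0.4, 0.6])
warped = circle_points([0.0, 0.0, 0.2, 0.4, 0.4, 0.6])   # same path, slower
other = circle_points([1.0, 1.2, 1.4, 1.6])               # different path

d_same = dtw_distance(seq, warped, sphere_distance)
d_diff = dtw_distance(seq, other, sphere_distance)
```

Swapping `sphere_distance` for the geodesic distance of the chosen shape space is the only change needed to apply the same recursion to shape sequences.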

Fig. 14. (a) Graphical illustration of the sequence of shapes obtained during a walking cycle; (b) shape of the human silhouette and its temporal variation can often convey information about the action performed. Figures taken courtesy of [134,1].

Fig. 15. A graphical model of unfolding the exemplar-based HMM used in modeling the gesture dynamics in [1].

In the context of activity recognition, different instances of the same activity may consist of varying relative speeds at which the actions are executed, in addition to other intra- and inter-person variabilities. Thus, it is important for algorithms for activity recognition to compensate for intra- and inter-personal variations of the same activity. Results on gait-based person identification shown in [12] indicate that it is very important to take into account the temporal variations in the person's gait. In [135], statistics of typical temporal warps for actions are used to further enhance activity templates. The basic idea is that the space of temporal warps is constrained for each action class, and a statistical characterization can be derived by formulating time-warps as points on a Hilbert sphere, as summarized below.

Assume that for each frame of the video, an appropriate feature has been extracted and that the video data has now been converted into a feature sequence given by $f_1, f_2, \ldots$ for frames $1, 2, \ldots$, respectively. In the following, $\mathcal{F}$ is used to denote the feature space associated with the chosen feature. Let $\gamma$ be a diffeomorphism (a smooth, invertible function with a smooth inverse) from $[0,1]$ to itself with $\gamma(0) = 0$ and $\gamma(1) = 1$. Also, let $\Gamma$ be the set of all such functions. Elements of $\Gamma$ are used to denote time warping functions. The model for an activity now consists of an average activity sequence given by $a : [0,1] \to \mathcal{F}$, a parameterized trajectory on the feature space. Any time-warped realization of this activity is then obtained using $r(t) = a(\gamma(t))$, $\gamma \in \Gamma$, which actually defines an action of $\Gamma$ on $\mathcal{F}^{[0,1]}$, the space of all continuous activities. In this model, the variability associated with $\gamma$ in each class is viewed as a distribution $P_\gamma$ on $\Gamma$. By rewriting time-warping functions using SRVFs, i.e. $\psi = +\sqrt{\dot{\gamma}}$, a spherical manifold interpretation is provided. Let the space of all square-root density forms be given by

\[ \Psi = \left\{ \psi : [0,1] \to \mathbb{R} \;\middle|\; \psi \ge 0,\ \int_0^1 \psi^2(t)\, dt = 1 \right\}. \tag{8} \]

This is the positive orthant of a unit hypersphere in the Hilbert space of all square-integrable functions on $[0,1]$. Since $\Psi$ lies on a sphere, its geometry is well known. Using its geometry, algorithms for computing sample statistics, probability density functions, and generating inferences are developed in [135]. To learn more about the use of SRVFs to view time-warps as elements of this unit Hilbert sphere, please refer to [115].
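The mapping from a warping function to the Hilbert sphere can be checked numerically. The sketch below discretizes $[0,1]$, forms $\psi = \sqrt{\dot{\gamma}}$ by finite differences, and measures the arc length between two warps; the grid size, the example warp $\gamma(t) = t^2$, and the function names are assumptions made for illustration and are not taken from [135].

```python
import numpy as np

def l2_inner(f, g, t):
    """Trapezoidal approximation of the L2 inner product on [0, 1]."""
    h = f * g
    return np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(t))

def warp_to_sphere(gamma, t):
    """Map a warping function gamma (sampled on grid t) to psi = sqrt(gamma')."""
    gamma_dot = np.gradient(gamma, t)
    psi = np.sqrt(np.maximum(gamma_dot, 0.0))
    # Renormalize so that the discrete L2 norm is one.
    return psi / np.sqrt(l2_inner(psi, psi, t))

def sphere_distance(psi1, psi2, t):
    """Arc length between two points of the unit Hilbert sphere."""
    return np.arccos(np.clip(l2_inner(psi1, psi2, t), -1.0, 1.0))

t = np.linspace(0.0, 1.0, 1001)
psi_id = warp_to_sphere(t, t)        # identity warp gamma(t) = t
psi_sq = warp_to_sphere(t ** 2, t)   # example warp gamma(t) = t^2

dist = sphere_distance(psi_id, psi_sq, t)
```

For these two warps the inner product is $\int_0^1 \sqrt{2t}\, dt = 2\sqrt{2}/3$, so the computed distance can be compared against $\cos^{-1}(2\sqrt{2}/3)$.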

5.3. Linear dynamical models and the Grassmann manifold for human activity analysis

For a wide variety of time-varying features, not necessarily shape, a common approach to describe the temporal evolution/motion is by means of parametric state-space models. One such model is the linear dynamical system (LDS). A wide variety of high-dimensional time series data such as dynamic textures, human joint angle trajectories, shape sequences, video-based face recognition, etc., have been analyzed using such dynamical models [114,11,134,2]. Given training data, the model parameters are first estimated, and during testing one measures the similarity between the model parameters of the training and test instances. It is shown in [127,25] that the study of these models can be formulated as a study of the geometry of the Grassmann manifold. The linear dynamical model is given by

\[ f(t) = C z(t) + w(t), \quad w(t) \sim N(0, R) \tag{9} \]

\[ z(t+1) = A z(t) + v(t), \quad v(t) \sim N(0, Q) \tag{10} \]

where $z$ is the hidden state vector, $A$ the transition matrix and $C$ the measurement matrix. $f$ represents the observed features while $w$ and $v$ are noise components modeled as normal with zero mean and covariances $R$ and $Q$, respectively. Starting from an initial condition $z(0)$, it can be shown that the expected observation sequence is given by

\[ E\begin{bmatrix} f(0) \\ f(1) \\ f(2) \\ \vdots \end{bmatrix} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \end{bmatrix} z(0) = O_\infty(M)\, z(0). \tag{11} \]

Thus, the expected observation sequence generated by a time-invariant model $(A, C)$ lies in the column space of the extended observability matrix given by $O_\infty = [C^T, (CA)^T, (CA^2)^T, \ldots]^T$. However, motivated by the fact that human actions are of finite duration in time and not infinitely extending in time, the study of the model can be simplified by considering only an $n$-length expected observation sequence instead of the infinite sequence as above. The $n$-length expected observation sequence generated by the model $(A, C)$ lies in the column space of the finite observability matrix given by $O_n^T = [C^T, (CA)^T, (CA^2)^T, \ldots, (CA^{n-1})^T]$. Thus, a dynamical model can be identified by a point on the Grassmann manifold, corresponding to the subspace spanned by the columns of the finite observability matrix. Using the geometry of the manifold, one can devise nearest neighbor classifiers, and parametric and non-parametric density models for classification. It was shown that statistical modeling of class conditional densities for each activity using parametric and non-parametric methods on the Grassmann manifold results in robust performance [127].
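To make the identification of a model $(A, C)$ with a subspace concrete, the sketch below builds the finite observability matrix, orthonormalizes its columns, and compares two models through the principal angles between their subspaces. The arc-length distance based on principal angles is one common choice and is an assumption here, as are the dimensions, the random test models, and the function names; none of them is taken from [127].

```python
import numpy as np

def observability_subspace(A, C, n):
    """Orthonormal basis for the column space of O_n = [C; CA; ...; CA^(n-1)]."""
    blocks, M = [], C.copy()
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    O_n = np.vstack(blocks)
    Q, _ = np.linalg.qr(O_n)      # reduced QR gives an orthonormal basis
    return Q

def grassmann_distance(Q1, Q2):
    """Arc-length distance computed from the principal angles of two subspaces."""
    sigma = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    angles = np.arccos(np.clip(sigma, -1.0, 1.0))
    return np.linalg.norm(angles)

rng = np.random.default_rng(0)
d, p, n = 3, 5, 4                 # state dim, feature dim, sequence length
A = 0.5 * rng.standard_normal((d, d))
C = rng.standard_normal((p, d))

# A change of basis in the hidden state describes the same dynamical system:
# the observability matrix is multiplied on the right by an invertible matrix,
# so its column space is unchanged.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
T_inv = np.linalg.inv(T)
A2, C2 = T @ A @ T_inv, C @ T_inv

# An unrelated model for comparison.
A3 = 0.5 * rng.standard_normal((d, d))
C3 = rng.standard_normal((p, d))

Q1 = observability_subspace(A, C, n)
Q2 = observability_subspace(A2, C2, n)
Q3 = observability_subspace(A3, C3, n)
```

The comparison of `Q1` with `Q2` illustrates why the subspace, rather than the matrix pair itself, is the natural object to compare: equivalent state-space realizations map to the same point on the Grassmann manifold.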

Identifying visually salient aspects of long videos is an important application of human activity analysis. In activity based summarization, the goal is to identify typical activities in a given domain of interest or in a given video. To perform summarization of long videos, one can take recourse to clustering actions in the video into dominant modes. The clusters would then serve as a summary of the video. For this, one would first need to segment the video into coherent actions, and perform clustering on the obtained segments. However, since each segment can be of differing lengths, and possibly corrupted by noise, it is desirable to compactly parameterize each segment by a small set of parameters. For this parameterization, the linear dynamic model can be used. Using the geometry of the Grassmann manifold, k-means clustering of video segments can be performed as described in [127]. Some sample sequences in the obtained clusters are shown in Fig. 16. The shown clusters correspond dominantly to ‘Sitting Spins’ and ‘Standing Spins’.

Recent work has also focused on enabling fast search and retrieval of human actions based on dynamical models. By exploiting the Grassmann manifold formulation of dynamical models, [24] designed approximate search strategies using spectral hashing. Search strategies such as locality sensitive hashing and spectral hashing are inherently designed for vector spaces. In [24], a general methodology that exploits the geometry of the Grassmann manifold was proposed by first clustering data into several clusters. Each cluster is then approximated by a tangent-space projection at the cluster mean. On each of these tangent spaces, a spectral hash function is defined which hashes data locally within the cluster. For retrieval, a test point is first compared to all of the cluster centers, and the nearest center is found. Once the nearest center is found, the local spectral hash table is used to quickly identify the closest video in terms of the manifold geodesic distance. This enables fast approximate search for actions in large databases in a computationally efficient manner.

5.4. Modeling activities as tensors: products of Grassmann manifolds

Tensors are generalizations of matrices to multiple dimensions. A 3D space-time volume can naturally be considered as a tensor with three independent dimensions. By decomposing a multi-dimensional data tensor into dominant modes (as a generalization of principal component analysis), one can extract signatures useful for action recognition. Using this approach, Lui et al. [86] considered a tensor as a collection of subspaces. Each subspace is estimated by flattening the 3D tensor along one of the three directions, and then obtaining a low rank approximation to the flattened tensor via SVD. Thus, an $(x, y, t)$ volume gives rise to three subspaces. Intuitively, these subspaces capture overall spatial appearance variations, and motion variations along the horizontal and vertical directions. Each of the subspaces is identified as a point on the Grassmann manifold, and the tensor is viewed as a triplet of subspaces. This triplet is naturally considered a point on a product of Grassmann manifolds. Geodesics on product manifolds can be computed simply by individual component geodesics. This approach is shown to outperform other tensor based approaches such as canonical correlations and space-time interest point based approaches on the challenging KTH dataset.
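The flattening step can be sketched as follows. Each of the three mode unfoldings of a space-time volume is reduced to an orthonormal basis by a truncated SVD, and two volumes are compared by combining the three Grassmann distances under the product metric. Using the left singular vectors, the rank of 3, the arc-length distance, and random arrays in place of real video are all assumptions of this illustration; the exact construction in [86] may differ.

```python
import numpy as np

def mode_subspaces(volume, rank):
    """Orthonormal bases from the three mode unfoldings of a 3D array."""
    bases = []
    for mode in range(3):
        # Bring the chosen mode to the front and flatten the remaining two.
        unfolded = np.moveaxis(volume, mode, 0).reshape(volume.shape[mode], -1)
        # Leading left singular vectors give a low-rank column-space basis.
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        bases.append(U[:, :rank])
    return bases

def grassmann_distance(Q1, Q2):
    """Arc-length distance from the principal angles between two subspaces."""
    sigma = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.linalg.norm(np.arccos(np.clip(sigma, -1.0, 1.0)))

def product_distance(bases_a, bases_b):
    """Distance on the product manifold: root of summed squared components."""
    return np.sqrt(sum(
        grassmann_distance(Qa, Qb) ** 2 for Qa, Qb in zip(bases_a, bases_b)
    ))

rng = np.random.default_rng(1)
video_a = rng.standard_normal((8, 10, 12))   # hypothetical (x, y, t) volume
video_b = rng.standard_normal((8, 10, 12))

bases_a = mode_subspaces(video_a, rank=3)
bases_b = mode_subspaces(video_b, rank=3)
d_ab = product_distance(bases_a, bases_b)
```

Because only subspaces are compared, a global rescaling of the volume's intensities leaves the representation unchanged.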

5.5. Further applications in image and video analysis

General Riemannian manifolds have found many more important applications in the vision community. Here, we briefly touch upon the breadth of recognition applications that have been strongly impacted by manifold models. A recently developed formulation of using the covariance of features in image patches has found several applications such as texture classification [128], pedestrian detection [129], and tracking [101]. The Riemannian geometry of covariance matrices was exploited effectively in all these applications to design state-of-the-art algorithms. The Grassmann manifold structure of the affine shape space is also exploited in [8] to perform affine invariant clustering of shapes. [56] performs discriminative classification over subspaces for object recognition tasks by using Mercer kernels on the Grassmann manifold. In [85], a face image and its perturbations due to registration errors are approximated as a linear subspace, and hence are embedded as points on a Grassmann manifold. The manifold of scale-normalized visual descriptors is studied in [80] in order to establish geometric multi-component distributions, which are then applied toward group activity recognition.

6. Summary and discussions

In this paper we have reviewed recent progress in the area of shape analysis and activity recognition, with a special focus on techniques that use tools from differential geometry. For shape analysis, we summarize several techniques used in both 2D and 3D shape analyses using a wide variety of mathematical representations and Riemannian metrics. These representations range from landmark-based or point distribution models to parameterized curves and surfaces. The main challenges to a formalization of a shape space and computation of shape metrics come from the desired invariance to certain shape-preserving transformations. In addition to rigid motions and global scaling, shape analyses of curves and surfaces also require invariance to re-parameterizations. This has been accomplished rather efficiently using elastic Riemannian metrics. In activity analysis, we have reviewed techniques that use formal models of shape and motion to describe human actions. Often, the models for shape and motion have an associated differential geometric interpretation. We have discussed how the differential geometric tools are used in a wide variety of applications including gait-based human identification, action recognition, video summarization and indexing.

Fig. 16. Dynamical model based clustering on the Grassmann manifold. Shown here are a few sequences from each obtained cluster from a long skating video: (a) Cluster 1: Sit-spins; (b) Cluster 2: Stand-spins; (c) Cluster 3: Camel-spins. Each row in a cluster shows contiguous frames of a sequence. Figures taken from [127].

Acknowledgment

This research was supported in part by an ONR grant ONRN00014‐09‐10664. The authors would like to thank Prof. Rama Chellappa for his encouragement and support in writing this paper. They also thank the reviewers for their help in improving this paper.

Appendix A. Basic differential geometry

We start by considering the definition of a general differentiable manifold. A topological space $M$ is called a differentiable manifold if, amongst other properties, it is locally Euclidean. This means that for each point in $M$, there exists a neighborhood and a mapping that flattens that neighborhood. The Euclidean space $\mathbb{R}^n$ is an $n$-dimensional differentiable manifold which can be covered by a single neighborhood. Any open subset of a differentiable manifold is itself a differentiable manifold. $GL(n)$, the set of all $n \times n$ non-singular matrices, i.e. $GL(n) = \{A \in M(n) \mid \det(A) \neq 0\}$, is an open subset of $\mathbb{R}^{n \times n}$. Thus, it is also a differentiable manifold.

In order to perform differential calculus, i.e. to compute gradients, directional derivatives, critical points, etc., of functions on manifolds, one needs to understand the tangent structure of those manifolds. Although there are several ways to define tangent spaces, one intuitive approach is to consider differentiable curves on the manifold passing through the point of interest $p$, and to study the velocity vectors of these curves at that point. The set of all such tangent vectors is called the tangent space to $M$ at $p$. Even though the manifold $M$ may be nonlinear, the tangent space $T_p(M)$ is always linear and one can impose probability models on it using more traditional approaches. In case of the Euclidean space $\mathbb{R}^n$, the tangent space $T_p(\mathbb{R}^n) = \mathbb{R}^n$ for all $p \in \mathbb{R}^n$. For any $A \in GL(n)$, let $\gamma(t)$ be a path in $GL(n)$ passing through $A$ at $t = 0$. The velocity vector at $A$, $\dot{\gamma}(0)$, is an $n \times n$ matrix. Hence, $T_A(GL(n)) = \mathbb{R}^{n \times n}$.

Next we introduce the notion of a submanifold. The differential of a smooth mapping $f : M \to N$ at $p \in M$, denoted by $df_p$, is a linear map $df_p : T_p(M) \to T_{f(p)}(N)$ given by, for any $v \in T_p(M)$ and any smooth function $g$ on $N$, $(df_p(v))(g) = v(g \circ f)$. A point $p \in M$ is defined to be a regular point if $df_p$ is onto, and a point $y \in N$ is said to be a regular value if every point of $f^{-1}(y)$ is a regular point. Suppose $M$ and $N$ are manifolds of dimensions $m$ and $n$ respectively, and let $f : M \to N$ be a smooth map with a regular value $y \in N$. Then $f^{-1}(y)$ is a submanifold of $M$ of dimension $m - n$. Using this result, we can check that $S^n$ is indeed a submanifold of $\mathbb{R}^{n+1}$ (take $f(x) = \|x\|^2$, for which 1 is a regular value). We now consider the set $O(n)$ of orthogonal matrices, which is a subset of the manifold $GL(n)$. We define $O(n)$ to be the set of all $n \times n$ invertible matrices $O$ that satisfy $OO^T = I$. Define $S(n)$ to be the set of $n \times n$ symmetric matrices, and then define $f : GL(n) \to S(n)$ by $f(O) = OO^T$. It can easily be shown that $I$ is a regular value of $f$ and, hence, $f^{-1}(I) = O(n)$ is a submanifold of $GL(n)$. Note that $O(n)$ is not connected, but has two components: those orthogonal matrices with determinant $+1$, and those with determinant $-1$. The set of orthogonal matrices with determinant $+1$ is called the special orthogonal group, and denoted by $SO(n)$.

We now consider the task of measuring distances on a manifold. This is accomplished using a Riemannian metric defined as follows.

Definition 2. A Riemannian metric on a differentiable manifold M is a map ⟨⋅, ⋅ ⟩ that smoothly associates to each point p∈M a symmetric,bilinear, positive definite form on the tangent space Tp(M).

A differentiable manifold with a Riemannian metric on it is called a Riemannian manifold. As an example, $\mathbb{R}^n$ is a Riemannian manifold with the Riemannian metric $\langle v_1, v_2 \rangle = v_1^T v_2$, the standard Euclidean product. Similarly, for the unit sphere $S^n$ and a point $p \in S^n$, the Euclidean inner product on the tangent vectors makes $S^n$ a Riemannian manifold. That is, for any $v_1, v_2 \in T_p(S^n)$, we use the Riemannian metric $\langle v_1, v_2 \rangle = v_1^T v_2$. Using the Riemannian structure, it becomes possible to define lengths of paths on a manifold. Let $\alpha : [0,1] \to M$ be a parameterized path on a Riemannian manifold $M$ that is differentiable everywhere on $[0,1]$. Then $\frac{d\alpha}{dt}$, the velocity vector at $t$, is an element of the tangent space $T_{\alpha(t)}(M)$, with length given by $\sqrt{\left\langle \frac{d\alpha(t)}{dt}, \frac{d\alpha(t)}{dt} \right\rangle}$. The length of the path $\alpha$ is then given by:

\[ L[\alpha] = \int_0^1 \sqrt{\left\langle \frac{d\alpha(t)}{dt}, \frac{d\alpha(t)}{dt} \right\rangle}\, dt. \tag{A.1} \]

For any two points $p, q \in M$, one can define the distance between them as the infimum of the lengths of all smooth paths on $M$ which start at $p$ and end at $q$:

\[ d(p, q) = \inf_{\{\alpha : [0,1] \to M \,\mid\, \alpha(0) = p,\ \alpha(1) = q\}} L[\alpha]. \tag{A.2} \]
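The path-length integral can be approximated by summing the lengths of short segments of a sampled path. The sketch below does this for two paths on the unit sphere with common end points, assuming the Euclidean inner product inherited from the ambient space; the sample count and the particular paths are choices made for illustration.

```python
import numpy as np

def path_length(path):
    """Approximate the path length by summing consecutive segment lengths.

    `path` has shape (num_samples, dim); the Euclidean inner product of the
    ambient space is assumed as the Riemannian metric.
    """
    return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))

t = np.linspace(0.0, 1.0, 2001)
half_pi_t = 0.5 * np.pi * t
zeros = np.zeros_like(t)

# Great-circle arc from the north pole (0, 0, 1) to the point (1, 0, 0).
arc = np.stack([np.sin(half_pi_t), zeros, np.cos(half_pi_t)], axis=1)

# A detour on the sphere with the same end points: first down to (0, 1, 0),
# then along the equator to (1, 0, 0).
leg1 = np.stack([zeros, np.sin(half_pi_t), np.cos(half_pi_t)], axis=1)
leg2 = np.stack([np.sin(half_pi_t), np.cos(half_pi_t), zeros], axis=1)
detour = np.concatenate([leg1, leg2], axis=0)

arc_length = path_length(arc)
detour_length = path_length(detour)
```

The direct arc has length close to $\pi/2$ and the detour close to $\pi$, which is consistent with the distance being the infimum over all connecting paths.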

A path $\hat{\alpha}$ which achieves the above infimum, if it exists, is a geodesic between $p$ and $q$ on $M$. Geodesics on the unit sphere $S^n$ are great circles. The distance minimizing geodesic between two points $p$ and $q$ is the shorter of the two arcs of a great circle joining them. As a parameterized curve, this geodesic is given by:

\[ \alpha(t) = \frac{1}{\sin(\theta)} \left[ \sin(\theta - t)\, p + \sin(t)\, q \right], \quad t \in [0, \theta], \tag{14} \]

where $\theta = \cos^{-1}(\langle p, q \rangle)$.

An important tool in studying statistics on a manifold is an exponential map. If $M$ is a Riemannian manifold and $p \in M$, the exponential map $\exp_p : T_p(M) \to M$ is defined by $\exp_p(v) = \alpha_v(1)$, where $\alpha_v$ is a constant speed geodesic starting at $p$ whose velocity vector at $p$ is $v$. For $\mathbb{R}^n$, under the Euclidean metric, since geodesics are given by straight lines, the exponential map is a simple addition: $\exp_p(v) = p + v$, for $p, v \in \mathbb{R}^n$. The exponential map on a sphere, $\exp_p : T_p(S^n) \to S^n$, is given by $\exp_p(v) = \cos(\|v\|)\, p + \sin(\|v\|) \frac{v}{\|v\|}$.
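The two sphere formulas above translate directly into code. The sketch below evaluates the great-circle geodesic at arc length $t$ and the exponential map, and relates them through the initial velocity of the geodesic; the points $p$ and $q$ are arbitrary examples, and the functions assume $p \neq \pm q$ and handle $v = 0$ as a special case.

```python
import numpy as np

def sphere_geodesic(p, q, t):
    """Point at arc length t along the shorter great-circle arc from p to q."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return (np.sin(theta - t) * p + np.sin(t) * q) / np.sin(theta)

def sphere_exp(p, v):
    """Exponential map on the unit sphere for a tangent vector v at p."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
theta = np.arccos(np.dot(p, q))
midpoint = sphere_geodesic(p, q, 0.5 * theta)

# Tangent vector at p pointing toward q, scaled so that the constant-speed
# geodesic with this initial velocity reaches q at time 1.
direction = q - np.dot(p, q) * p
v = theta * direction / np.linalg.norm(direction)
end_point = sphere_exp(p, v)
half_way = sphere_exp(p, 0.5 * v)
```

Here `end_point` coincides with `q` and `half_way` with `midpoint`, showing that the exponential map follows the same great circle as the explicit geodesic formula.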


A group $G$ is a Lie group if: (i) it is a smooth manifold, and (ii) the group operations $G \times G \to G$ defined by $(g, h) \mapsto gh$ and $G \to G$ defined by $g \mapsto g^{-1}$ are both smooth mappings. Now we take the case of a manifold $M$ and study how the points change on $M$ when operated on by some kind of a transformation group. Our motivation, of course, is shape analysis where $M$ will be a manifold formed by curves and surfaces and we want to study variations of their shapes under different transformations: rotations, translations, and scalings. Mathematically, this is managed using actions of groups on appropriate manifolds. Given a manifold $M$ and a Lie group $G$, a left group action of $G$ on $M$ is a map $G \times M \to M$, written as $(g, p) \mapsto g * p$, such that

(1) $g_1 * (g_2 * p) = (g_1 \cdot g_2) * p$, $\forall g_1, g_2 \in G$ and $p \in M$;

(2) $e * p = p$, $\forall p \in M$.

In item 1, the symbol $\cdot$ denotes the group operation in $G$. Another way to phrase this relation is to say that $G$ acts on $M$. We say that $G$ acts smoothly on $M$ if the map $G \times M \to M$ is a smooth map. A group action of $G$ on a Riemannian manifold $M$ is called isometric if it preserves the Riemannian metric on $M$. In other words, for all $g \in G$, the map $M \to M$ given by $p \mapsto g * p$ is an isometry. For the same situation, we sometimes say that $G$ acts on $M$ by isometries. It then also follows that for all $g \in G$ and $x, y \in M$, $d(x, y) = d(g * x, g * y)$, where $d(x, y)$ is the distance function on $M$ resulting from the Riemannian metric.
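A familiar instance of an isometric action is the rotation group $SO(3)$ acting on the unit sphere by matrix multiplication. The sketch below draws rotations from the QR factorization of a Gaussian matrix (one convenient construction, assumed here for illustration) and checks the action property and the invariance of the arc-length distance numerically.

```python
import numpy as np

def sphere_distance(x, y):
    """Arc-length distance between unit vectors x and y."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def random_rotation(rng, n=3):
    """Draw an element of SO(n) from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    Q = Q * np.sign(np.diag(R))       # fix the sign of each column
    if np.linalg.det(Q) < 0:          # flip one column to land in SO(n)
        Q[:, 0] = -Q[:, 0]
    return Q

rng = np.random.default_rng(0)
O1 = random_rotation(rng)
O2 = random_rotation(rng)

x = rng.standard_normal(3)
x = x / np.linalg.norm(x)
y = rng.standard_normal(3)
y = y / np.linalg.norm(y)

# Distance before and after acting with the same rotation on both points.
d_before = sphere_distance(x, y)
d_after = sphere_distance(O1 @ x, O1 @ y)

# Group action property: acting with O2 and then O1 equals acting with O1 O2.
composed = (O1 @ O2) @ x
sequential = O1 @ (O2 @ x)
```

The same invariance is what allows a distance between orbits to be defined when shapes are compared modulo rotation.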

References

[1] M.F. Abdelkader, W. Abd-Almageed, A. Srivastava, R. Chellappa, Silhouette-based gesture and action recognition via modeling trajectories on Riemannianshape manifolds, Comput. Vision Image Underst. 115 (March 2011) 439–455.

[2] G. Aggarwal, A. Roy-Chowdhury, R. Chellappa, A system identification approach forvideo-based face recognition, International Conference on Pattern Recognition,2004.

[3] Y. Amit, U. Grenander, M. Piccioni, Structural image restoration through deform-able templates, J. Am. Stat. Assoc. 86 (414) (1991) 376–387.

[4] S. Baloch, H. Krim, Object recognition through topo-geometric shape modelsusing error-tolerant subgraph isomorphisms, IEEE Trans. Image Process. 19(May 2010) 1191–1200.

[5] M. Bauer, M. Bruveris, A new Riemannian setting for surface registration, in:X. Pennec, S. Joshi, M. Nielsen (Eds.), Proceedings of the Third InternationalWorkshop on Mathematical Foundations of Computational Anatomy —

Geometrical and Statistical Methods for Modelling Biological Shape Variability,2011, pp. 182–193.

[6] A.M. Baumberg, D.C. Hogg, An efficient method for contour tracking using activeshapemodels, Proceeding of theWorkshop onMotion of Nonrigid and ArticulatedObjects, 1994, pp. 194–199.

[7] M. Beg, M.I. Miller, A. Trouvé, L. Younes, Computing large deformation metricmappings via geodesic flows of diffeomorphisms, Int. J. Comput. Vision 61(2005) 139–157.

[8] E. Begelfor, M. Werman, Affine invariance revisited, IEEE International Conferenceon Computer Vision and Pattern Recognition, 2006, pp. 2087–2094.

[9] P.N. Belhumeur, J.P. Hepanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recogni-tion using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell.19 (7) (1997) 711–720.

[10] P.J. Besl, N.D. McKay, A method for registration of 3-D shapes, IEEE TPAMI 14 (2)(1992) 239–256.

[11] A. Bissacco, A. Chiuso, Y. Ma, S. Soatto, Recognition of human gaits, IEEE Interna-tional Conference on Computer Vision and Pattern Recognition, 2, 2001,pp. 52–57.

[12] A. Bobick, Tanawongsuwan, Performance analysis of time-distance gait parametersunder different speeds, Audio- and Video-based Biometric Person Authentication(AVBPA), June 2003.

[13] S. Bouix, J.C. Pruessner, D.L. Collins, K. Siddiqi, Hippocampal shape analysis usingmedial surfaces, Neuroimage 25 (2001) 1077–1089.

[14] C. Brechbühler, G. Gerig, O. Kübler, Parameterization of closed surfaces for 3Dshape description, Comput. Vision Image Underst. 61 (2) (1995) 154–170.

[15] R.W. Brockett, System theory on group manifolds and coset spaces, SIAM J.Control 10 (2) (May 1972) 265–284.

[16] A. Bronstein, M. Bronstein, R. Kimmel, Calculus of nonrigid surfaces for geome-try and texture manipulation, IEEE Trans. Vis. Comput. Graph. 13 (2007)902–913.

[17] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Three-dimensional face recognition,Int. J. Comput. Vision 64 (1) (2005) 5–30.

[18] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Efficient computation of isometry-invariant distances between surfaces, SIAM J. Sci. Comput. 28 (September,2006) 1812–1836.

[19] V. Caselles, R. Kimmel, G. Sapiro, Geodesic active contours, Int. J. Comput. Vision22 (1) (1997) 61–79.

[20] J. Cates, P.T. Fletcher, M. Styner, H. Hazlett, R. Whitaker, Particle-based shapeanalysis of multi-object complexes, Proceedings of the 11th InternationalConference on Medical Image Computing and Computer Assisted Intervention(MICCAI), 2008.

[21] J. Cates, M. Meyer, P.T. Fletcher, R. Whitaker, Entropy-based particle systems forshape correspondence, Proc. of MICCAI Workshop Mathematical Foundations ofComputational Anatomy, 2006, pp. 90–99.

[22] W. Chang, M. Zwicker, Automatic registration for articulated shapes, Comput.Graph. Forum (Proc. SGP 2008) 27 (5) (2008) 1459–1468.

[23] G. Charpiat, O. Faugeras, R. Keriven, Approximations of shape metrics and appli-cation to shape warping and empirical shape statistics, J. Found. Comput. Math.5 (1) (2005) 1–58.

[24] R. Chaudhry, Y. Ivanov, Fast approximate nearest neighbor methods for non-Euclidean manifolds with applications to human activity analysis in videos,European Conference on Computer Vision, 2010, pp. 735–748.

[25] R. Chaudhry, A. Ravichandran, G.D. Hager, R. Vidal, Histograms of oriented opticalflow and binet-cauchy kernels on nonlinear dynamical systems for the recognitionof human actions, IEEE International Conference on Computer Vision and PatternRecognition, 2009, pp. 1932–1939.

[26] G.E. Christensen, H.J. Johnson, Consistent image registration, IEEE Trans. Med.Imaging 20 (7) (2001) 568–582.

[27] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid registra-tion, Comput. Vision Image Underst. 89 (2–3) (2003) 114–141.

[28] T.F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham, Active shape models:their training and application, Comput. Vision Image Underst. 61 (1995)38–59.

[29] D. Cremers, T. Kohlberg, C. Schnorr, Nonlinear shape statistics in Mumford–Shahbased segmentation, European Conference on Computer Vision, Lecture Notesin Computer Science 2350, Springer, Copenhagen, Denmark, June 2002,pp. 93–108.

[30] D. Cremers, S. Soatto, A pseudo distance for shape priors in level set segmenta-tion, 2nd IEEE Workshop on Variational, Geometric, and Level Set Methods inComputer Vision, 2003, France, http://lear.inrialpes.fr/people/triggs/events/iccv03/cdrom/vlsm03/index.htm.

414 A. Srivastava et al. / Image and Vision Computing 30 (2012) 398–416

Page 19: Author's personal copy - ssamg.stat.fsu.edussamg.stat.fsu.edu/upload/file/?id=3c571c7fe6c852caa2e7ca5bff1c… · Author's personal copy On advances in differential-geometric approaches

Author's personal copy

[31] C. Davatzikos, M. Vaillant, S. Resnick, J.L. Prince, S. Letovsky, R.N. Bryan, A com-puterized method for morphological analysis of the corpus callosum, J. Comput.Assist. Tomogr. 20 (1996) 88–97.

[32] R.H. Davies, C.J. Twining, T.F. Cootes, C.J. Taylor, Building 3-D statistical shapemodels by direct optimization, IEEE Trans. Med. Imaging 29 (4) (2010)961–981.

[33] H. Drira, B. Benamor, M. Daoudi, A. Srivastava, A Riemannian analysis of 3D noseshapes for partial human biometrics, Proceedings of International Conference onComputer Vision, 2009.

[34] H. Drira, B. Benamor, M. Daoudi, A. Srivastava, Pose and expression-invariant 3Dface recognition using elastic radial curves, Proceedings of British Machine VisionConference, 2010.

[35] I.L. Dryden, K.V. Mardia, Statistical Shape Analysis, John Wiley & Son, 1998.[36] T.E. Duncan, Some filtering results in Riemannian manifolds, Inf. Control 35 (3)

(November 1977) 182–195.[37] T.E. Duncan, Stochastic systems in Riemannian manifolds, J. Optim. Theory Appl.

27 (3) (March 1979) 399–426.[38] T.E. Duncan, An estimation problem in compact lie groups, Syst. Control Lett. 10

(4) (April 1990) 257–263.[39] A.M. Elgammal, C.-S. Lee, Nonlinear manifold learning for dynamic shape and

dynamic appearance, Comput. Vision Image Underst. 106 (1) (2007) 31–46.[40] P. Fletcher, S. Joshi, Riemannian geometry for the statistical analysis of diffusion

tensor data, Signal Process. 87 (2) (2007) 250–262.[41] P.T. Fletcher. Statistical Variability in nonlinear spaces: application to shape

analysis and DT-MRI. Ph.D. Thesis, Department of Computer Science, Universityof North Carolina, August 2004.

[42] P.T. Fletcher, S. Joshi, C. Lu, S.M. Pizer, Principal geodesic analysis for the study ofnonlinear statistics of shape, IEEE Trans. Med. Imaging 23 (8) (Aug. 2004)995–1005.

[43] M.S. Floater, K. Hormann, Surface parameterization: a tutorial and survey, in:N.A. Dodgson, M.S. Floater, M.A. Sabin (Eds.), Advances in Multiresolution forGeometric Modelling, Springer Verlag, 2005, pp. 157–186.

[44] G. Gerig, M. Styner, M.E. Shenton, J.A. Lieberman, Shape versus size: improvedunderstanding of the morphology of brain structures, Proc. of MICCAI, 2001,pp. 24–32.

[45] J. Gomes, O. Faugeras, Reconciling distance functions and level sets, BiomedicalImaging, 2002, 5th IEEE EMBS International Summer School on, june 2002,page 15 pp.

[46] K. Gorczowski, M. Styner, J.Y. Jeong, J.S. Marron, J. Piven, H.C. Hazlett, S.M. Pizer,G. Gerig, Multi-object analysis of volume, pose, and shape using statistical dis-crimination, IEEE TPAMI 32 (4) (2010) 652–666.

[47] U. Grenander. Pattern Synthesis: Lectures in Pattern Theory, vol. I, II. Springer-Verlag, New York, 1976, 1978.

[48] U. Grenander, Regular Structures: Lectures in Pattern Theory, vol. III, Springer-Verlag, New York, 1981.

[49] U. Grenander, General Pattern Theory, Oxford University Press, 1993.[50] U. Grenander, Y. Chow, D. Keenan, HANDS: A Pattern Theoretic Study of Biological

Shapes, Springer-Verlag, 1990.[51] U. Grenander, M.I. Miller, Representations of knowledge in complex systems,

J. R. Stat. Soc. 56 (3) (1994).[52] U. Grenander, M.I. Miller, Computational anatomy: an emerging discipline, Q.

Appl. Math. LVI (4) (1998) 617–694.[53] U. Grenander, M.I. Miller, A. Srivastava, Hilbert–Schmidt lower bounds for esti-

mators on matrix Lie groups for ATR, IEEE Trans. PAMI 20 (8) (1998) 790–802.[54] B. Benamor, H. Drira, M. Daoudi, A. Srivastava, Nasal region contribution in 3D

face biometrics using shape analysis framework, Proceedings of InternationalConference on Computer Vision, 2009, pp. 357–366.

[55] P.W. Hallinan, G. Gordon, A.L. Yuille, P. Giblin, D. Mumford, Two- and Three-dimensional Patterns of Face, A. K. Peters, 1999.

[56] J. Hamm, D.D. Lee, Grassmann discriminant analysis: a unifying view onsubspace-based learning, International Conference on Machine Learning, Omni-press, Helsinki, Finland, June 2008, pp. 376–383, http://shop.omnipress.com/icmlproceedings.aspx.

[57] M. Hilaga, S.T. Kohmura, T.L. Kunii, Topology matching for fully automatic simi-larity estimation of 3D shapes, ACM SIGGRAPH, 2001, pp. 203–212.

[58] Q. Huang, B. Adams, M.Wicke, L.J. Guibas, Non-rigid registration under isometricdeformations, Proc. of the Symposium on Geometry Processing, SGP'08, , 2008,pp. 1449–1457.

[59] A.K. Jain, Y. Zhong, S. Lakshmanan, Object matching using deformable templates,IEEE Trans. Pattern Anal. Mach. Intell. 18 (3) (1996) 267–278.

[60] B. Jian, B.C. Vemuri, A robust algorithm for point set registration using mixture ofGaussians, IEEE Int. Conf. Comput. Vision 2 (2005) 1246–1251.

[61] G. Johansson, Visual perception of biological motion and a model for its analysis,Percept. Psychophys. 14 (2) (1973) 201–211.

[62] S.C. Joshi, M.I. Miller, U. Grenander, On the geometry and shape of brain sub-manifolds, Pattern Recognit Artif. Intell. 11 (1997) 1317–1343.

[63] S.H. Joshi, E. Klassen, A. Srivastava, I. Jermyn, Removing shape-preserving trans-formations in square-root elastic (SRE) framework for shape analysis of curves,Proc. of 6th Intl. Conf. on EMMCVPR, Hubei, China, 2007, pp. 387–398.

[64] S.H. Joshi, E. Klassen, A. Srivastava, I.H. Jermyn, A novel representation for rie-mannian analysis of elastic curves in ℝn, Proceedings of IEEE CVPR, 2007,pp. 1–7.

[65] K. Kanatani, Group-theoretical Methods in Image Understanding, Springer-Verlag,1990.

[66] A. Kelemen, G. Szekely, G. Gerig, Elastic model-based segmentation of 3D neuro-logical data sets, IEEE Trans. Med. Imaging 18 (10) (1999) 828–839.

[67] D.G. Kendall, Shape manifolds, procrustean metrics and complex projectivespaces, Bull. London Math. Soc. 16 (1984) 81–121.

[68] D.G. Kendall, D. Barden, T.K. Carne, H. Le, Shape and Shape Theory, Wiley, 1999.[69] M. Kilian, N.J. Mitra, H. Pottmann, Geometric modeling in shape space, Proc. of

SIGGRAPH, 2007.[70] V.G. Kim, Y. Lipman, T. Funkhouser, Blended intrinsic maps, ACM Trans. Graph.

30 (2011) 79:1–79:12.[71] E. Klassen, A. Srivastava, Geodesics between 3D closed curves using path-

straightening, Proceedings of ECCV, Lecture Notes in Computer Science, 2006,pp. 95–106, pages I.

[72] E. Klassen, A. Srivastava, W. Mio, S.H. Joshi, Analysis of planar shapes using geo-desic paths on shape spaces, IEEE Trans. Pattern Anal. Mach. Intell. 26 (3) (2004)372–383.

[73] S. Kurtek, E. Klassen, Z. Ding, M. Avison, A. Srivastava, Parameterization-invari-ant shape statistics and probabilistic classification of anatomical surfaces, in:G. Székely, H.K. Hahn (Eds.), IPMI, Lecture Notes in Computer Science, volume6801, Springer, 2011.

[74] S. Kurtek, E. Klassen, Z. Ding, S. Jacobson, J.L. Jacobson, M.J. Avison, A. Srivastava,Parameterization-invariant shape comparisons of anatomical surfaces, IEEETrans. Med. Imaging 30 (3) (March 2011) 849–858.

[75] S. Kurtek, E. Klassen, Z. Ding, A. Srivastava, A novel Riemannian framework forshape analysis of 3D objects, IEEE Comput. Soc. Conf. Comput. Vision PatternRecognit. (2010) 1625–1632.

[76] S. Kurtek, E. Klassen, J.C. Gore, Z. Ding, and A. Srivastava., accepted for publica-tion. Elastic geodesic paths in shape space of parametrized surfaces. IEEETrans. Pattern Anal. Mach. Intell., DOI:10.1109/TPAMI.2011.233.

[77] S. Lafon, A. Lee, Diffusion maps and coarse-graining: a unified framework for di-mensionality reduction, graph partitioning, and data set parameterization, IEEETrans. Pattern Anal. Mach. Intell. 28 (2006) 1393–1403.

[78] H. Le, D.G. Kendall, The Riemannian structure of Euclidean shape spaces: a novelenvironment for statistics, Ann. Stat. 21 (3) (1993) 1225–1271.

[79] N.E. Leonard, P.S. Krishnaprasad, Motion control of drift-free left-invariant systemson lie groups, IEEE Trans. Autom. Control 40 (9) (September 1995) 1539–1554.

[80] R. Li, R. Chellappa, S. Zhou, Learning multi-modal densities on discriminativetemporal interaction manifold for group activity recognition, IEEE InternationalConference on Computer Vision and Pattern Recognition, 2009.

[81] Y. Lipman, T. Funkhouser, Mobius voting for surface correspondence, ACMTrans. Graph. 28 (July 2009) 72:1–72:12.

[82] Y. Lipman, R. Rustamov, T. Funkhouser, Biharmonic distance, ACM Trans. Graph.29 (3) (June 2010).

[83] X. Liu, A. Srivastava, K. Gallivan, Optimal linear representations of images for ob-ject recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (5) (2004) 662–666.

[84] S. Loncaric, A survey of shape analysis techniques, Pattern Recognit. 31 (8)(1998) 983–1001.

[85] Y.M. Lui, J.R. Beveridge, Grassmann registration manifolds for face recognition,European Conference on Computer Vision, Lecture Notes in Computer Science5304, Springer, Marseille, France, October 2008, pp. 44–57.

[86] Y.M. Lui, J.R. Beveridge, M. Kirby, Action classification on product manifolds,IEEE International Conference on Computer Vision and Pattern Recognition,2010, pp. 833–839.

[87] R. Malladi, J.A. Sethian, B.C. Vemuri, A fast level set based algorithm for topology-independent shape modeling, J. Math. Imag. Vision 6 (1995) 269–290.

[88] R. Malladi, J.A. Sethian, B.C. Vemuri, Shape modeling with front propagation: alevel set approach, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 158–175.

[89] F. Memoli, G. Sapiro, A theoretical and computational framework for isometryinvariant recognition of point cloud data, Found. Comput. Math. 5 (3) (2005)313–347.

[90] A.C.G. Mennuci, Metrics of Curves in Shape Optimization and Analysis, ScuolaNormale Superiore, Pisa, Italy, 2009.

[91] P.W. Michor, D. Mumford, Riemannian geometries on spaces of plane curves,J. Eur. Math. Soc. 8 (2006) 1–48.

[92] M.I. Miller, G.E. Christensen, Y. Amit, U. Grenander, Mathematical textbook ofdeformable neuroanatomies, Proc. Natl. Acad. Sci. 90 (24) (December 1993).

[93] M.I. Miller, L. Younes, Group actions, homeomorphisms, and matching: a generalframework, Int. J. Comput. Vision 41 (1/2) (2002) 61–84.

[94] W. Mio, A. Srivastava, Elastic-string models for representation and analysis ofplanar shapes, Proc. of Computer Vision and Pattern Recognition, volume 2,2004, pp. 10–15.

[95] W. Mio, A. Srivastava, S.H. Joshi, On shape of plane elastic curves, Int. J. Comput.Vision 73 (3) (2007) 307–324.

[96] M. Moakher, Means and averaging in the group of rotations, SIAM J. Matrix Anal.Appl. 24 (January 2002).

[97] M. Moakher, A differential geometric approach to the geometric mean of sym-metric positive-definite matrices, SIAM J. Matrix Anal. Appl. 26 (3) (2005)735–747.

[98] S. Osher, R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, SpringerVerlag, 2003.

[99] S. Osher, N. Paragios, Geometric Level Set Methods in Imaging, Vision, andGraphics, Springer Verlag, 2003.

[100] X. Pennec, P. Fillard, N. Ayache, A Riemannian framework for tensor computing,Int. J. Computer Vision 66 (1) (2006) 41–66.

[101] F. Porikli, O. Tuzel, P. Meer, Covariance tracking using model update based on liealgebra, IEEE International Conference on ComputerVision andPatternRecognition,IEEE Computer Society, New York, USA, June 2006, pp. 728–735.

[102] L.R. Rabiner, A tutorial on hidden Markov models and selected applications inspeech recognition, Proc. IEEE 77 (2) (February 1989) 257–286.

415A. Srivastava et al. / Image and Vision Computing 30 (2012) 398–416

Page 20: Author's personal copy - ssamg.stat.fsu.edussamg.stat.fsu.edu/upload/file/?id=3c571c7fe6c852caa2e7ca5bff1c… · Author's personal copy On advances in differential-geometric approaches

Author's personal copy

[103] J.G. Raudseps, Some aspects of the tangent-angle vs arc length representation ofcontours, Rep. 1801-6(ASTIA AD 462 877), Ohio State Univ. Res. Foundation,Columbus, 1965.

[104] C. Samir, A. Srivastava, M. Daoudi, Three-dimensional face recognition usingshapes of facial curves, IEEE Trans. Pattern Anal. Mach. Intell. 28 (11) (2006)1858–1863.

[105] C. Samir, A. Srivastava, M. Daoudi, E. Klassen, An intrinsic framework for analysisof facial surfaces, Int. J. Comput. Vision 82 (1) (April 2009) 80–95.

[106] A. Schwartzman. Random ellipsoids and false discovery rates: statistics for diffu-sion tensor imaging data. PhD thesis, Stanford, 2006.

[107] J. Shah, H0 type Riemannian metrics on the space of planar curves, Q. Appl.Math. 66 (2008) 123–137.

[108] J. Shah, An H2 type Riemannian metric on the space of planar curves, Workshopon the Mathematical Foundations of Computational Anatomy, MICCAI, October2006.

[109] E. Sharon, D. Mumford, 2D-shape analysis using conformal mapping, Int. J. Comput.Vision 70 (1) (October 2006) 55–75.

[110] Y. Shinagawa, T.L. Kunii, Y.L. Kergosien, Surface coding based on Morse theory,IEEE Comput. Graph. Appl. 11 (1991) 67–78.

[111] K. Siddiqi, A. Shokoufandeh, S.J. Dickinson, S.W. Zucker, Shock graphs and shapematching, Int. J. Comput. Vision 35 (1999) 13–32.

[112] K. Siddiqi, J. Zhang, D. Macrini, A. Shokoufandeh, S. Bouix, S. Dickinson, Retrievingarticulated 3D models using medial surfaces, Mach. Vision Appl. 19 (4) (2008)261–274.

[113] C.G. Small, The Statistical Theory of Shape, Springer, 1996.[114] S. Soatto, G. Doretto, Y.N. Wu, Dynamic textures, IEEE International Conference

on Computer Vision, 2, 2001, pp. 439–446.[115] A. Srivastava, I.H. Jermyn, Looking for shapes in 2D cluttered, point clouds, IEEE

Trans. Pattern Recognit. Mach. Intell. 31 (9) (2009) 1616–1629.[116] A. Srivastava, S.H. Joshi, W. Mio, X. Liu, Statistical shape analysis: clustering, learn-

ing and testing, IEEE Trans. Pattern Anal. Mach. Intell. 27 (4) (2005) 590–602.[117] A. Srivastava, E. Klassen, S.H. Joshi, I.H. Jermyn, Shape analysis of elastic curves in

Euclidean spaces, IEEE Transactions on Pattern Analysis and Machine Intelligence33 (7) (July 2011) 1415–1428, http://dx.doi.org/10.1109/TPAMI.2010.184.

[118] A. Srivastava, X. Liu, Tools for application-driven dimension reduction,J. NeuroComput. 67 (2005) 136–160.

[119] A. Srivastava, C. Samir, S.H. Joshi, Daoudi, Elastic shape models for face analysisusing curvilinear coordinates, J. Math. Imaging Vision 33 (2) (February 2009)253–265.

[120] M. Styner, I. Oguz, S. Xu, C. Brechbuhler, D. Pantazis, J. Levitt, M.E. Shenton, G.Gerig, Framework for the statistical shape analysis of brain structures usingSPHARM-PDM, MICCAI Open Science Workshop, 2006.

[121] J. Su, I.L. Dryden, E. Klassen, H. Le, and A. Srivastava, accepted for publication. Fittingsmoothing splines to time-indexed, noisy points on nonlinear manifolds. J. ImageVision Comput., September. http://dx.doi.org/10.1016/j.imavis.2011.09.006.

[122] G. Sundaramoorthi, A. Mennucci, S. Soatto, A. Yezzi, Tracking deforming objectsby filtering and prediction in the space of curves, CDC, December 2009.

[123] H. Tagare, D. Groisser, O. Skrinjar, Symmetric non-rigid registration: a geometrictheory and some numerical techniques, J. Math. Imaging Vision 34 (2009) 6188.

[124] Z. Tari, J. Shah, H. Pien, A computationally efficient shape analysis via level sets,IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, SanFrancisco, 1996.

[125] K. Toyama, A. Blake, Probabilistic tracking in a metric space, IEEE InternationalConference on Computer Vision, 2001, pp. 50–59.

[126] A. Trouvé, Diffeomorphisms groups and pattern matching in image analysis, Intl.J. Comput. Vision 28 (3) (1998) 213–221.

[127] P. Turaga, A. Veeraraghavan, A. Srivastava, R. Chellappa, Statistical computationson Grassmann and Stiefel manifolds for image and video based recognition, In:IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (11),November 2011, pp. 2273–2286.

[128] O. Tuzel, F. Porikli, P. Meer, Region covariance: a fast descriptor for detection andclassification, European Conference on Computer Vision, Lecture Notes inComputer Science 3951, Springer, Graz, Austria, May 2006, pp. 589–600.

[129] O. Tuzel, F. Porikli, P. Meer, Pedestrian detection via classification on Riemannianmanifolds, IEEE Trans. Pattern Anal. Mach. Intell. 30 (10) (October 2008)1713–1727.

[130] M. Vaillant, J. Glaunes, Surface matching via currents, IPMI, 2005, pp. 381–392.

[131] O. van Kaick, H. Zhang, G. Hamarneh, D. Cohen-Or, A survey on shape correspon-dence, Proc. of Eurographics State-of-the-art Report, 2010.

[132] N. Vaswani, A.K. Roy Chowdhury, R. Chellappa, “Shape Activity”: a continuous-state HMM for moving/deforming shapes with application to abnormal activitydetection, IEEE Trans. Image Process. 14 (10) (2005) 1603–1616.

[133] A. Veeraraghavan, R. Chellappa, A.K. Roy-Chowdhury, The Function Space ofan Activity, IEEE International Conference on Computer Vision and PatternRecognition, volume 1, 2006, pp. 959–968.

[134] A. Veeraraghavan, A. Roy-Chowdhury, R. Chellappa, Matching shape sequencesin video with an application to human movement analysis, IEEE Trans. PatternAnal. Mach. Intell. 27 (12) (Dec 2005) 1896–1909.

[135] A. Veeraraghavan, A. Srivastava, A.K. Roy Chowdhury, R. Chellappa, Rate-invariantrecognition of humans and their activities, IEEE Trans. Image Process. 18 (6) (June2009) 1326–1339.

[136] T. Vercautere, X. Pennec, A. Perchant, N. Ayache, Symmetric log-domain diffeo-morphic registration: a demons-based approach, MICCAI 2008, Lecture Notesin Computer Science, 5241, 2008, pp. 754–761.

[137] T. Vercauteren, X. Pennec, A. Perchant, N. Ayache, Non-parametric diffeomorphicimage registration with the demons algorithm, Medical Image Computing andComputer Assisted Intervention (MICCAI'07), 4792/2007, 2007, pp. 319–326.

[138] T. Vercauteren, X. Pennec, A. Perchant, N. Ayache, Diffeomorphic demons: efficientnon-parametric image registration, Neuroimage 45 (Supplement 1) (2009)S61–S72.

[139] G. Walsh, A. Sarti, S. Sastry, Algorithms for steering on the group of rotations,Proceedings of ACC, 1993.

[140] L.Wang, D. Suter, Recognizing human activities from silhouettes: motion subspaceand factorial discriminative graphical model, IEEE International Conference onComputer Vision and Pattern Recognition, 2007.

[141] T. Windheuser, U. Schlickewei, F.R. Schmidt, D. Cremers, Geometrically consistentelastic matching of 3D shapes: A linear programming solution, InternationalConference on Computer Vision (ICCV), 2011.

[142] T. Windheuser, U. Schlickewei, F.R. Schmidt, D. Cremers, Large-scale integer lin-ear programming for orientation-preserving 3D shape matching, ComputerGraphics Forum (Proceedings Symposium Geometry Processing), 2011.

[143] Q. Xie, S. Kurtek, G. Christensen, Z. Ding, E. Klassen, A. Srivastava, A novel frameworkfor metric-based image registration, Workshop on Biomedical Image Registration,2012.

[144] S. Yi, H. Krim, L.K. Norris, Human activity modeling on shape manifold, Euro-graphics 2011 Workshop on 3D Object Retrieval, 2011.

[145] S. Yoshizawa, A.G. Belyaev, H.-P. Seidel, Fast and robust detection of crest lineson meshes, Proc. of ACM Symposium on Solid and Physical Modeling, 2005,pp. 227–232.

[146] L. Younes, Computable elastic distance between shapes, SIAM J. Appl. Math. 58(2) (1998) 565–586.

[147] L. Younes, Optimal matching between shapes via elastic deformations, J. ImageVision Comput. 17 (5/6) (1999) 381–389.

[148] L. Younes, Shapes and Diffeomorphisms, Springer, 2010.[149] L. Younes, P.W. Michor, J. Shah, D. Mumford, R. Lincei, A metric on shape space

with explicit geodesics, Mat. E Applicazioni 19 (1) (2008) 25–57.[150] A.L. Yuille, P.W. Hallinan, D.S. Cohne, Feature extraction from faces using de-

formable templates, Int. J. Comput. Vision 8 (2) (1992) 99–112.[151] P.A. Yushkevich, P.T. Fletcher, S.C. Joshi, A. Thall, S.M. Pizer, Continuous medial

representations for geometric objectmodeling in 2D and 3D, Image Vision Comput.21 (1) (2003) 17–27.

[152] C.T. Zahn, R.Z. Roskies, Fourier descriptors for plane closed curves, IEEE Trans.Comput. 21 (3) (1972).

[153] Y. Zeng, C. Wang, Y. Wang, X. Gu, D. Samaras, N.N. Paragios, Dense non-rigid sur-face registration using high-order graph matching, IEEE Conf. on Computer Visionand Pattern Recognition (CVPR), 2010, pp. 382–389.

[154] D. Zhang, G. Lu, Review of shape representation and description techniques, PatternRecognit. 37 (1) (2004) 1–19.

[155] H. Zhang, A. Sheffer, D. Cohen-Or, Q. Zhou, O. van Kaick, A. Tagliasacchi, Deformation-driven shape correspondence, Proceedings of the Symposium on Geometry Process-ing, SGP'08, 2008, pp. 1431–1439.

[156] H. Zhao, S. Osher, R. Fedkiw, H. Zhao, S. Osher, R. Fedkiw, Fast surface recon-struction using the level set method, IEEE Workshop on Variational and LevelSet Methods, 2001.

416 A. Srivastava et al. / Image and Vision Computing 30 (2012) 398–416