2007 - expression invariant representations of faces


  • 8/13/2019 2007 - Expression Invariant Representations of Faces

Expression-invariant representations of faces

Alexander M. Bronstein, Student Member, IEEE, Michael M. Bronstein, Student Member, IEEE, and Ron Kimmel, Senior Member, IEEE

Abstract—Addressed here is the problem of constructing and analyzing expression-invariant representations of human faces. We demonstrate and justify experimentally a simple geometric model that allows describing facial expressions as isometric deformations of the facial surface. The main step in the construction of an expression-invariant representation of a face involves embedding the facial intrinsic geometric structure into some convenient low-dimensional space. We study the influence of the embedding space geometry and dimensionality on the representation accuracy, and argue that compared to its Euclidean counterpart, spherical embedding leads to notably smaller metric distortions. We support our claim experimentally, showing that a smaller embedding error leads to better recognition.

Keywords—Isometry-invariant representation, multidimensional scaling, isometric embedding, face recognition, spherical harmonics.

    I. INTRODUCTION

Expression-invariant features of the human face play an important role in various applications, such as face recognition in computer vision [1], [2], texture mapping for facial animation in computer graphics [3], [4], [5], [6], [7], emotion interpretation in psychology [8], and measurement of geometric parameters of the face in cosmetic surgery. The variability of the face appearance due to its non-rigid structure makes these tasks non-trivial and calls for a convenient model with which to analyze the nature of facial expressions.

In face recognition, the problem of expression-invariant representation is of particular importance, as most of today's face recognition systems are sensitive to facial expression variations. The development of a robust face recognition algorithm insensitive to facial expressions is one of the greatest challenges of current research in this field [9], [10], [11], [12]. Several approaches have been proposed for this purpose. One possibility is to use only the regions of the face least susceptible to variation due to facial expression [13]. For example, one can remove the mouth region, as it varies greatly with expression. Yet, practice shows that there is no large subset of the face that is perfectly rigid across a broad range

of expressions [14]. Another possibility is to add different expressions of each subject to the gallery. The main limitation is the large number of instances required per subject, owing to the richness of all possible expressions. Even so, the probe expression may still be substantially different from those sampled.

A third approach is to use a model of facial expressions to generate synthetic expressions for any 3D surface. Such models are usually very complicated and are therefore not suitable for the face recognition application, where real- or near-real-time performance is required. Moreover, it is practically impossible to capture the subtle individual differences of expressions in different subjects.

Manuscript revised March 7, 2006. A. M. Bronstein, M. M. Bronstein and R. Kimmel are with the Department of Computer Science, Technion - Israel Institute of Technology.

The present paper focuses on a simple geometric model of facial expressions, referred to as the isometric model, and on an expression-invariant representation of the face based on this model. Our thesis is that facial expressions can be modelled as isometries of the facial surface. A simplified setting of this model, assuming all expressions to be with closed mouth, was used in our three-dimensional expression-invariant face recognition system [2], [15], [16], [17]. Here, we extend this model to handle both open and closed mouth.

Using the isometric model, expression-invariant representation of the face is essentially equivalent to isometry-invariant representation of a surface. We apply the isometric embedding approach introduced by Elad and Kimmel [18]. The key idea is to represent the intrinsic metric structure of the facial surface by embedding the surface into a low-dimensional Euclidean space and replacing the geodesic distances by Euclidean ones. The resulting representation, known as the canonical form, can then be treated by conventional methods for rigid surface matching.

The advantage of this approach over previously proposed methods is that we do not attempt to generate facial expressions, but rather create a representation which is (approximately) invariant to facial expressions. Secondly, we use the whole facial surface, without removing parts sensitive to expressions. Thirdly, the gallery contains only a single instance of each subject, e.g. the neutral expression. Finally, our algorithm is computationally efficient and can be performed in near real-time.

We start with a description of the isometric model and its experimental validation in Section II. In Section III, we present the Elad-Kimmel isometry-invariant surface matching approach in a broader perspective, for both Euclidean and non-Euclidean embedding spaces. We address some criteria for the choice of the embedding space, and present the computational core of the approach, based on multidimensional scaling. Section IV deals with facial geometry representation in three-dimensional spaces with Euclidean and spherical geometry. Embedding into a three-dimensional sphere (S^3) introduces smaller distance distortion and, consequently, yields better recognition; we provide experimental evidence of this fact. Section V deals with incorporating the texture information using embedding into a two-dimensional space, specifically discussing the two-dimensional sphere S^2. Finally, Section VI concludes the paper.


II. ISOMETRIC MODEL OF FACIAL EXPRESSIONS

We describe a facial surface as a smooth compact connected two-dimensional Riemannian manifold (surface), denoted by S. The geodesic distances (shortest path lengths) on S are induced by the Riemannian metric and are denoted by d_S(\xi_1, \xi_2) for \xi_1, \xi_2 \in S. A transformation \varphi : S \to S is called an isometry if

    d_S(\xi_1, \xi_2) = d_S(\varphi(\xi_1), \varphi(\xi_2)),    (1)

for all \xi_1, \xi_2 \in S. In other words, an isometry preserves the intrinsic metric structure of the surface.

The isometric model, assuming facial expressions to be isometries of some neutral facial expression, is based on the intuitive observation that the facial skin stretches only slightly. All expressions of a face are assumed to be (approximately) intrinsically equivalent (i.e. to have the same metric structure). Broadly speaking, the intrinsic geometry of the facial surface can be attributed to the subject's identity, while the extrinsic geometry is attributed to the facial expression.

However, the isometric model also requires the topology of the surface to be preserved. This assumption is valid for most regions of the face except the mouth: opening the mouth changes the topology of the surface by virtually creating a hole. Fig. 1 demonstrates this by showing a minimal geodesic between two points on the upper and the lower lips. In [2], [15], we ignored this possibility, assuming all expressions to be with closed mouth. Even in such a limited setting, the isometric model appeared able to gracefully represent strong facial expressions. In order to handle expressions with both open and closed mouth, we propose fixing the topology of the surface. We could enforce the mouth to always be closed by gluing the lips when the mouth is open; alternatively, we can constrain the mouth to always be open by disconnecting the lips with a cut in the surface when the mouth is closed. Here, the latter option is adopted.

    A. Validation of the isometric model

In order to validate the isometric model, we tracked a set of feature points on the facial surface and measured how the distances between them change due to expressions, while preserving the topology of the surface. In this experiment, 133 white round markers (approximately 2 mm in diameter) were placed on the face of a subject as invariant fiducial points (Fig. 2, left). The subject was asked to demonstrate different facial expressions of varying strength with open and closed mouth (a total of 16 expressions; see some examples in Fig. 2, right). The faces in this and all the following experiments were acquired using a coded-light scanner with an acquisition time of about 150 msec and a depth resolution of 0.5 mm [19]. As the reference surface, we used a neutral expression. The lips were manually cropped in all the surfaces. The Fast Marching Method (FMM) [20] was used to measure the geodesic distances.

The changes of the distances due to facial expressions were quantified using two measures: the absolute and the relative error with respect to the reference distances. Relative and absolute error distributions are plotted in Fig. 3. The standard deviations of the absolute and the relative errors were 5.89 mm and 15.85%, respectively, for the geodesic distances, and 12.03 mm and 39.62%, respectively, for the Euclidean ones. We conclude that the changes of the geodesic distances due to facial expressions are insignificant even for extreme expressions, which justifies our model. Moreover, the Euclidean distances were demonstrated to be much more sensitive to changes due to facial expressions than the geodesic ones. This observation will be reinforced in Section IV-A, where we compare our approach based on geodesic distances with straightforward rigid matching of facial surfaces.

Fig. 1. Illustration of the open mouth problem. First row (without topological constraint): geodesics on the facial surface with three facial expressions; an open mouth changes the topology of the surface, which leads to losing the consistency of the geodesics. Second row (with topological constraint): the same face, in which the topology is constrained by cutting out the lips; consistency of the geodesics is better preserved.

Fig. 2. Isometric model validation experiment. Left: facial image with the markers. Right: example of one moderate and two strong facial expressions with marked reference points.
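As a concrete illustration of these two error measures, the following numpy sketch computes the absolute and relative distance errors from a reference and a deformed distance matrix (the matrices here are synthetic stand-ins, not the experimental data):

```python
import numpy as np

def distance_errors(d_ref, d_def):
    """Absolute and relative error between a reference distance matrix
    and the distances measured under a facial expression; only the
    upper triangle (i > j) is used, so each pair is counted once."""
    iu = np.triu_indices_from(d_ref, k=1)
    abs_err = d_def[iu] - d_ref[iu]
    rel_err = abs_err / d_ref[iu]      # assumes nonzero reference distances
    return abs_err, rel_err

# Illustrative data: distances between random points and a perturbed copy.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(10, 3))
d_ref = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
d_def = d_ref * (1.0 + 0.05 * rng.standard_normal(d_ref.shape))
d_def = (d_def + d_def.T) / 2.0        # keep the perturbed matrix symmetric

abs_err, rel_err = distance_errors(d_ref, d_def)
print(np.std(abs_err), 100.0 * np.std(rel_err))  # std in mm and in percent
```

Standard deviations like those reported above can be obtained in the same way from measured marker-to-marker distance matrices.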

III. ISOMETRY-INVARIANT REPRESENTATION

Under the assumption of the isometric model, in order to obtain an expression-invariant representation of the face S, we would like to keep only the intrinsic metric structure of S. Equivalently, we would like to somehow ignore the extrinsic geometry, that is, the way the surface S is immersed into the ambient space. An obvious invariant to isometries is the set of all geodesic distances on the surface S. In practice, S is given at a finite set of points {\xi_1, ..., \xi_N}, and the geodesic distances



Fig. 3. Normalized histograms of the absolute and the relative error of the geodesic (solid) and the Euclidean (dotted) distances.

d_S(\xi_i, \xi_j) are computed approximately from the samples of S using FMM and represented as an N × N matrix \Delta.

Yet, comparing the geodesic distances explicitly is difficult, since the correspondence between the samples of two surfaces is unknown. As a solution, Elad and Kimmel [18] proposed to represent S as a subset of some convenient m-dimensional space Q^m, such that the original intrinsic geometry is approximately preserved. Such a procedure is called an (almost) isometric embedding, and allows us to get rid of the extrinsic geometry. As a result of isometric embedding, the representations of all the isometries of S are identical, up to the isometry group in Q^m, which is usually easy to deal with (for example, all possible isometries in R^3 are rotations, translations and reflections).

In the discrete setting, isometric embedding is a mapping between two finite metric spaces,

    \varphi : (\{\xi_1, ..., \xi_N\} \subset S, \Delta) \to (\{x_1, ..., x_N\} \subset Q^m, D),

such that d_S(\xi_i, \xi_j) \approx d_{Q^m}(x_i, x_j) for all i, j = 1, ..., N. The matrices \Delta = (\delta_{ij}) = (d_S(\xi_i, \xi_j)) and D = (d_{ij}) = (d_{Q^m}(x_i, x_j)) denote the pair-wise geodesic distances between the points in the original and the embedding space, respectively. Elad and Kimmel [18] called the image \{x_1, ..., x_N\} = \varphi(\{\xi_1, ..., \xi_N\}) the canonical form of S. We emphasize that in general, an exactly isometric embedding does not exist, since embedding inevitably introduces metric distortion. As a result, the canonical form is just an approximate representation of the discrete surface. Yet, it is possible to find optimal canonical forms by minimizing some embedding error

criterion (see Section III-B).

Originally, Elad and Kimmel transformed the isometry-invariant deformable surface matching problem into a problem of rigid surface matching in R^3. However, for non-trivial surfaces there is usually an unavoidable representation error, resulting from the theoretical impossibility of embedding a curved surface into a Euclidean space [21]. The canonical representation of a face is therefore degraded by two types of error: one stemming from the deviation from the isometric model of facial expressions, and the other caused by embedding distortion. While the first error is a limitation of our model, the second one can be reduced by choosing a better embedding space. An important question is whether such a choice will also lead to better recognition.

There are two important aspects to be addressed. First, the geometry of the embedding space is important. Certain geometries appear to be more suitable for certain types of surfaces, and thus allow smaller embedding errors [22], [6], [23], [24]. For practical reasons, the geodesic distances in the embedding space should be efficiently computable.^1 Secondly, the codimension of the embedding is important. In the case of non-zero codimension (m >= 3), embedding produces a canonical form that can be plotted as a surface in the corresponding embedding space. Yet, in this way we take into consideration only the geometry of the face and make no explicit use of the photometric information (texture), which may contain important information that can help discriminate between faces.

A way to incorporate the texture, described as a scalar field \alpha : S \to R, is to perform a zero-codimension embedding of S into some two-dimensional space (m = 2). Such an embedding can be thought of as a canonical parameterization of the facial surface. Alternatively, one can think of the embedding as a warping of the texture image, producing a canonical image [2]. Since the embedding approximately preserves the intrinsic geometry of the facial surface, the canonical image is expression-invariant. The difference between these two cases is exemplified in Section IV, where we show embedding into R^3 and S^3, and in Section V, where embedding into S^2 is shown.

    A. Preprocessing

The face canonization procedure consists of three stages: preprocessing, embedding and matching. The aim of the first stage is to crop a region of interest out of the facial surface, which will later be used for the canonical form computation. First, the surface undergoes standard processing for scanning artifact removal (spike removal, hole filling, smoothing, etc.). Then, the facial contour is cropped using the geodesic mask [15]. The main idea is to compute a geodesically-equidistant contour of a fixed radius from some set of anchor points (e.g. the nose tip and the sellion point) and cut off all the points outside this contour. If a topological constraint is employed to allow for

^1 A simple analytic expression for geodesic distances is available for spaces with Euclidean, spherical and hyperbolic geometry. In [25], [26] we show an efficient interpolation procedure for geodesic distances; thereby, the geodesic distances need not be expressed analytically.



Fig. 4. First row, left to right: geodesic mask computation stages in the case of the topologically-constrained isometric model assumption (open and closed mouth): geodesic circles around the nose and the sellion; geodesic circles around the lips; the merged geodesic mask. Second row: examples of the geodesic mask's insensitivity to facial expressions.

expressions with both open and closed mouth, two geodesic masks are computed: one around the nose tip and the sellion, and another around the lips (Fig. 4, first row). The lips region can be found using standard methods, e.g. according to the texture information. A radius of 30 mm is typically used for the lip mask. The two masks are then merged. This way, we crop the facial surface in a geometrically consistent manner which is insensitive to facial expressions (Fig. 4, second row).
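The geodesic mask computation can be sketched as thresholding a geodesic distance map from the anchor points. The snippet below uses Dijkstra shortest paths on the mesh edge graph as a simple stand-in for the Fast Marching Method used in the paper (the mesh, anchors and radius are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_mask(vertices, edges, anchors, radius):
    """Keep the vertices whose graph-geodesic distance from the nearest
    anchor vertex is at most `radius`. Dijkstra on the edge graph is a
    stand-in for FMM, which measures true surface geodesics."""
    n = len(vertices)
    i, j = edges[:, 0], edges[:, 1]
    w = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    graph = csr_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])),
                       shape=(n, n))
    dist = dijkstra(graph, indices=anchors)   # shape (len(anchors), n)
    return dist.min(axis=0) <= radius

# Toy example: a chain of vertices 1 mm apart, anchored at vertex 0.
verts = np.c_[np.arange(10.0), np.zeros(10), np.zeros(10)]
edges = np.c_[np.arange(9), np.arange(1, 10)].astype(int)
mask = geodesic_mask(verts, edges, anchors=[0], radius=4.0)
print(mask)   # vertices 0..4 kept
```

For a real facial surface, FMM on the triangulated mesh gives a more accurate geodesic approximation; the thresholding and mask-merging steps are identical.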

    B. Multidimensional scaling

The second stage is the embedding. We assume the smoothed and cropped surface to be sampled at N points \{\xi_1, ..., \xi_N\} (typically, N ranges between 1000 and 3000), and assume the matrix \Delta of geodesic distances between the surface samples to be given as input. Every point x in the embedding space Q^m is assumed to correspond to some m-dimensional parametric coordinates \theta = (\theta^1, ..., \theta^m). The geodesic distances are given by some explicit or efficiently approximated function d_{Q^m}(x_1, x_2) = d_{Q^m}(\theta_1, \theta_2). The canonical form is computed by solving the optimization problem

    \Theta^* = \arg\min_\Theta \sigma(\Theta; \Delta),    (2)

known as multidimensional scaling (MDS) [27]. Here \Theta = (\theta_1, ..., \theta_N) is an N × m matrix of parametric coordinates in Q^m corresponding to the points \{x_1, ..., x_N\}. In the case of Euclidean embedding, the parametric coordinates in R^m coincide with the Cartesian coordinates in R^m. Therefore, we can refer to \{x_1, ..., x_N\} directly by their Cartesian coordinates, which we denote by X. In the non-Euclidean case, if the embedding space Q^m can be realized as a geometric locus in some higher-dimensional Euclidean space R^{m'} (with m' > m; for example, S^3 is represented as a unit sphere in R^4), we will use the N × m' matrix X to denote the coordinates in R^{m'} corresponding to \Theta.

The function \sigma is an embedding error criterion, measuring the discrepancies between the original (geodesic) distances and the embedding space distances. The most common choices of \sigma include the raw stress,

    \sigma_{raw}(\Theta; \Delta) = \sum_{i>j} (d_{ij}(\Theta) - \delta_{ij})^2,    (3)

and the normalized stress,

    \sigma_{norm}(\Theta; \Delta) = \frac{\sum_{i>j} (d_{ij}(\Theta) - \delta_{ij})^2}{\sum_{i>j} d_{ij}^2(\Theta)}.    (4)

Here d_{ij}(\Theta) and \delta_{ij} are short notations for d_{Q^m}(\theta_i, \theta_j) and d_S(\xi_i, \xi_j), respectively. For additional embedding error criteria, see [27].

Note that in fact we compute the image \varphi(\{\xi_1, ..., \xi_N\}) directly, rather than \varphi itself. Also note that the optimization problem is non-convex, and using convex optimization algorithms is liable to converge to a local rather than a global minimum. Nevertheless, convex optimization algorithms are commonly used in the MDS literature [27]. Having a good initialization, or using multiscale or multigrid optimization [28], reduces the risk of local convergence. Usually, first-order (gradient-descent) optimization algorithms are employed for large-scale MDS problems. The computational complexity of such methods grows as O(N^2).
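The gradient-descent MDS described above can be sketched for the Euclidean case as follows; this is a minimal implementation of eqs. (2)-(3), not the authors' multiresolution solver, and all names are illustrative:

```python
import numpy as np

def raw_stress_grad(X, delta):
    """Raw stress (3) for Euclidean embedding and its gradient:
    sigma = sum_{i>j} (||x_i - x_j|| - delta_ij)^2."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, 1.0)                 # avoid division by zero
    err = d - delta
    np.fill_diagonal(err, 0.0)
    stress = 0.5 * np.sum(err ** 2)          # each pair counted once
    grad = 2.0 * np.sum((err / d)[..., None] * diff, axis=1)
    return stress, grad

def mds(delta, X0, steps=500, lr=0.05):
    """Plain gradient descent on the raw stress; a good initialization
    reduces the risk of a local minimum, as noted above."""
    X = X0.copy()
    for _ in range(steps):
        _, grad = raw_stress_grad(X, delta)
        X -= lr * grad
    return X, raw_stress_grad(X, delta)[0]

# Sanity check: the pairwise distances of a unit square are reproduced.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
delta = np.linalg.norm(square[:, None] - square[None, :], axis=-1)
rng = np.random.default_rng(0)
X, stress = mds(delta, square + 0.1 * rng.standard_normal(square.shape))
print(stress)   # residual stress close to zero
```

Each iteration costs O(N^2), matching the complexity estimate in the text.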

    C. Canonical form matching

The final stage is the matching of canonical forms. In the case of geometry-only representation (non-zero codimension), we assume to be given two canonical forms \{x_1, ..., x_N\} = \varphi(\{\xi_1, ..., \xi_N\}) and \{x'_1, ..., x'_{N'}\} = \varphi'(\{\xi'_1, ..., \xi'_{N'}\}), corresponding to two surfaces S and S'. The canonical forms are represented in the parametric coordinates of Q^m by the matrices \Theta and \Theta'. The goal of matching is to compute some distance d(\Theta, \Theta') between the canonical forms.

First of all, the matching should account for the fact that the canonical forms have some unresolved degrees of freedom and are defined up to an isometry in Q^m. The most straightforward approach is to use an iterative closest point (ICP) algorithm [29], [30]. These algorithms minimize a distance between two point clouds over all isometries in Q^m, i.e. find an optimal rigid alignment between \Theta and \Theta'. The ICP problem is well-studied in R^3, and practically untouched for non-Euclidean spaces. Moreover, ICP algorithms are usually computationally intensive, and thus comparison between a large number of canonical forms, which is typical, for example, in one-to-many face recognition applications with large databases, becomes prohibitive.

As an alternative, the method of moments can be used [31]. In the case of Euclidean embedding, the distance between two canonical forms represented by the Cartesian coordinates X and X' is computed as the difference of their P-th order


moments,

    d_M(X, X') = \sum_{p_1 + ... + p_m \le P} ( \mu^X_{p_1,...,p_m} - \mu^{X'}_{p_1,...,p_m} )^2,    (5)

where

    \mu^X_{p_1,...,p_m} = \sum_{i=1}^N \prod_{k=1}^m x_{ik}^{p_k}    (6)

is the m-dimensional moment of X, for any p_1, ..., p_m \ge 0. A somewhat better approach is to consider the canonical form as a two-dimensional triangulated mesh rather than just a cloud of points, and compute the moments according to

    \mu^X_{p_1,...,p_m} = \sum_{i=1}^T s_i \prod_{k=1}^m \bar{x}_{ik}^{p_k},    (7)

where T denotes the number of triangular faces, and \bar{x}_i and s_i stand for the centroid and the area of the i-th triangle, respectively. It is assumed that the canonical forms are first aligned [18]. In the non-Euclidean case, if Q^m is a subset of some R^{m'}, we can apply the method of moments to the Cartesian m'-dimensional coordinates X and X'. Alternatively, one can define the moments directly in Q^m and compute them in the parameterization domain. The exact way of defining such moments depends on the specific choice of the embedding space and is usually non-trivial.
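A direct implementation of the moment signature (5)-(6) for point clouds might look like the sketch below (the order P and the data are illustrative; alignment of the canonical forms is assumed to have been done beforehand):

```python
import numpy as np
from itertools import product

def moments(X, P):
    """All m-dimensional moments mu_{p1..pm} of a point cloud X (eq. 6)
    with total order p1 + ... + pm <= P, as a flat vector."""
    n, m = X.shape
    mus = []
    for p in product(range(P + 1), repeat=m):
        if sum(p) <= P:
            mus.append(np.sum(np.prod(X ** np.array(p), axis=1)))
    return np.array(mus)

def moment_distance(X, Y, P=3):
    """Moment-based dissimilarity d_M of eq. (5); assumes the canonical
    forms were already rigidly aligned."""
    return np.sum((moments(X, P) - moments(Y, P)) ** 2)

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
print(moment_distance(A, A))            # identical clouds: 0
print(moment_distance(A, A + 0.5) > 0)  # a shifted copy differs
```

The triangle-based variant (7) replaces the per-point products with centroid products weighted by the triangle areas s_i.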

In the case of geometry and texture representation (zero codimension), we assume in addition to be given two texture images \alpha, \alpha' : S \to R. The embeddings \varphi and \varphi' map the sampled points of S and S' to Q^m; the other points are interpolated. As a result, we have two canonical images a, a' : Q^2 \to R. The matching in this case is performed between the canonical images rather than the canonical forms.

One can compare the images a and a' directly, e.g. using correlation, or, as an alternative, apply some transform T to both images and compare T(a) and T(a'). For example, in [15], we proposed to project the Euclidean canonical images onto an eigenspace created in a manner similar to eigenfaces [32], [33]. In Section V we show the use of the spherical harmonic transform for the comparison of 2D spherical canonical images. An estimate of the albedo (reflection coefficient) can be used to make the canonical image insensitive to illumination [16].

IV. EUCLIDEAN AND SPHERICAL CANONICAL FORMS

The face recognition method introduced in [15] was based on embedding facial surfaces into R^3, under a tacit assumption of a closed mouth. In this section, we show Euclidean canonical forms in R^3 incorporating a topological constraint, which allows handling both open and closed mouth. Then, we show that embedding into a three-dimensional sphere (S^3) can be considered a generalization of the Euclidean embedding and allows a more accurate representation of faces.

    A. Topologically-constrained Euclidean canonical forms

The main difference between the Euclidean canonical forms used in [15] and the topologically-constrained ones proposed here is the preprocessing stage, in which a more complicated geodesic mask is used (see Section III-A). In order to test the topologically-constrained canonical forms, we used a data set containing 102 instances of 7 subjects with different facial expressions, including both open (70 instances) and closed (32 instances) mouth. This is an extended version of the experiment presented in [15], which contained only expressions with closed mouth. In all our experiments, the lip contour was segmented manually. The matching of canonical forms was based on 3D moments. For reference, we also show rigid matching of the original surfaces.

Fig. 6 depicts a low-dimensional representation of the dissimilarities between faces in the sense of d_M. Each symbol on the plot represents a face; colors denote different subjects; the symbol's shape represents the facial expression, and its size represents the expression strength. Ideally, clusters corresponding to a specific subject should be as tight as possible (meaning that the representation is insensitive to facial expressions) and as distant as possible from other subjects, which means that the representation allows us to discriminate between different subjects. It can be seen that for rigid surface matching (Fig. 6, left) the clusters overlap, implying that the variability due to facial expressions is larger than that stemming from the subject's identity. On the other hand, using canonical forms, we obtain tight and distinguishable clusters (Fig. 6, right). This is a practical confirmation of our isometric model, implying that the intrinsic geometry (captured by the geodesic distances) is much better for discriminating the subject's identity than the extrinsic geometry (Euclidean distances).

B. Spherical canonical forms in S^3

In [34], we argued that R^3 is not necessarily the best candidate embedding space for face representation. Despite being the most convenient to work with, the Euclidean space intuitively seems to introduce large distortions, as facial surfaces are curved while the Euclidean space is flat. As an alternative, a three-dimensional sphere S^3 with variable radius R is suggested. The choice of R changes the curvature of the embedding space, thus influencing the embedding error. Embedding into R^3 can be considered the asymptotic case with R \to \infty. Therefore, embedding into S^3 can potentially lead to a smaller embedding error.

A three-dimensional sphere of radius R can be represented as the geometric locus of all vectors of length R in R^4,

    S^3 = \{ x \in R^4 : \|x\|_2 = R \}.    (8)

For convenience, we discuss a unit sphere; the radius R can be taken into account by scaling the geodesic distances. For every point on S^3, there exists a correspondence between a vector of parametric coordinates \theta = (\theta^1, \theta^2, \theta^3) and a unit vector in R^4. Given two points \theta_1, \theta_2 on the sphere (corresponding to unit vectors x_1, x_2 \in R^4), the geodesic distance between them is the great-circle arc length, given by

    d_{S^3}(\theta_1, \theta_2) = \cos^{-1} \langle x_1, x_2 \rangle.    (9)


Fig. 5. Examples of canonical forms obtained by embedding into R^3 with the topology constraint. The bold line denotes the lip contour.

Fig. 6. Low-dimensional visualization of dissimilarities between faces obtained using rigid surface matching (left) and embedding into R^3 with the topology constraint (right). Colors represent different subjects. Symbols represent different facial expressions with open (empty) and closed (filled) mouth.

S^3 can be parameterized as

    x_1(\theta) = \cos\theta^1 \cos\theta^2 \cos\theta^3,    (10)
    x_2(\theta) = \cos\theta^1 \sin\theta^2 \cos\theta^3,
    x_3(\theta) = \sin\theta^1 \cos\theta^3,
    x_4(\theta) = \sin\theta^3,

where \theta \in [0, \pi] \times [0, 2\pi] \times [0, \pi]. The geodesic distance is explicitly expressed as

    d_{S^3}(\theta_1, \theta_2) = \cos^{-1} ( \cos\theta^1_1 \cos\theta^3_1 \cos\theta^1_2 \cos\theta^3_2 \cos(\theta^2_1 - \theta^2_2) + \cos\theta^3_1 \cos\theta^3_2 \sin\theta^1_1 \sin\theta^1_2 + \sin\theta^3_1 \sin\theta^3_2 ).    (11)

The normalized stress (4), with d_{ij}(\Theta) = d_{S^3}(\theta_i, \theta_j), is used as the embedding error criterion.

Fig. 7. First row: embedding error versus the embedding sphere radius for four different subjects (colors denote different subjects, dashed lines indicate 95% confidence intervals). Second row: EER and rank-1 error rate versus the embedding sphere radius. The asymptote R \to \infty corresponds to embedding into R^3.

We study the relation between the metric distortion introduced by the embedding procedure and the recognition accuracy. Our toy set of 104 faces with controlled, rich facial expressions seems to be sufficient for this purpose.

Fig. 7 (first row) shows the average embedding error as a function of the sphere radius R. The minimum error is obtained around R = 75-90 mm, depending slightly on the subject, and the error then increases asymptotically as R grows to infinity. The asymptote R \to \infty corresponds to embedding into R^3. We conclude that 3D spherical embedding yields an embedding error less than half of that of 3D Euclidean embedding.

Fig. 7 (second row) presents the equal error rate (EER) and the rank-1 recognition error as functions of the embedding sphere radius. The minimum EER of 7.09% is achieved for R = 70-80 mm; the lowest rank-1 error of about 5.48% is obtained for R = 75 mm. Both measures more than double as the embedding sphere radius is decreased (R = 50 mm) or increased (R \to \infty). We conclude that both the EER and the rank-1 error achieve their minimum at embedding sphere radii that yield the minimum embedding error. This experimental evidence supports our conjecture that a smaller embedding error results in better recognition, and motivates the search for low-distortion representations of surfaces.

V. SPHERICAL CANONICAL IMAGES IN S^2

In the case of zero-codimension embedding (i.e. embedding the facial surface into a two-dimensional space), the embedding produces a canonical parameterization of the surface, which allows comparing the texture in an expression-invariant way. The

Fig. 8. Face embedded into S^2 with different radii; from left to right: R = 80 mm, 100 mm and 150 mm.

simplest case is embedding into R^2, producing a warped texture, which we called the canonical image [2].

In this section, we propose to embed the facial image into a two-dimensional sphere (S^2) rather than a plane, which will be shown to produce lower embedding distortions. Secondly, we present further experimental evidence supporting our conjecture that a lower embedding error directly implies higher recognition accuracy. Finally, we take advantage of the fact that the new canonical image is defined on S^2 and use spherical harmonics to measure the dissimilarity between two images. The rotation and reflection invariance of the spherical harmonic signature removes the embedding ambiguity and does not require alignment of the canonical images.

We parameterize S^2 using two angles: the elevation \theta^1 \in [-\pi/2, +\pi/2], measured from the azimuthal plane in the northern direction, and the azimuthal angle \theta^2 \in [0, 2\pi). The geodesic distance between two points \theta_1 = (\theta^1_1, \theta^2_1) and \theta_2 = (\theta^1_2, \theta^2_2) is given by

    d_{S^2}(\theta_1, \theta_2) = R \cos^{-1} ( \cos\theta^1_1 \cos\theta^1_2 \cos(\theta^2_1 - \theta^2_2) + \sin\theta^1_1 \sin\theta^1_2 ),    (16)

where R is the sphere radius. As before, we consider a unit sphere; different values of R are achieved by scaling the input distance matrix \Delta. As the embedding error criterion, we use the raw stress (3), where d_{ij}(\Theta) = d_{S^2}(\theta_i, \theta_j). One of the points, say \theta_1, is anchored to the north pole of the sphere (\theta^1_1 = \pi/2). This point is chosen to be the nose tip and is determined as a local maximum of the Gaussian curvature on the facial surface.
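A minimal sketch of this spherical embedding step, with the first point pinned to the north pole as described, follows. For brevity it uses a generic numerical optimizer (scipy's minimize with a numerically estimated gradient) instead of the first-order MDS iterations of Section III-B, and a four-point toy configuration:

```python
import numpy as np
from scipy.optimize import minimize

def s2_distance(e1, a1, e2, a2):
    """Geodesic distance on the unit S^2 in elevation/azimuth
    coordinates, eq. (16) with R = 1."""
    c = np.cos(e1) * np.cos(e2) * np.cos(a1 - a2) + np.sin(e1) * np.sin(e2)
    return np.arccos(np.clip(c, -1.0, 1.0))

def raw_stress(angles, delta):
    """Raw stress (3) on S^2; the first point is anchored to the north
    pole (elevation pi/2), as done above for the nose tip."""
    th = np.vstack([[np.pi / 2, 0.0], angles.reshape(-1, 2)])
    s = 0.0
    for i in range(len(th)):
        for j in range(i):
            d = s2_distance(th[i, 0], th[i, 1], th[j, 0], th[j, 1])
            s += (d - delta[i, j]) ** 2
    return s

# Toy target: four points on the sphere, the first one at the pole.
true = np.array([[np.pi / 2, 0.0], [0.2, 0.1], [0.3, 2.0], [-0.1, 4.0]])
delta = np.array([[s2_distance(p[0], p[1], q[0], q[1]) for q in true]
                  for p in true])

x0 = (true[1:] + 0.05).ravel()        # start near the known answer
res = minimize(raw_stress, x0, args=(delta,))
print(res.fun)                         # residual stress close to zero
```

Because the toy distances are exactly realizable on the sphere, the residual stress vanishes up to numerical precision; for real facial distance matrices a positive residual (the embedding error) remains.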

The mapping $\varphi : S \to S^2$ from the original surface to the sphere can be regarded as a warping transformation, which maps the original facial image $\alpha : S \to \mathbb{R}$ onto a section of the sphere. The resulting image, $a = \alpha \circ \varphi^{-1} : \varphi(S) \subset S^2 \to \mathbb{R}$, can be computed for any $\theta$ by means of linear interpolation (see Figures 8, 9). The image $a$, referred to as the spherical canonical image [2], is invariant to isometric deformations of the face and, hence, insensitive to facial expressions. Yet, the spherical canonical image is not a fully invariant signature of the face, since fixing a single fiducial point at the pole still leaves one degree of freedom of rotation and reflection about that point.
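The linear interpolation step can be sketched with SciPy's scattered-data interpolator; the embedded angular coordinates and the texture samples are assumed given, and all names are ours:

```python
import numpy as np
from scipy.interpolate import griddata

def spherical_canonical_image(thetas, texture, n_elev=128, n_azim=256):
    """Resample facial texture on a regular (elevation, azimuth) grid.

    thetas:  (n, 2) embedded coordinates of the surface samples on S^2;
    texture: (n,) texture values at those samples.
    Grid points outside the embedded patch phi(S) are returned as NaN.
    """
    elev = np.linspace(-np.pi / 2, np.pi / 2, n_elev)
    azim = np.linspace(0.0, 2 * np.pi, n_azim, endpoint=False)
    E, A = np.meshgrid(elev, azim, indexing="ij")
    # Piecewise-linear interpolation over a triangulation of the samples.
    return griddata(thetas, texture, (E, A), method="linear")
```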

    A. Spherical harmonic signatures

In order to obtain a truly invariant signature of the face, we use the spherical harmonic transform (SHT). A function $a \in L^2(S^2)$ can be expanded in the spherical harmonic basis


with coefficients
\[
a_{l,m} = \langle a, Y_l^m \rangle = \int_0^{2\pi}\!\!\int_{-\pi/2}^{\pi/2} a(\theta_1, \theta_2)\, \overline{Y_l^m(\theta_1, \theta_2)}\, \cos\theta_1 \, d\theta_1 \, d\theta_2, \tag{17}
\]
for $l \in \mathbb{N} \cup \{0\}$ and $|m| \le l$, where
\[
Y_l^m(\theta_1, \theta_2) = \sqrt{\frac{(2l+1)\,(l-m)!}{4\pi\,(l+m)!}}\; P_l^m(\sin\theta_1)\, e^{i m \theta_2} \tag{18}
\]
is the $(l,m)$-spherical harmonic, and $P_l^m$ is the associated Legendre function of degree $l$ and order $m$. A discrete version of the SHT can be computed efficiently using the FFT [36].

A handy property of spherical harmonics is that for every azimuthal shift $\Delta\theta_2$, the functions $a(\theta_1, \theta_2)$ and $a(\theta_1, \theta_2 - \Delta\theta_2)$ are transformed to two sets of coefficients which differ only in complex phase. Hence, the set of coefficients $c_{l,m} = |a_{l,m}|$ removes the rotation and reflection ambiguities from the canonical image and defines an invariant signature of the face.

As a dissimilarity measure between two such signatures, we use the Euclidean norm
\[
d_{\mathrm{SH}}(c_{l,m}, c'_{l,m}) = \Bigg( \sum_{l \ge 0,\; |m| \le l} \big( c_{l,m} - c'_{l,m} \big)^2 \Bigg)^{1/2}. \tag{19}
\]

Using basic properties of spherical harmonics, it is straightforward to show that such a measure is characterized by the similarity property, i.e.,
\[
d_{\mathrm{SH}}(c_{l,m}, c'_{l,m}) \le \Bigg( \sum_{l \ge 0,\; |m| \le l} \big| a_{l,m} - a'_{l,m} \big|^2 \Bigg)^{1/2} = \| a - a' \|, \tag{20}
\]
which follows from the reverse triangle inequality $\big| |a_{l,m}| - |a'_{l,m}| \big| \le |a_{l,m} - a'_{l,m}|$ together with the Parseval identity. Since $d_{\mathrm{SH}}(c_{l,m}, c'_{l,m})$ is invariant under any $R \in G$, where $G$ is the group of rotations and reflections about the north pole of $S^2$,
\[
d_{\mathrm{SH}}(c_{l,m}, c'_{l,m}) \le \min_{R \in G} \| a - R a' \|. \tag{21}
\]

In other words, similar $a$ and $a'$ (up to some $R \in G$) will result in small dissimilarities in the sense of $d_{\mathrm{SH}}$, whereas dissimilar $a$ and $a'$ will result in large values of $d_{\mathrm{SH}}$. In practice, different frequencies in the spherical harmonic domain typically carry different amounts of information useful for discriminating between subjects. Therefore, a more sophisticated dissimilarity measure could be based on a weighted Euclidean norm. Optimal weights can be found from a training set, e.g., by using PCA.
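A sketch of the signature $c_{l,m} = |a_{l,m}|$ and the (optionally weighted) distance of Eq. (19); the coefficient containers and weight handling are our assumptions:

```python
import numpy as np

def sh_signature(coeffs):
    """c_{l,m} = |a_{l,m}|: magnitudes of the SHT coefficients.

    `coeffs` maps (l, m) -> complex coefficient a_{l,m}.
    """
    return {lm: abs(a) for lm, a in coeffs.items()}

def d_sh(c1, c2, weights=None):
    """Euclidean distance of Eq. (19); optional per-coefficient weights."""
    keys = set(c1) | set(c2)
    w = weights or {}
    return float(np.sqrt(sum(w.get(k, 1.0)
                             * (c1.get(k, 0.0) - c2.get(k, 0.0)) ** 2
                             for k in keys)))
```

Two canonical images differing by an azimuthal rotation have coefficients differing only in phase, so their signatures coincide and the distance vanishes.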

A disadvantage of the proposed representation is that it is invariant only to azimuthal roto-reflections, while being sensitive to general roto-reflections on $S^2$. This, in turn, requires the use of constrained embedding, which fixes the location of the nose tip, and thus relies on its accurate localization. The use of a non-constrained embedding is feasible in combination with a signature invariant under the general roto-reflection group on $S^2$. The construction of such a signature is based on the fact that the subspace

\[
V_l = \operatorname{span}\{ Y_l^m : |m| \le l \} \tag{22}
\]

is closed under the roto-reflection group on $S^2$ [37]. Hence, a signature of the form
\[
c_l = \| a_l \| = \big\| (a_{l,-l}, \ldots, a_{l,l}) \big\| \tag{23}
\]
is invariant under general rotations and reflections. Dissimilarity between such signatures can be measured as in (19).

Fig. 9. First row: original facial images; second row: spherical canonical images shown in parametric coordinates; third row: spherical harmonic transform coefficients.
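The per-degree signature of Eq. (23) collapses each subspace $V_l$ to its energy, in the spirit of Kazhdan et al. [37]; a sketch with our naming:

```python
import numpy as np

def degree_energy_signature(coeffs, l_max):
    """c_l = ||(a_{l,-l}, ..., a_{l,l})||, Eq. (23).

    Invariant under all rotations/reflections of S^2, since each
    subspace V_l is closed under them. `coeffs` maps (l, m) -> a_{l,m};
    missing coefficients are treated as zero.
    """
    return np.array([
        np.sqrt(sum(abs(coeffs.get((l, m), 0.0)) ** 2
                    for m in range(-l, l + 1)))
        for l in range(l_max + 1)
    ])
```

Two such signatures can then be compared with an ordinary Euclidean norm, as in (19).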

    B. Embedding and recognition error

The spherical canonical images method was applied to the data set from Section IV-C. Fig. 10 (first row) depicts the average embedding error for each subject, plotted as a function of $R$. The embedding sphere radius yielding the minimum embedding error ranges from 90 to 100 mm, varying slightly with the subject.

Fig. 10 (second row) presents the equal error rate (EER) and the rank-1 recognition error as a function of the embedding sphere radius. The minimum EER of 12.39% is achieved at $R = 90$ mm. At this radius, a rank-1 error of 1.96% is achieved, which remains nearly constant for $R = 90$-$125$ mm. Both recognition error measures appear to be in close correspondence with the embedding error. This provides additional experimental evidence supporting our conjecture that a smaller embedding error results in better recognition.

    VI. D ISCUSSION AND CONCLUSION

We started with the assumption that facial expressions can be considered as isometries of the facial surface and showed an empirical justification of this model. We demonstrated how this model can be applied to constructing expression-invariant representations of faces using the isometric embedding approach. The resulting representation is useful, for example, in 3D face recognition. We studied the impact of the choice of embedding space on the metric distortion introduced by the embedding and concluded that spaces with spherical geometry are more favorable for the representation of facial surfaces. We also provided experimental evidence that spherical embedding leads to better recognition rates compared to Euclidean embedding. Furthermore, the use of spherical canonical images in $S^2$ allows us to perform matching in the spherical harmonic transform domain, which does not require preliminary alignment of the images. The results of Mémoli


[29] P. J. Besl and N. D. McKay, "A method for registration of 3D shapes," IEEE Trans. PAMI, vol. 14, pp. 239-256, 1992.

[30] Z. Y. Zhang, "Iterative point matching for registration of free-form curves and surfaces," IJCV, vol. 13, pp. 119-152, 1994.

[31] A. Tal, M. Elad, and S. Ar, "Content based retrieval of VRML objects - an iterative and interactive approach," in Proc. Eurographics Workshop on Multimedia, 2001.

[32] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proc. Computer Vision and Pattern Recognition, 1991, pp. 586-591.

[33] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," JOSA A, vol. 4, pp. 519-524, 1987.

[34] A. M. Bronstein, M. M. Bronstein, and R. Kimmel, "On isometric embedding of facial surfaces into S3," in Intl. Conf. on Scale Space and PDE Methods in Computer Vision, 2005, pp. 622-631.

[35] A. M. Bronstein, M. M. Bronstein, and R. Kimmel, "Robust expression-invariant face recognition from partially missing data," in Proc. ECCV, Lecture Notes in Computer Science, Springer, 2006, pp. 396-408.

[36] D. Healy Jr., D. Rockmore, P. Kostelec, and S. Moore, "FFTs for the 2-sphere - improvements and variations," The Journal of Fourier Analysis and Applications, vol. 9, no. 4, pp. 341-385, 2003.

[37] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Rotation invariant spherical harmonic representation of 3D shape descriptors," in Proc. Eurographics Symp. on Geom. Proc., 2003.

[38] F. Mémoli and G. Sapiro, "A theoretical and computational framework for isometry invariant recognition of point cloud data," Foundations of Computational Mathematics, vol. 5, no. 3, pp. 313-347, 2005.

[39] M. M. Bronstein, A. M. Bronstein, R. Kimmel, and I. Yavneh, "A multigrid approach for multidimensional scaling," in Proc. Copper Mountain Conf. Multigrid Methods, 2005.