
*Corresponding author. Tel.: +1-617-621-7524; fax: +1-617-621-7500.

E-mail address: [email protected] (B. Moghaddam).

Pattern Recognition 33 (2000) 1771-1782

    Bayesian face recognition

Baback Moghaddam^a,*, Tony Jebara^b, Alex Pentland^b

^a Mitsubishi Electric Research Laboratory, 201 Broadway, 8th Floor, Cambridge, MA 02139, USA
^b Massachusetts Institute of Technology, Cambridge, MA 02139, USA

    Received 15 January 1999; received in revised form 28 July 1999; accepted 28 July 1999

    Abstract

We propose a new technique for direct visual matching of images for the purposes of face recognition and image retrieval, using a probabilistic measure of similarity based primarily on a Bayesian (MAP) analysis of image differences. The performance advantage of this probabilistic matching technique over standard Euclidean nearest-neighbor eigenface matching was demonstrated using results from DARPA's 1996 "FERET" face recognition competition, in which this Bayesian matching algorithm was found to be the top performer. In addition, we derive a simple method of replacing costly computation of nonlinear (on-line) Bayesian similarity measures by inexpensive linear (off-line) subspace projections and simple Euclidean norms, thus resulting in a significant computational speed-up for implementation with very large databases. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Face recognition; Density estimation; Bayesian analysis; MAP/ML classification; Principal component analysis; Eigenfaces

    1. Introduction

In computer vision, face recognition has a distinguished lineage going as far back as the 1960s with the work of Bledsoe [1]. This system (and many others like it) relied on the geometry of (manually extracted) fiducial points such as eye/nose/mouth corners and their spatial relationships (angles, length ratios, etc.). Kanade [2] was first to develop a fully automatic version of such a system. This "feature-based" paradigm persisted (or lay dormant) for nearly 30 years, with researchers often disappointed by the low recognition rates achieved even on small data sets. It was not until the 1980s that researchers began experimenting with visual representations, making use of the appearance or texture of facial images, often as raw 2D inputs to their systems. This new paradigm in face recognition gained further momentum due, in part, to the rapid advances in connectionist models in the 1980s, which made possible face recognition systems such

as the layered neural network systems of O'Toole et al. [3] and Flemming and Cottrell [4], as well as the associative memory models used by Kohonen and Lehtio [5]. The debate on features vs. templates in face recognition was mostly settled by a comparative study by Brunelli and Poggio [6], in which template-based techniques proved significantly superior. In the 1990s, further developments in template- or appearance-based techniques were prompted by the ground-breaking work of Kirby and Sirovich [7] with the Karhunen-Loève transform [8] of faces, which led to the principal component analysis (PCA) [9] "eigenface" technique of Turk and Pentland [10]. For a more comprehensive survey of face recognition techniques the reader is referred to Chellappa et al. [11].

The current state of the art in face recognition is characterized (and to some extent dominated) by a family of subspace methods originated by Turk and Pentland's "eigenfaces" [10], which by now has become a de facto standard and a common performance benchmark in the field. Extensions of this technique include view-based and modular eigenspaces in Pentland et al. [12] and probabilistic subspace learning in Moghaddam and Pentland [13,14]. Examples of other subspace techniques include subspace mixtures by Frey and Huang [15], linear

0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00179-X

discriminant analysis (LDA) as used by Etemad and Chellappa [16], the "Fisherface" technique of Belhumeur et al. [17], hierarchical discriminants used by Swets and Weng [18], and "evolutionary pursuit" of optimal subspaces by Liu and Wechsler [19], all of which have proved equally (if not more) powerful than standard "eigenfaces".

Eigenspace techniques have also been applied to modeling the shape (as opposed to texture) of the face. Eigenspace coding of shape-normalized or "shape-free" faces, as suggested by Craw and Cameron [20], is now a standard pre-processing technique which can enhance performance when used in conjunction with shape information [21]. Lanitis et al. [22] have developed an automatic face-processing system with subspace models of both the shape and texture components, which can be used for recognition as well as expression, gender and pose classification. Additionally, subspace analysis has also been used for robust face detection [12,14,23], nonlinear facial interpolation [24], as well as visual learning for general object recognition [13,25,26].

    2. A Bayesian approach

All of the face recognition systems cited above (indeed the majority of face recognition systems published in the open literature) rely on similarity metrics which are invariably based on Euclidean distance or normalized correlation, thus corresponding to standard "template matching", i.e., nearest-neighbor-based recognition. For example, in its simplest form, the similarity measure S(I1, I2) between two facial images I1 and I2 can be set to be inversely proportional to the norm ||I1 - I2||. Such a simple metric suffers from a major drawback: it does not exploit knowledge of which types of variation are critical (as opposed to incidental) in expressing similarity.
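For concreteness, such a norm-based nearest-neighbor matcher can be sketched as follows. This is a minimal illustration in NumPy; the function names and toy intensity vectors are illustrative, not from the paper:

```python
import numpy as np

def euclidean_similarity(i1, i2):
    """Similarity inversely proportional to the image-difference norm."""
    return 1.0 / (1.0 + np.linalg.norm(i1 - i2))

def nearest_neighbor_match(probe, gallery):
    """Return the index of the gallery image most similar to the probe."""
    scores = [euclidean_similarity(probe, g) for g in gallery]
    return int(np.argmax(scores))

# Toy example: three "images" as flattened intensity vectors.
gallery = [np.array([0.0, 0.0, 0.0]),
           np.array([1.0, 1.0, 1.0]),
           np.array([0.9, 1.1, 1.0])]
probe = np.array([1.0, 1.0, 0.9])
print(nearest_neighbor_match(probe, gallery))  # index of the closest entry
```

Note that the score depends only on the raw difference norm, which is exactly the limitation discussed above: all directions of variation are weighted equally.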

In this paper, we present a probabilistic similarity measure based on the Bayesian belief that the image intensity differences, denoted by Δ = I1 - I2, are characteristic of typical variations in appearance of an individual. In particular, we define two classes of facial image variations: intrapersonal variations Ω_I (corresponding, for example, to different facial expressions of the same individual) and extrapersonal variations Ω_E (corresponding to variations between different individuals). Our similarity measure is then expressed in terms of the probability

S(I1, I2) = P(Δ ∈ Ω_I) = P(Ω_I | Δ),   (1)

where P(Ω_I | Δ) is the a posteriori probability given by Bayes rule, using estimates of the likelihoods P(Δ | Ω_I) and P(Δ | Ω_E). These likelihoods are derived from training data using an efficient subspace method for density estimation of high-dimensional data [14], briefly reviewed in Section 3.1.

We believe that our Bayesian approach to face recognition is possibly the first instance of a non-Euclidean similarity measure used in face recognition [27-30]. Furthermore, our method can be viewed as a generalized nonlinear extension of linear discriminant analysis (LDA) [16,18] or "FisherFace" techniques [17] for face recognition. Moreover, the mechanics of Bayesian matching have computational and storage advantages over most linear methods for large databases. For example, as shown in Section 3.2, one need only store a single image of an individual in the database.

    3. Probabilistic similarity measures

In previous work [27,31,32], we used Bayesian analysis of various types of facial appearance models to characterize the observed variations. Three different inter-image representations were analyzed using the binary formulation (Ω_I- and Ω_E-type variation): XY-I-warp modal deformation spectra [27,31,32], XY-warp optical flow fields [27,31], and a simplified I-(intensity)-only image-based difference [27,29]. In this paper we focus on the latter representation only: the normalized intensity difference between two facial images, which we refer to as the Δ vector.

    We de"ne two distinct and mutually exclusive classes:)

    Irepresenting intrapersonal variations between

    multiple images of the same individual (e.g., with di!erentexpressions and lighting conditions), and )

    Erepresenting

    extrapersonal variations in matching two di!erent indi-viduals. We will assume that both classes are Gaussian-distributed and seek to obtain estimates of the likelihoodfunctions P(*D)

    I) and P(*D)

    E) for a given intensity di!er-

    ence *"I1!I

    2.

Given these likelihoods we can evaluate a similarity score S(I1, I2) between a pair of images directly in terms of the intrapersonal a posteriori probability as given by Bayes rule:

S(I1, I2) = P(Δ | Ω_I) P(Ω_I) / [ P(Δ | Ω_I) P(Ω_I) + P(Δ | Ω_E) P(Ω_E) ],   (2)

where the priors P(Ω) can be set to reflect specific operating conditions (e.g., number of test images vs. the size of the database) or other sources of a priori knowledge regarding the two images being matched. Note that this particular Bayesian formulation casts the standard face recognition task (essentially an M-ary classification problem for M individuals) into a binary pattern classification problem with Ω_I and Ω_E. This simpler problem is then solved using the maximum a posteriori (MAP) rule: two images are determined to belong to the same individual if P(Ω_I | Δ) > P(Ω_E | Δ), or equivalently, if S(I1, I2) > 1/2.
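The binary MAP decision of Eqs. (1) and (2) can be sketched as follows. The snippet uses simple diagonal-covariance Gaussians as stand-ins for the subspace likelihoods developed in Section 3.1; the variance values, priors, and test difference vectors are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gaussian_likelihood(delta, cov_diag):
    """Zero-mean Gaussian density with a diagonal covariance (illustrative)."""
    d = delta.size
    quad = np.sum(delta**2 / cov_diag)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.prod(cov_diag))
    return np.exp(-0.5 * quad) / norm

def map_similarity(delta, cov_I, cov_E, prior_I=0.5):
    """Eq. (2): intrapersonal posterior P(Omega_I | Delta)."""
    p_I = gaussian_likelihood(delta, cov_I) * prior_I
    p_E = gaussian_likelihood(delta, cov_E) * (1.0 - prior_I)
    return p_I / (p_I + p_E)

# Intrapersonal differences are assumed tighter (smaller variances)
# than extrapersonal ones.
cov_I = np.array([0.1, 0.1, 0.1])
cov_E = np.array([2.0, 2.0, 2.0])
small_delta = np.array([0.1, 0.0, 0.1])   # images nearly identical
large_delta = np.array([2.0, 1.5, 2.0])   # images very different

print(map_similarity(small_delta, cov_I, cov_E))  # above 1/2: same individual
print(map_similarity(large_delta, cov_I, cov_E))  # below 1/2: different individuals
```

The 1/2 threshold falls out of the MAP rule directly: with only two classes, choosing the larger posterior is equivalent to testing whether the intrapersonal posterior exceeds one half.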


1. Tipping and Bishop [33] have since derived the same estimator for ρ by showing that it is a saddle point of the likelihood for a latent variable model.

Fig. 1. (a) Decomposition of R^N into the principal subspace F and its orthogonal complement F̄ for a Gaussian density; (b) a typical eigenvalue spectrum and its division into the two orthogonal subspaces.

An alternative probabilistic similarity measure can be defined in simpler form using the intrapersonal likelihood alone,

S' = P(Δ | Ω_I),   (3)

thus leading to maximum likelihood (ML) recognition as opposed to the MAP recognition in Eq. (2). Our experimental results in Section 4 indicate that this simplified ML measure can be almost as effective as its MAP counterpart in most cases.

    3.1. Subspace density estimation

One difficulty with this approach is that the intensity difference vector is very high-dimensional, with Δ ∈ R^N and N typically of O(10^4). Therefore we almost always lack sufficient independent training samples to compute reliable second-order statistics for the likelihood densities (i.e., singular covariance matrices will result). Even if we were able to estimate these statistics, the computational cost of evaluating the likelihoods is formidable. Furthermore, this computation would be highly inefficient since the intrinsic dimensionality or major degrees-of-freedom of Δ is likely to be significantly smaller than N.

To deal with the high dimensionality of Δ, we make use of the efficient density estimation method proposed by Moghaddam and Pentland [13,14], which divides the vector space R^N into two complementary subspaces using an eigenspace decomposition. This method relies on a principal components analysis (PCA) [9] to form a low-dimensional estimate of the complete likelihood which can be evaluated using only the first M principal components, where M ≪ N.

This decomposition is illustrated in Fig. 1, which shows an orthogonal decomposition of the vector space R^N into two mutually exclusive subspaces: the principal subspace F containing the first M principal components and its orthogonal complement F̄, which contains the residual of the expansion. The component of Δ in the orthogonal subspace F̄ is the so-called "distance-from-feature-space" (DFFS), a Euclidean distance equivalent to the PCA residual error. The component of Δ which lies in the feature space F is referred to as the "distance-in-feature-space" (DIFS) and is a Mahalanobis distance for Gaussian densities.

As shown in Refs. [13,14], the complete likelihood estimate can be written as the product of two independent marginal Gaussian densities,

P̂(Δ | Ω) = [ exp( -(1/2) Σ_{i=1}^{M} y_i^2 / λ_i ) / ( (2π)^{M/2} Π_{i=1}^{M} λ_i^{1/2} ) ] · [ exp( -ε^2(Δ) / 2ρ ) / (2πρ)^{(N-M)/2} ],   (4)

where y_i are the principal component coefficients of Δ, λ_i are the corresponding eigenvalues, ε^2(Δ) is the PCA residual error (DFFS), and ρ is the average of the eigenvalues in the orthogonal subspace F̄.

We note that in actual practice the majority of the F̄ eigenvalues are unknown, but they can be estimated, for example, by fitting a nonlinear function to the available portion of the eigenvalue spectrum and estimating the average of the eigenvalues beyond the principal subspace.
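A minimal sketch of evaluating this two-factor likelihood in log form follows. The toy covariance, the eigendecomposition step, and the simple tail-averaging estimate of ρ are illustrative choices standing in for the fitting procedure described above, not the paper's implementation:

```python
import numpy as np

def subspace_log_likelihood(delta, eigvecs, eigvals, M):
    """Log of the two-factor estimate: a DIFS (Mahalanobis) term over the
    first M principal components, plus a DFFS (residual) term with
    rho = average of the remaining eigenvalues."""
    N = delta.size
    y = eigvecs[:, :M].T @ delta                  # principal coefficients y_i
    difs = np.sum(y**2 / eigvals[:M])             # Mahalanobis distance in F
    dffs = np.sum(delta**2) - np.sum(y**2)        # PCA residual eps^2(Delta)
    rho = np.mean(eigvals[M:])                    # residual eigenvalue average
    log_pf = -0.5 * difs - 0.5 * (M * np.log(2 * np.pi)
                                  + np.sum(np.log(eigvals[:M])))
    log_pfbar = -0.5 * dffs / rho - 0.5 * (N - M) * np.log(2 * np.pi * rho)
    return log_pf + log_pfbar

# Illustrative eigenvalue spectrum from a low-rank toy covariance.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
cov = A @ A.T / 8 + 0.01 * np.eye(50)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                 # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

delta = rng.standard_normal(50)
print(subspace_log_likelihood(delta, eigvecs, eigvals, M=8))
```

Because the eigenvector basis is orthonormal, the residual ε^2(Δ) can be computed cheaply as ||Δ||^2 minus the energy captured by the first M coefficients, which is what the `dffs` line does.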

3.2. Efficient similarity computation

Consider a feature space of Δ vectors, the differences between two images (I_j and I_k). The two classes of interest in this space correspond to intrapersonal and extrapersonal variations, and each is modeled as a high-dimensional Gaussian density:

P(Δ | Ω_E) = exp( -(1/2) Δᵀ Σ_E^{-1} Δ ) / ( (2π)^{D/2} |Σ_E|^{1/2} ),

P(Δ | Ω_I) = exp( -(1/2) Δᵀ Σ_I^{-1} Δ ) / ( (2π)^{D/2} |Σ_I|^{1/2} ).   (6)

The densities are zero-mean, since for each Δ = I_j - I_k there exists a Δ = I_k - I_j. Since these distributions are known to occupy a principal subspace of image space (face-space), only the principal eigenvectors of the Gaussian densities are relevant for modeling. These densities are used to evaluate the similarity score in Eq. (2) in accordance with the density estimate in Eq. (4).

Computing the similarity score involves first subtracting a candidate image I_j from a database entry I_k. The resulting Δ is then projected onto the principal eigenvectors of both the extrapersonal and intrapersonal Gaussians. The exponentials are then evaluated, normalized and combined as likelihoods in Eq. (2). This operation is iterated over all members of the database (many I_k images) until the maximum score is found (i.e., the match). Thus, for large databases, this evaluation is rather expensive.
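The offline speed-up described next rests on a standard identity: for a covariance Σ = V Λ Vᵀ, the Mahalanobis exponent Δᵀ Σ^{-1} Δ equals the squared Euclidean distance between whitened coefficient vectors Λ^{-1/2} Vᵀ I_j and Λ^{-1/2} Vᵀ I_k, which can be precomputed per image. A minimal numerical check of this identity (the toy covariance and image vectors are illustrative):

```python
import numpy as np

# Illustrative full-rank covariance for one class (e.g., intrapersonal).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
cov = A @ A.T + 0.1 * np.eye(6)
eigvals, V = np.linalg.eigh(cov)     # cov = V diag(eigvals) V^T

def whiten(img):
    """Offline step: project onto eigenvectors, scale by Lambda^(-1/2)."""
    return (V.T @ img) / np.sqrt(eigvals)

I_j = rng.standard_normal(6)
I_k = rng.standard_normal(6)

# Online Mahalanobis exponent on the raw difference...
delta = I_j - I_k
mahalanobis = delta @ np.linalg.inv(cov) @ delta

# ...equals a plain Euclidean norm between precomputed whitened coefficients.
euclidean = np.sum((whiten(I_j) - whiten(I_k)) ** 2)
print(np.isclose(mahalanobis, euclidean))  # True
```

With the whitened coefficients stored per database entry, the per-match cost reduces to a squared Euclidean distance, which is the source of the large-database speed-up claimed in the abstract.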

However, these computations can be greatly simplified by offline transformations. To compute the likelihoods P(Δ | Ω_I) and P(Δ | Ω_E), we pre-process the I_k images with whitening transformations, and consequently every image is stored as two vectors of whitened subspace coefficients, i for intrapersonal and e for extrapersonal:

i_j = Λ_I^{-1/2} V_I I_j,