Learning Face Appearance under Different Lighting Conditions

Brendan Moore, Marshall Tappen, Hassan Foroosh
Computational Imaging Laboratory, University of Central Florida, Orlando, FL 32816

Abstract— We propose a machine learning approach for estimating intrinsic faces, and hence for de-illuminating and re-illuminating faces directly in the image domain. The most challenging step is de-illumination, where, unlike existing methods that require either 3D geometry or expensive setups, we show that the problem can be solved with relatively simple kernel regression models. For this purpose, the problem of decomposing an observed image into its intrinsic components, i.e. reflectance and shading, is formulated as a nonlinear regression problem. An intrinsic component is estimated by first estimating local linear constraints on images, in terms of derivatives, using multi-scale patches of the observed images computed from a three-level Laplacian pyramid. We have evaluated our method on the "Extended Yale Face Database B" and shown that, despite its simplicity, the method is able to produce realistic results using images taken from only four different lighting orientations.

I. INTRODUCTION

The illumination of objects immensely affects their appearance in pictures. In particular, variability in illumination leads to large variability in appearance. Modeling this variability accurately is a fundamental problem that occurs in many areas of computer vision and computer graphics. Applications include retouching pictures, video editing, post-cinematographic effects, improved recognition of objects (e.g. faces and people), etc. In graphics applications, of course, this same variability must be handled from both an analysis and a synthesis point of view: systems must be able both to remove existing illumination effects and to introduce new illuminations. Challenges from varying illumination also arise when attempting to accurately model the three-dimensional relationships in an image [12] or estimate image gradient information [5], [6].

In this paper, we present a relatively simple, yet very efficient method of modeling illumination variations. Our approach makes two main contributions to this domain of study:

• First, it does not require explicit assumptions about the reflectance model of the illuminated object, e.g. Lambertian or specular. The key idea is that for a given class of objects (e.g. faces) the input-output relationship is implicitly captured by data, and hence a training set can be used to learn how to both de-illuminate and re-illuminate both diffuse and specular regions.

This work was in part supported by a grant from Electronic Arts - Tiburon.

Brendan Moore, Marshall Tappen, and Hassan Foroosh are with the School of EECS at the University of Central Florida, {tmoore,mtappen,foroosh}@cs.ucf.edu

• Second, it is relatively simple, both conceptually and in terms of implementation. In particular, it does not require 3D data or a large set of input images of objects illuminated from all possible directions, e.g. as in light field rendering or the light stage (see below). Hence, given the realistic results produced by the method, the approach may be viewed as a tool for affordable computational photography on a desktop for ordinary users.

II. RELATED WORK

In face relighting, existing techniques fall into two general categories: 2D image-based techniques and 3D geometry-based techniques.

In [13] an image-based re-rendering technique using the idea of a "Quotient Image" is presented. Given two objects, the quotient image is defined as the ratio of their albedo functions. This representation depends only on relative surface texture information and is therefore independent of illumination. Linear combinations of this low-dimensional representation can then be used to generate new illuminations. A limitation of this approach is that the reflectance properties of the objects under consideration are all assumed to adhere to the Lambertian reflectance model. In general, the reflectance of a point on an object can be described by a 4D function called the bidirectional reflectance distribution function (BRDF). The Lambertian model condenses the 4D BRDF into a constant that is used to scale the inner product between the surface normal and the light vector. It is shown that images generated by varying the lighting on a collection of Lambertian objects that all have the same shape but differ in their surface albedo can be analytically described using at least three "bootstrap" images of a prototype object and an illumination-invariant "signature" image, i.e. the Quotient Image. The prototype objects consist of images of an object from the same class taken under three linearly independent light sources. This data is used to define a subspace, or basis. A new illumination is then generated by taking the pixel-wise product of a weighted sum of the basis and the Quotient Image.

Of course, the albedo functions that make up the ratio defining the quotient image are not known directly. Therefore, the quotient image is estimated by finding the correct set of coefficients via the minimization of a defined energy function and least squares. In addition to the Lambertian assumption, one limitation of the quotient image approach is that it can only be applied to faces that have the same view or pose as the face used in the

creation of the quotient image. This limitation is addressed in [14]: to accommodate arbitrary views of the face, an image morphing step is introduced into the synthesis process.

In [4], a geometric approach using three-dimensional laser scans of human heads is presented. The three-dimensional scans are used to create a morphable face model, which is then fit to an input face. The parameters of the morphable face model can then be used for re-rendering. The major deficit of this approach is the need to build a dense point-to-point correspondence between the 3D model and the training faces. This is computationally expensive and requires manual intervention.

In [12] a geometric approach utilizing a frequency-domain view of reflection and illumination is considered via the use of spherical harmonics. Since spherical harmonics form a complete orthonormal basis for functions on the unit sphere, the goal is to parameterize the BRDF as a function on the unit sphere. By assuming that the illumination field is independent of surface position, the problem is reparameterized in terms of the surface normal. The author states that lighting is expressed in global coordinates, since it is constant over the object surface when viewed with respect to a global reference frame. Therefore it is necessary to relate the parameters to global coordinates. This is accomplished through a set of rotations that operate on the local angles. A similar process is used to expand the local representation and the cosine transfer function. The illumination integral is then shown to be a simple product in terms of spherical-harmonic coefficients, so the illumination of a convex Lambertian object can be estimated by solving for the lighting coefficients in the product. The author reported that the reflected light field from a convex Lambertian object can be well approximated using spherical harmonics up to order 2 (a 9-term representation), i.e. as a quadratic polynomial of the Cartesian coordinates of the surface normal vector.

There are several deficits in this approach. First, spherical harmonics do not have compact support, meaning that the energy contained in the function is not concentrated in one area. Therefore, the method would fail at capturing specular reflections; moreover, truncating the basis would cause ringing in the higher-frequency specular components. Next, since the basis representation must be converted from local to global coordinates (or vice versa), a rotation matrix representing the necessary rotations of each of the basis vectors must be constructed. This matrix can get prohibitively large given a large number of basis vectors (such as the basis needed to represent high-energy signals). Finally, for the inverse illumination problem the lighting coefficients are found by dividing the inverse of the normalization constant by the transfer function. This division can cause numerical problems, for instance when the incident light source is roughly aligned with the surface normal.

In [7] an image-based technique for capturing the reflectance field of the human face is presented. The cornerstone of this approach is the use of a device called a light stage. This device illuminates a subject by rotating the illumination source along two axes, the azimuth and the inclination. While the illumination source is rotated, two calibrated video cameras capture video at 30 frames per second. This yields 64 subdivisions along the azimuth and 32 divisions along the inclination, for a total of 2048 direction samples. From this dense set of samples a reflectance function for each pixel is defined. Essentially, the main idea is that sampling the face under a dense set of illuminations effectively encodes the effects of diffuse reflection, specular reflection, self-shadowing, translucency, mutual illumination, and subsurface scattering. The technique is further extended to novel viewpoints by a re-synthesis technique that decomposes the reflectance function into its specular and diffuse components. This process assumes that the specular and diffuse colors are known: the specular color comes directly from the color of the incident light, and the diffuse color comes from the estimation of a diffuse chromaticity ramp.

The obvious shortcoming of this approach is its inflexibility for many practical applications and the associated cost of building such an apparatus, affordable only to the targeted movie industry. Moreover, the quality of the synthesized illumination is directly proportional to the number of images generated in the capture process; even at coarse increments, this data set gets rather large, which would make extending this approach to any real-time application difficult. Second, since the resolution of the reflectance function is equal to the number of sampled directions, aliasing can occur in places where there are large changes in pixel values from one illumination to another, for example the shadow of the nose onto the face, i.e. self-shadowing. Finally, since this technique defines the reflectance solely in terms of directional illumination, dappled lighting or partial shadows can be problematic.

In [9] an image-based approach using the idea of Eigen Light-Fields is presented. The Eigen Light-Fields technique employs Principal Component Analysis (PCA) to generate a basis, which is then used in a least-squares setting. Given a collection of light fields (plenoptic functions) of objects such as faces, an eigen-decomposition is performed via PCA. This generates an eigen-space of light fields, and the approximation of a new light field in this space is then used for relighting. One of the main limitations of the light field approach is that a huge number of images of the object is needed to capture the complete light field; in most computer vision applications it is unreasonable to expect more than a few images of the object.

In [11], a hybrid image-based/geometric approach for estimating the principal lighting direction in a set of frontal face images is presented. The technique employs a least-squares formulation that minimizes the difference between image pixel data and a so-called shape-albedo matrix. The shape-albedo matrix consists of the Hadamard (element-wise) matrix product of a vectorized albedo map and a matrix of surface normals. To obtain the normal vector information used in the shape-albedo matrix, a generic shape model was created using the average 3D shape of 138 head models captured via a 3D scanner.

The albedo data was generated by averaging the facial texture information from the Yale data set. Given the estimated lighting direction of an input image, a new illumination is applied by "undoing" the existing illumination and combining this specific albedo data with the generic 3D face model data. The main limitation of this approach is of course the assumption that one can always obtain an accurate 3D model. Also, the algorithm is limited to re-lighting faces under a fixed pose, in this case a frontal view.

The approach proposed herein is an image-based method.

It should be noted that illumination for face recognition is not a topic that we are concerned with in this paper. Instead, our main focus is twofold: (i) generating convincing and plausible illuminations; (ii) using little input data and no special equipment such as the light stage.

III. OUR APPROACH

Our approach builds upon the work by Tappen et al. [16], [15], who developed methods for estimating intrinsic images. The basic idea is that it is possible to estimate the derivatives of the image of an object from a given class as it would appear under uniform illumination, which we refer to as the de-illuminated image. The uniformly illuminated image can then be reconstructed from these derivatives. In this context, our method can be viewed as learning to estimate the derivatives of intrinsic images for specific types of objects. For this purpose, the problem of decomposing an observed image into its intrinsic components (i.e. reflectance and shading) is formulated as a nonlinear regression problem. An intrinsic component is estimated by first estimating a set of local linear constraints, in terms of derivatives, from multi-scale patches of the observed image. In our case, a multi-scale patch is comprised of 3x3 pixel data from a three-level Laplacian pyramid. The multi-scale representation effectively allows larger-scale derivative information to be considered with only a small increase in dimensionality. By operating over multi-scale patches rather than the raw image, the system effectively sidesteps the curse of dimensionality [2]: even a relatively small image of 320 by 240 pixels would otherwise yield a 76,800-dimensional regression problem, which is too large for standard regression techniques to handle. Operating on patches also allows us to tolerate misalignments.

Once the image derivatives are estimated, the final image must be computed. Each of the estimated derivatives can be thought of as a constraint that must be met in the estimation of the new component image. This image is found by solving for the image that best satisfies these constraints. The problem is thus reduced to estimating a weight matrix for a basis function; these weights are found by minimizing the squared error between ground-truth images and the estimated image.
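To make the feature construction concrete, the following is a minimal sketch of extracting a multi-scale 3x3 patch from a three-level Laplacian pyramid. This is our own illustrative code, not the authors' implementation; the smoothing sigma, the bilinear upsampling, and the function names are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(img, levels=3, sigma=1.0):
    # Each band stores the detail removed by blurring and 2x downsampling;
    # the last level is the low-pass residual.
    pyramid, current = [], img.astype(np.float64)
    for _ in range(levels - 1):
        blurred = gaussian_filter(current, sigma)
        down = blurred[::2, ::2]
        up = zoom(down, 2.0, order=1)[:current.shape[0], :current.shape[1]]
        pyramid.append(current - up)
        current = down
    pyramid.append(current)
    return pyramid

def multiscale_patch(pyramid, row, col, half=1):
    # Stack the 3x3 neighborhood of (row, col) from every level into one
    # feature vector; coordinates shrink by a factor of 2 per level.
    feats = []
    for lvl, band in enumerate(pyramid):
        r = int(np.clip(row >> lvl, half, band.shape[0] - half - 1))
        c = int(np.clip(col >> lvl, half, band.shape[1] - half - 1))
        feats.append(band[r - half:r + half + 1, c - half:c + half + 1].ravel())
    return np.concatenate(feats)  # 3 levels x 9 pixels = 27 dimensions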

IV. OVERVIEW

We divide the relighting process into two main phases.

The first phase focuses on recovering the face as it would appear under uniform illumination. We will refer to this step as "de-illuminating" the face; the result is the same type of image as an intrinsic image from [15], [16]. In the second phase, new illuminations are synthesized from the de-illuminated face image. As we will show in Section VII, these can be combined with the de-illuminated face to produce new images of the face under various illuminations. Throughout the rest of this paper, we will refer to this second phase as the re-illumination phase.

We first describe in Section V our approach for de-illuminating faces. Following that, Section VI describes how these faces can be re-illuminated.

V. DE-ILLUMINATING FACES

We treat the de-illumination step as an estimation problem. Given an image of a face under some known illumination, we use non-linear regression to estimate the derivatives of the de-illuminated face image. In other words, we estimate what the result of filtering the de-illuminated face with derivative filters would look like. Once the derivative estimates have been computed, the de-illuminated face is reconstructed from them using systems of equations designed for solving the Poisson equation [1].
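For illustration, the sketch below shows one standard way of implementing this integration step: it recovers an image (up to an additive constant) whose finite differences best match the estimated derivatives, by solving the corresponding sparse least-squares system. The forward-difference discretization and the choice of solver are our assumptions; [1] discusses the family of such methods.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def integrate_gradients(gx, gy):
    # Recover u (up to a constant) minimizing ||D u - g||^2, where D stacks
    # forward-difference operators and g stacks the target derivatives.
    h, w = gx.shape

    def diff_op(m):  # 1-D forward difference, (m-1) x m
        return sp.diags([-np.ones(m - 1), np.ones(m - 1)], [0, 1],
                        shape=(m - 1, m))

    Dx = sp.kron(sp.eye(h), diff_op(w))  # horizontal derivatives
    Dy = sp.kron(diff_op(h), sp.eye(w))  # vertical derivatives
    A = sp.vstack([Dx, Dy]).tocsr()
    b = np.concatenate([gx[:, :-1].ravel(), gy[:-1, :].ravel()])
    u = lsqr(A, b)[0]  # iterative sparse least squares
    return u.reshape(h, w)
```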

A. Basic Formulation for Derivative Estimators

The non-linear regression system works by modeling the derivative estimates as a linear combination of non-linear basis functions. The goal of regression is to predict the value of one or more continuous target variables t, given the value of a d-dimensional vector x of input variables [3]. This can be accomplished by finding the linear combination of basis functions that gives the correct derivatives, t, for an input point x from the source feature space.

Formally, this can be expressed by defining a function y(x; w) such that

    y(\mathbf{x}; \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})    (1)

where w are weight parameters that control the overall contribution of each of the M basis functions. There are many possible choices of basis functions. For our application, we choose \phi to be the Gaussian radial basis function (RBF) due to its well-studied analytic behavior across multiple scales:

    \phi_j(\mathbf{x}) = \exp\left\{ -\frac{(\mathbf{x} - \mu_j)^2}{2 s^2} \right\}    (2)

where \mu_j is the mean, which controls the location of each basis function, and s^2 is the variance, which controls the spatial scale. It should be noted that the classical Gaussian kernel typically includes a normalization coefficient. For our purposes, this coefficient is unnecessary because each basis function is multiplied by a corresponding weight parameter, w_j.
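A prediction with this model is then just an inner product with the basis responses. A tiny sketch (the names are ours), assuming isotropic Gaussians as in Equation 2:

```python
import numpy as np

def rbf_features(x, centers, s2):
    # phi_j(x) = exp(-||x - mu_j||^2 / (2 s^2)), one value per center mu_j
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * s2))

def predict(x, w, centers, s2):
    # y(x; w) = w^T phi(x), Equation 1
    return w @ rbf_features(x, centers, s2)
```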

Fig. 1. Example de-illumination training data for the Small Faces. Each column represents a source training set for a particular illumination model, in this case: illumination from the right; illumination from the top; illumination from the left; illumination from the bottom. The far right column is the uniformly illuminated target training data from which the derivatives are generated.

B. Taking Advantage of Regularities in Facial Appearance

Because we have focused on faces, we can take advantage of the statistically regular structure in faces. The face images are only roughly aligned, so that key facial features, such as the eyes, are approximately at known locations. Images are then divided into patches. In our current implementation, they are divided into twenty rows and twenty columns, for a total of 400 patches. We will refer to these large patches as face patches.

For each face patch, two estimators are trained: one estimator predicts the horizontal derivatives of each patch and the second predicts the vertical derivatives. We break up the face image into patches so that each estimator can specialize on a particular part of the face. While we do not use an explicit 3D model of the faces, this approach implicitly assumes that faces have a regular structure. Upon estimating derivatives and then re-integrating them, patches are forced to blend together seamlessly. We operate on patches, rather than pixels, to make our system more robust to noise and misalignment errors, while implicitly incorporating structure.

C. Learning to Estimate De-Illuminated Derivatives

We find the weights w_j in Equation 1 by training the regressors on images of faces taken under multiple illuminations, such as images from the Yale Face Database [8]. When an image of the face under uniform illumination is not available to serve as the target image, we have found that a suitable substitute can be created by averaging different illuminations. The training data set is divided into different illumination sets based on the location of the principal light source, i.e. left, top, right, bottom. We learn one de-illumination model for each illumination location.

As mentioned above, we learn two estimators for each face patch in the image. The inputs x to the estimators are 3x3 image patches contained in one of the larger face patches. Each of these 3x3 image patches is pre-processed by subtracting the mean value of all image patches in that particular face patch. We will denote the set of all 3x3 patches associated with a face patch as X.

Since each basis function in Equation 1 is a function of \mu and s, we need to find appropriate values for these parameters. The problem of finding suitable values for \mu is formulated as a k-means optimization problem: given the set of n data points in X and an integer k, we would like to find the set of k points, called centers, that minimizes the mean squared distance from each data point to its nearest center [10].

Once suitable cluster means are found, the scaling parameter s is calculated. For our problem the parameter s is actually a matrix, i.e. a covariance matrix \Sigma. For each cluster, which is represented by a mean and its associated data points, the within-cluster covariance is calculated. This makes the basis functions multivariate Gaussians of the form

    \phi_j(\mathbf{x}) = \exp\left\{ -\frac{1}{2} (\mathbf{x} - \mu_j)^T \Sigma_j^{-1} (\mathbf{x} - \mu_j) \right\}    (3)

In this form special care must be taken when calculating \Sigma^{-1}; in general, \Sigma may not be invertible. This is typically due to an under-determined system caused by too few points being assigned to a particular cluster. For these cases a regularized pseudo-inverse is used.

With our inputs defined as X, our targets defined as T, and our basis functions defined as \phi(\mathbf{x}) as in Equation 3, we can now find the value for w in Equation 1. The solution for w is defined as the value that minimizes the sum-of-squares error function

    E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2    (4)

where t_n is an instance of a target from T and \mathbf{x}_n is an instance of a sample from X. Taking the derivative of Equation 4 with respect to w yields

    \frac{d}{d\mathbf{w}} E(\mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \right\} \boldsymbol{\phi}(\mathbf{x}_n)^T    (5)

Setting Equation 5 equal to zero and rearranging,

    \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n)^T - \mathbf{w}^T \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n) \boldsymbol{\phi}(\mathbf{x}_n)^T = 0    (6)

Solving for w,

    \mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}    (7)

where \mathbf{w} is the solution that minimizes the difference between the targets (the observed derivative values) and the weighted sum of the basis functions evaluated at a particular x value, and \Phi is an N x M matrix that contains all of the basis functions evaluated at every sample point:

    \Phi = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & & & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}    (8)
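Putting Equations 3-8 together, one derivative estimator can be fit as sketched below: k-means supplies the means mu_j, within-cluster covariances supply Sigma_j (inverted with a regularized pseudo-inverse), and the weights come from the least-squares solution of Equation 7. The cluster count k and regularizer reg are illustrative values, not figures given in the paper.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def fit_rbf_regressor(X, t, k=30, reg=1e-6):
    # Means via k-means; X is an N x d matrix of patch features, t holds
    # the N target derivative values.
    mus, labels = kmeans2(X, k, minit='++')
    d = X.shape[1]
    inv_covs = []
    for j in range(k):
        pts = X[labels == j]
        cov = np.cov(pts, rowvar=False) if len(pts) > 1 else np.eye(d)
        # Regularized pseudo-inverse guards against singular Sigma_j.
        inv_covs.append(np.linalg.pinv(cov + reg * np.eye(d)))
    S = np.stack(inv_covs)  # k x d x d

    def phi(x):
        # Multivariate Gaussian responses, Equation 3.
        diff = x - mus
        return np.exp(-0.5 * np.einsum('ji,jik,jk->j', diff, S, diff))

    Phi = np.vstack([phi(x) for x in X])         # N x M design matrix (Eq. 8)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # w = (Phi^T Phi)^-1 Phi^T t
    return w, mus, S
```

One such regressor would be trained per face patch and per derivative direction (horizontal and vertical).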

After the construction of the training source feature space, the training target features are calculated. As stated above, we model the changes in illumination in terms of the changes in the image derivatives. Therefore, we calculate the horizontal and vertical derivatives for each pixel, in each subsection, over each uniformly illuminated image in the training set. The resulting derivatives are then placed in a matrix that represents the training target feature space, which we define as T.

Using non-linear regression, we estimate the diffuse illumination model described by the relationships between the training source features and the training target features, following the process detailed above. The output is a vector of weighting values that control the individual contributions of a set of non-linear functions.

To generate a new uniformly-illuminated image from an input picture (i.e. one that is not a part of the training set, but contains one of the learned illumination models: left, right, top, bottom), we segment the new image, generate the image patches, and vectorize the patches. We can think of each vectorized patch as a point in d-dimensional space. For each point, the vertical and horizontal derivatives are then readily estimated by calculating the inner product between the weight vector and a vector that contains the value of the point at each basis. Once this is done, the final image is generated by Poisson integration of the estimated vertical and horizontal derivatives [1].

VI. RE-ILLUMINATION

The re-illumination phase is nearly identical to the de-illumination phase. The main difference is that the goal has changed from calculating the de-illuminated face to calculating new illuminations. In a simplified image formation model, images are the product of the de-illuminated face and an illumination image. This makes the illumination image similar to the shading images from [16], [15]; this image has also been referred to as the quotient image [13].

The only other difference from the de-illumination stage is that the input images are de-illuminated faces. Besides these two differences, the illumination estimation involves the same basic steps of estimating derivative values and integrating them to form re-illuminated images.
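Under this simplified formation model, the final composition is a single pixel-wise product. A two-line sketch (the clipping range assumes float images in [0, 1], which is our assumption):

```python
import numpy as np

def relight(deilluminated, illumination):
    # observed image = de-illuminated face x illumination (quotient) image
    return np.clip(deilluminated * illumination, 0.0, 1.0)
```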

Fig. 2. Example re-illumination training data for the Small Faces. The far left column is the uniformly illuminated source training data. Each remaining column represents the quotient-image source training set for a particular illumination model, in this case: illumination from the right; illumination from the top; illumination from the left; illumination from the bottom.

VII. RESULTS

Extensive experiments were run on images of faces from the "Extended Yale Face Database B" [8], with only some depicted herein due to limited space; additional results are shown in the supplementary material. Images from this database were cropped and only roughly aligned to create two data sets: Small Faces and Large Faces. The Small Face data consisted of images of faces that were scaled to 80 x 100 pixels in size and included hair and some residual background information. The Large Face data consisted of images of faces that were cropped to contain just the center portion of the face; these cropped images were 136 x 70 pixels. Having small and large faces enabled us to see how scale affects the approach.

Both sets of image data were then partitioned into four illumination sets: illumination from the right; illumination from the top; illumination from the left; illumination from the bottom. Figure 1 shows an example of the source training data and target training data for the de-illumination of the Small Faces, and Figure 2 shows the corresponding data for the re-illumination of the Small Faces.

The data for both Small and Large Faces was separated into two sets, a training set and a testing set. The training set for the Small Faces consisted of fifteen faces; the remaining images in the database were used as the testing set. Similarly, the training set for the Large Faces consisted of twenty-one faces, with the remaining images in the database used as the testing set.

The results presented show two de-illumination/re-illumination scenarios. The first is the situation where de-illumination is not required. This can be considered the optimal scenario and would apply to images that are already illuminated by a relatively uniform light source, such as those taken outdoors. These results are shown in Figure 4.

In the second scenario, shown in Figures 3 (Large Faces) and 5 (Small Faces), the harder task of de-illumination must be performed prior to re-illumination. The results presented for this scenario show how the method performs despite being given harsh illuminations in the test images. These harsh conditions make the image recovery process far more challenging, because large portions of the input image to be de-illuminated are unknown due to shadows.

VIII. OBSERVATIONS AND CONCLUDING REMARKS

By inspecting the results in Figures 4 and 5, we can make the following observations and remarks.

Re-illuminating is easier than de-illuminating: The images that were only re-illuminated, rather than being de-illuminated and then re-illuminated, are in general of higher quality. This indicates that much of the degradation in the de-illumination/re-illumination pipeline enters during the de-illumination step, which almost all existing methods avoid by either using special, overly expensive equipment (e.g. a light stage) or using a huge number of images of the same object illuminated from many possible directions.

The system must hallucinate data: Examining the left side of the faces, it is clear that the method is hallucinating details that are in shadow. While the method does a good job, more research is needed on the type of cost functions that will allow the method to better represent these details. Again, note that this problem is not tackled by other existing methods, since they either assume perfect alignment and the same pose, or use special equipment and setups.

To conclude, we have presented a machine learning approach that takes advantage of the statistical regularities of a class of illuminated objects in images to both de-illuminate and re-illuminate them. The underlying idea is that by breaking the image down into small patches it is possible to learn estimators for specific parts of the face. This leads to a system that is able to produce realistic results and yet is straightforward to implement on a desktop, without special equipment or a huge number of illumination directions per input training object.

REFERENCES

[1] A. Agrawal, R. Raskar, and R. Chellappa. What is the range of surface reconstructions from a gradient field? European Conference on Computer Vision, 1:578-591, 2006.

[2] R. E. Bellman. Adaptive Control Processes. Princeton University Press, 1961.

[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, NY, 2006.

[4] V. Blanz, S. Romdhani, and T. Vetter. Face identification across different poses and illuminations with a 3D morphable model. In Proc. Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pages 192-197, 2002.

[5] R. Brunelli and T. Poggio. Face recognition: features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1042-1052, 1993.

[6] H. F. Chen, P. N. Belhumeur, and D. Jacobs. In search of illumination invariants. IEEE Conference on Computer Vision and Pattern Recognition, 1:254-261, 2000.

[7] P. Debevec, T. Hawkins, C. Tchou, H.-P. Duiker, W. Sarokin, and M. Sagar. Acquiring the reflectance field of a human face. In K. Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, pages 145-156. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[8] A. Georghiades, P. N. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643-660, 2001.

[9] R. Gross, S. Baker, I. Matthews, and T. Kanade. Face recognition across pose and illumination. In S. Z. Li and A. K. Jain, editors, Handbook of Face Recognition. Springer-Verlag, June 2004.

[10] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881-892, 2002.

[11] K. Lee and B. Moghaddam. A practical face relighting method for directional lighting normalization. IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2005.

[12] R. Ramamoorthi. Modeling illumination variation with spherical harmonics. Academic Press, Burlington, MA, 2006.

[13] A. Shashua and T. Riklin-Raviv. The quotient image: class-based re-rendering and recognition with varying illuminations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):129-139, 2001.

[14] A. Stoschek. Image-based re-rendering of faces for continuous pose and illumination directions. IEEE Conference on Computer Vision and Pattern Recognition, 1:582-587, 2000.

[15] M. F. Tappen, E. H. Adelson, and W. T. Freeman. Estimating intrinsic component images using non-linear regression. IEEE Conference on Computer Vision and Pattern Recognition, 2:1992-1999, 2006.

[16] M. F. Tappen, W. T. Freeman, and E. H. Adelson. Recovering intrinsic images from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1459-1472, 2005.

Fig. 3. Example of the de-illumination and re-illumination process for the Large Faces data set (panels, left to right: Source; De-illuminated Face; Re-illuminated Faces).

Fig. 4. Example of the re-illumination process (panels: Source; Re-illuminated Faces).

Fig. 5. Example of the de-illumination and re-illumination process (panels: Source; De-illuminated Face; Re-illuminated Faces). The source image on the left is used to first produce an image of the face under uniform illumination; we refer to this as the de-illuminated face. The de-illuminated face is then combined with estimated illuminations to synthesize images of the face under a number of different illuminations.
