human face reconstruction using bayesian deformable modelseprints.qut.edu.au/9407/1/9407.pdf ·...

6
Human Face Reconstruction using Bayesian Deformable Models George Mamic, Clinton Fookes and Sridha Sridharan Image and Video Research Laboratory Queensland University of Technology GPO Box 2434, 2 George Street, Brisbane, Queensland 4001 {g.mamic,c.fookes,s.sridharan}@qut.edu.au Abstract This paper presents a Bayesian framework for 3D fa- cial reconstruction. The framework iteratively deforms a generic face mesh to fit a set of range points represent- ing a face. The generic mesh is generated from the ex- tensive FRGC database of face images. The deformation process is conducted within a Bayesian framework and is driven by a Markov Chain Monte Carlo (MCMC) sampler which uses information from the likelihood and prior dis- tributions of the generic face mesh. The paper presents re- sults on the construction of a generic face model, the defor- mation framework and fitting results to both synthetic and real data. The results verify the effectiveness of the pro- posed technique, accurately deforming a generic face mesh to captured 3D data points of human faces. 1. Introduction Today’s smart security solutions often employ the use of biometrics, such as fingerprints or iris scans, as an in- tegral part in the overall system. This is largely due to the increased demand on modern society to identify or authenticate an individual with much stronger certainty. The uniqueness of face recognition technology that distin- guishes it from the other cohort of biometrics is that im- ages of the face can be captured quickly and in the majority of situations non-intrusively. They also have the potential to operate in noisy crowded environments where other bio- metrics may fail. Unfortunately, traditional 2D methods for face verification have struggled to cope with changes in il- lumination and pose. This has led researchers to the use of three-dimensional (3D) facial data in order to overcome these difficulties [4]. One of the key components of 3D face recognition is the ability to reconstruct the 3D human face and estimate rel- evant models from the captured sensor data. This step is not only important for face recognition, but also for other computer graphics and machine vision fields including en- tertainment, medical image analysis and human-computer interaction applications [15]. The approaches to solving this difficult problem, however, have varied considerably. This paper presents a Bayesian framework for 3D facial reconstruction. The system iteratively deforms a generic face mesh to fit a set of 3D points representing range data obtained for face fitting. The proposed system is generic and does not depend on any particular sensor to capture the 3D data. The a-priori information in the Bayesian frame- work is guided by information obtained from the construc- tion of the generic face model. This model is constructed from the “Face Recognition Grand Challenge” (FRGC) database [13]. The deformation process aims to maximise the joint posterior distribution by a Markov Chain Monte Carlo (MCMC) sampler which uses information from the likelihood and prior distributions of the generic face mesh. An overview of the proposed method is illustrated in Figure 1 within a face recognition context. This paper, however, is only concerned with the construction of the generic face mesh and the accurate fitting. This paper is organised as follows. Section 2 presents some existing work in the field and how it relates to the proposed method. Section 3 presents the strategy used in the construction of the generic face mesh. Section 4 details the Bayesian framework used in the fitting of the generic face mesh to scanned 3D face data points. Section 5 presents the results of the construction of the generic 3D face mesh as well as accuracy of the fitting procedure. Finally the paper is concluded with a discussion in Section 6. 2. Related Work There are numerous methods to capture 3D information of a human face. Methods include a range of both pas- sive and active sensors such as stereo [5, 8, 11], shape from shading [6], structured light [15], shape from silhouettes, Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS'06) 0-7695-2688-8/06 $20.00 © 2006

Upload: others

Post on 04-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

Human Face Reconstruction using Bayesian Deformable Models

George Mamic, Clinton Fookes and Sridha SridharanImage and Video Research LaboratoryQueensland University of Technology

GPO Box 2434, 2 George Street,Brisbane, Queensland 4001

{g.mamic,c.fookes,s.sridharan}@qut.edu.au

Abstract

This paper presents a Bayesian framework for 3D fa-cial reconstruction. The framework iteratively deforms ageneric face mesh to fit a set of range points represent-ing a face. The generic mesh is generated from the ex-tensive FRGC database of face images. The deformationprocess is conducted within a Bayesian framework and isdriven by a Markov Chain Monte Carlo (MCMC) samplerwhich uses information from the likelihood and prior dis-tributions of the generic face mesh. The paper presents re-sults on the construction of a generic face model, the defor-mation framework and fitting results to both synthetic andreal data. The results verify the effectiveness of the pro-posed technique, accurately deforming a generic face meshto captured 3D data points of human faces.

1. Introduction

Today’s smart security solutions often employ the useof biometrics, such as fingerprints or iris scans, as an in-tegral part in the overall system. This is largely due tothe increased demand on modern society to identify orauthenticate an individual with much stronger certainty.The uniqueness of face recognition technology that distin-guishes it from the other cohort of biometrics is that im-ages of the face can be captured quickly and in the majorityof situations non-intrusively. They also have the potentialto operate in noisy crowded environments where other bio-metrics may fail. Unfortunately, traditional 2D methods forface verification have struggled to cope with changes in il-lumination and pose. This has led researchers to the useof three-dimensional (3D) facial data in order to overcomethese difficulties [4].

One of the key components of 3D face recognition is theability to reconstruct the 3D human face and estimate rel-evant models from the captured sensor data. This step is

not only important for face recognition, but also for othercomputer graphics and machine vision fields including en-tertainment, medical image analysis and human-computerinteraction applications [15]. The approaches to solving thisdifficult problem, however, have varied considerably.

This paper presents a Bayesian framework for 3D facialreconstruction. The system iteratively deforms a genericface mesh to fit a set of 3D points representing range dataobtained for face fitting. The proposed system is genericand does not depend on any particular sensor to capture the3D data. The a-priori information in the Bayesian frame-work is guided by information obtained from the construc-tion of the generic face model. This model is constructedfrom the “Face Recognition Grand Challenge” (FRGC)database [13]. The deformation process aims to maximisethe joint posterior distribution by a Markov Chain MonteCarlo (MCMC) sampler which uses information from thelikelihood and prior distributions of the generic face mesh.An overview of the proposed method is illustrated in Figure1 within a face recognition context. This paper, however,is only concerned with the construction of the generic facemesh and the accurate fitting.

This paper is organised as follows. Section 2 presentssome existing work in the field and how it relates to theproposed method. Section 3 presents the strategy used in theconstruction of the generic face mesh. Section 4 details theBayesian framework used in the fitting of the generic facemesh to scanned 3D face data points. Section 5 presents theresults of the construction of the generic 3D face mesh aswell as accuracy of the fitting procedure. Finally the paperis concluded with a discussion in Section 6.

2. Related Work

There are numerous methods to capture 3D informationof a human face. Methods include a range of both pas-sive and active sensors such as stereo [5, 8, 11], shape fromshading [6], structured light [15], shape from silhouettes,

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006

Page 2: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

Structured Light, Stereo,Aquisition Sensors:

Laser Scanners,CCD, CMOS, etc.

FRGCDatabase

3D FeatureExtraction

MCMC MeshOptimisation

2D FeatureExtraction

Generic Fash Mesh

Classifier

3D Range Data

Figure 1. Overview of the Bayesian deformation algorithm in a recognition context.

laser scanners [13], or structure from motion methods [10].Some of these methods obtain the 3D information directlywhile others depend on methods to reliably determine cor-respondences between images containing human faces. Toperform the latter approach in an automatic fashion is ex-tremely problematic. The majority of these problems stemdirectly from the human face itself, as faces are non-planar,non-rigid, non-lambertain and are subject to self occlusion[2]. Due to these issues and the uniform appearance oflarge portions of the human face, passive techniques such asstereo can not always accurately determine 3D face shape.Active sensors, although they can produce higher resolutionand more accurate data, achieve this with more expensiveequipment and at the cost of projecting light or other visiblepatterns onto the subjects face.

Many passive techniques depend on extensive manualguidance to select specific feature points or determine cor-respondences between images [14]. Fua and Miccio [8] usea least-squares optimisation to fit a head animation modelto stereo data. Although powerful, their method requiresthe manual selection of key 2D feature points and silhou-ettes in the stereo images. Ypsilos et al. [15] also fit ananimation face model to stereo data. However, the stereodata is combined with a projected infrared structured lightpattern allowing the simultaneous capture of 3D shape in-formation as well as colour information. Ansari et al. [1] fita face model to stereo data via a Procrustes analysis to glob-ally optimise the fit, then a local deformation is performedto refine the fitting. Some methods use optical flow [7] orappearance-based techniques [10] to overcome the lack oftexture in many regions in the face. These approaches, how-ever, are often hampered by small deformations betweensuccessive images and issues associated with non-uniformillumination.

Differential techniques attempt to characterise the salientfacial features such as the nose, the mouth and the orbitsusing extracted features such as crest lines and curvature

information. Lengagne et al. [12] employ a constrainedmesh optimisation based on energy minimisation. The en-ergy function is a combination of two terms, an externalterm, which fits the model to the data and an internal term,which is a regularisation term. The difficulty with differ-ential techniques is in controlling the optimisation as noisyimages can wreak havoc with estimating differential quanti-ties, whilst regularisation terms have the potential to smoothout the very features that are most important to recognitionalgorithms.

Statistical techniques fit morphable models of 3D facesto images using information that is gathered during the con-struction of the generic face mesh. Blanz and Vetter [3]present a method for face recognition, by simulating theprocess of image formation in 3D space, fitting a morphablemodel of 3D faces to images and a framework for face iden-tification based upon model parameters defined by 3D shapeand texture. The developed system is a complete imageanalysis system provided that manual interaction in the def-inition of feature points is permissible.

The approach presented in this paper does not rely onany manual intervention to select feature points or corre-spondences. It is also a generic framework in that it can beused to fit a face model to any acquired sensor data. TheBayesian framework allows the easy incorporation of priorinformation (in the form of a generic face mesh) to be in-corporated into the fitting process and the MCMC samplertakes advantage of the likelihood and prior distributions ofthe generic face mesh to accurately fit the mesh to rangepoints representing a face.

3. Constructing a Generic Face Mesh

The generic face mesh is a model that is deformed to fitthe acquired 3D data of any person’s face. This mesh wasconstructed from the extensive FRGC database [13] and isgenerated from the average of a randomly selected set of

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006

Page 3: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

aligned and pre-processed images. A sufficient number offaces were averaged in the generation of the generic facemesh such that there was insignificant change in the facemodel when another face was added. Section 5.1 will out-line this procedure in more detail and show the results ofthe generic face mesh.

4. Bayesian Framework

The deformation of the generic face mesh constructedin the previous section was done within a Bayesian Frame-work. The framework was constructed such that given a setof range points R (which are the range data points of a face)a triangular mesh defined by the vertices V is deformed soas to provide an optimal fit to the data points. Mathemati-cally this can be posed in the following manner,

p(V |R) ∝ p(R|V )p(V ), (1)

where p(V |R) is the posterior distribution, p(R|V ) is thelikelihood, and p(V ) is the prior distribution. Thus, to esti-mate a fit of the generic face mesh to the given range dataR, the process becomes an optimisation taking the follow-ing form,

maxV

(p(V |R)) ∝ maxV

(p(R|V )p(V )) . (2)

In order to solve this optimisation, the form of the likeli-hood p(R|V ) and the prior distribution p(V ) must be deter-mined.

4.1. Likelihood Distribution

Under the assumption that the difference between thedata to be fitted and the the generic face mesh is zero meanGaussian distributed and conditionally independent, it ispossible to derive the following form for the likelihood dis-tribution,

p(R|V ) ∝∏

i

exp(−∑

(Ri − Vi)2

2σ2

). (3)

The variance term σ2 is a nuisance variable and is re-moved by assigning the conjugate inverse Gaussian distri-bution and reparameterising such that,

φ = σ2 ∼ G(a, b), (4)

where G is the Gamma distribution. Integration of the like-lihood term to remove the nuisance variable yields the fol-lowing form,

l(R|V ) ∝[

12

∑i

(Ri − Vi)2 + b

]( i2−a)

. (5)

4.2. Prior Distribution

Derivation of the prior distribution is done by using in-formation available from the process that is employed in theconstruction of the generic face mesh. Namely, since themodel is constructed via the averaging of a large number offaces, it is possible to deduce from the central limit theoremthat regardless of the distribution of the original faces, theaveraged face will tend toward a Gaussian distribution.

Hence, the following form may be adopted for the priordistribution,

p(V ) ∝∏

i

exp

[−1∑

(Vi − µ)2

2σ2

], (6)

where the terms µ and σ are the mean and standard devia-tion of the generic face mesh.

4.3. Optimisation of the framework

Bayesian inference on the parameters of interest, V inthis situation, can be made based upon the joint posteriordistribution p(V |R) which was derived in Section 4. Withthe dimensionality of the system fixed, this problem canbe resolved using the Metropolis Hastings (MH) algorithm.This algorithm is well documented in the literature [9] andonly a brief overview of the algorithm will be provided here.

1. Randomly choose a vertex to move V .

2. Propose a change to a new Vertex height of V ′ from aproposal distribution q(v′|V ).

3. Calculate the acceptance ratio,

α(V ′nVn) ∝ min

[1,

p(R|V ′n)q(Vn|V ′

n)p(R|Vn)q(V ′

n|Vn)

]. (7)

4. Repeat until convergence is achieved.

Although straightforward in implementation, it is worthnoting some subtleties that are present in the implementa-tion of this optimisation strategy. Firstly, the choice of theproposal distribution influences the rate at which the genericface mesh fits the data at hand. Over a defined set of sam-ples, a uniform distribution whose bounds are set outsidethe maximum and minimum data values, will fit large vari-ations in the data more readily then will a Gaussian (withan appropriate standard deviation), which is centred on themean of the data. Hence, the choice of the proposal dis-tribution is somewhat application specific, depending onwhether accuracy of fit, or smoothness of data in noisy en-vironments is required.

With very large data sets, the manual calculation andsimplification of the acceptance ratio was also found to be

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006

Page 4: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

somewhat beneficial, particularly with very large data sets.As the initial stage of fitting can be hampered with largenumbers which current computing facilities struggle to han-dle, it was discovered that manual simplification of the ac-ceptance ratio removed this issue.

5. Results and Discussion

This section is composed of three sub-sections. The firstsubsection examines the results that were achieved with theconstruction of the generic face mesh. The second subsec-tion illustrates the use of the MCMC optimisation algorithmon the fitting of a plane to a synthetic data set. The final sub-section presents results from the fitting of the generic facemesh to a set of range data points captured from a humanface.

5.1. Generating the Generic Face Mesh

The generic face mesh was constructed using a randomsample of 45 faces from the FRGC database of range im-ages of human faces. These images had been pre-processedand brought into coarse registration by using informationon the location of the eyes, nose and chin. The resultinggeneric face mesh that was used in subsequent testing ispresented in Figure 2.

0

20

40

60

80

100

120

140

0

20

40

60

80

100

120

140

−10

−5

0

5

10

15

20

25

30

35

Average Face from Database

Figure 2. Generic Face Mesh generated froma random sample population (45 faces) of theFRGC database.

The number of faces that were used in the generationof the generic face mesh was determined by examining therate of change of the resultant average face at the end ofeach iteration as displayed in Figure 3.

0 5 10 15 20 25 30 354

5

6

7

8

9

10

11

12

Number of Faces Averaged

log(

erro

r2 ) be

twee

n av

erag

ed fa

ces

log(error2) ’vs’ Number of Faces Averaged

log(error2)

Figure 3. Plot of the error versus the numberof faces averaged in the generic face mesh.

As can be seen from Figure 3, the error decreased sub-stantially as more faces were incorporated into the genericface mesh and eventually reached a stable level.

5.2. Synthetic Data

The mesh deformation algorithm was initially tested bydeforming a plane to fit a synthetic data set. The test datachosen was the peaks data set from Matlab c©. Figure 4shows the initial plane which is to be deformed and thepeaks data set.

−15 −10 −5 0 5 10 15−20

0

20

−6

−4

−2

0

2

4

6

X Data

Initial Seed Points to be fitted to Peaks Data Set Mesh

Y Data

Hei

ght

Figure 4. Testing of the Bayesian Deforma-tion framework on Synthetic data. Plot showsthe initial plane and the Matlab c©peaks dataset.

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006

Page 5: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

After one hundred iterations of the MH algorithm, thegraph in Figure 5 quantifies the error that existed betweenthe original data mesh and the peaks data set.

0 20 40 60 80 100 1202

3

4

5

6

7

8

9

10

Number of Iterations (000’s)

Log(

Fitt

ing

Err

or)

Log(Fitting Error) ’vs’ Number of Iterations

Figure 5. Plot of fitting error versus the num-ber of iterations of the MCMC Sampler for thepeaks data set.

As shown by the figure there is a substantial decline inthe error of the fit, particularly when one considers that theleft axis is in fact the log of the fitting error. From thisencouraging result more complicated fitting tasks were at-tempted in the next section.

5.3. Real Data

The algorithm was tested on real range images of hu-man faces. Using the generic face mesh as the seed mesh,the data points were then fitted through deformation of thegeneric face mesh. An example of the initial stage of thealgorithm is provided in Figure 6.

Application of the proposed Bayesian deformation algo-rithm resulted in an accurate fit of the generic face meshto the captured 3D data. A fit that is typical of the resultsachieved is displayed in Figure 7.

Figure 8 displays the error behaviour in the fitting as afunction of the number of iterations of the algorithm. Thisprocess was repeated across four test subjects.

As can be seen in Figure 8, even across multiple facesthe algorithm behaves in a relatively consistent manner inthe fitting of the faces.

6. Conclusion and Future Work

This paper presented a Bayesian framework for 3D fa-cial reconstruction which iteratively deforms a generic facemesh to fit a set of 3D points representing range data ob-tained for face fitting. The results verify the effectiveness of

0

5

10

15

20

25

30

35

0

5

10

15

20

25

30

35

−10

−5

0

5

10

15

20

25

30

35

Original Average Mesh − Fitted Data Surface

Figure 6. Generic Face Mesh plotted againstcaptured 3D data of a human face.

0

5

10

15

20

25

30

35

0

5

10

15

20

25

30

35

−10

−5

0

5

10

15

20

25

30

35

Fitted Data Surface

Figure 7. Result after Bayesian deformationof generic face mesh to the raw 3D data.

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006

Page 6: Human Face Reconstruction Using Bayesian Deformable Modelseprints.qut.edu.au/9407/1/9407.pdf · face verification have struggled to cope with changes in il-lumination and pose. This

0 100 200 300 400 500 6000

2

4

6

8

10

12log(Fitting Error) ’vs’ Iterations (000’s)

Iterations (000’s)

log(

Fitt

ing

Err

or)

Sample 1Sample 2Sample 3Sample 4

Figure 8. Plot showing error behaviour of theMCMC Sampler. Error is shown to radicallyreduce as the number of iterations increases.This is repeated across 4 test subject face’s,each case showing very similar behaviour.

the MCMC Sampler to maximise the posterior distributionof the vertices of the model given the acquired 3D rangepoints for a range of faces. Future work will apply thisframework to data acquired from a range of different acqui-sition devices including stereo which is more noisy and dif-ficult to fit. Future work will also investigate the use of otherprior distributions and using the Bayesian framework to op-timise the location of vertices in the fitted model, rather thanrelying on the location of regular vertices. This will resultin a higher density of vertices in areas of large data varia-tion and visa versa, which is also specific for the individual.The face reconstruction algorithm will also be incorporatedinto a complete 3D face recognition system, as outlined inFigure 1.

References

[1] A.-N. Ansari and M. Abdel-Mottaleb. 3d face modellingusing two views and a generic face model with application to3d face recognition. In Proceedings of the IEEE Conferenceon Advanced Video and Signal Based Surveillance, 2003.

[2] S. Baker and T. Kanade. Super-resolution optical flow.Technical Report CMU-RI-TR-99-36, Robotics Institute,Carnegie Mellon University, October 1999.

[3] V. Blanz and T. Vetter. Face recognition based on fitting a3d morphable model. IEEE Transactions on Pattern Analy-sis and Machine Intelligence, 25(9):1063–1074, September2003.

[4] K. C. K. W. Bowyer and P. J. Flynn. A survey of ap-proaches to three-dimensional face recognition. Technicalreport, University of Notre Dame, 2003.

[5] Q. Chen and G. Medioni. Building 3d human face modelsfrom two photographs. Journal of VLSI Signal Processing,27:127–140, 2001.

[6] K. Choi, P. Worthington, and E. Hancock. Estimating facialpose using shape-from-shading. Pattern Recognition Let-ters, 23(5):533–548, March 2002.

[7] D. DeCarlo and D. Metaxas. Deformable model-basedshape and motion analysis from images using motion resid-ual error. In International Conference on Computer Vision,pages 113–119, India, 1998.

[8] P. Fua and C. Miccio. Animated heads from ordinary im-ages: A least-squares approach. Computer Vision and ImageUnderstanding, 75(3):247–259, 1999.

[9] W. K. Hastings. Monte carlo sampling methods usingmarkov chains and their application. Biometrika, 57:97–109, 1970.

[10] S. Kang. A structure from motion approach using con-strained deformable models and apperance prediction. Tech-nical Report CRL 97/6, Cambridge Research Laboratory,October 1997.

[11] C. Kuo, T.-G. Lin, R.-S. Huang, and S. Odeh. Facial modelestimation from stereo/mono image sequence. IEEE Trans-actions on Multimedia, 5(1):8–23, March 2003.

[12] R. Lengagne, P. Fua, and O. Monga. 3d stereo reconstructionof human faces driven by differential constraints. Image andVision Computing, 18:337–343, 2000.

[13] J. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang,K. Hoffman, J. Marques, J. Min, and W. Worek. Overviewof the face recognition grand challenge. In Proceedings ofIEEE Conference of Computer Vision and Pattern Recogni-tion, volume 1, pages 947–954, 2005.

[14] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, andD. Salesin. Synthesizing realistic facial expressions fromphotographs. In Proceedings of SIGGRAPH ComputerGraphics, volume 26, pages 75–84, 1998.

[15] I. Ypsilos, A. Hilton, and S. Rowe. Video-rate capture ofdynamic face shape and appearance. In Proceedings of theSixth IEEE International Conference on Automatic Face andGesture Recognition, 2004.

Proceedings of the IEEE International Conferenceon Video and Signal Based Surveillance (AVSS'06)0-7695-2688-8/06 $20.00 © 2006