
    Adaptive Rigid Multi-region Selection for Handling Expression Variation in 3D

    Face Recognition

Kyong I. Chang, Kevin W. Bowyer, Patrick J. Flynn
Computer Science & Engineering Department
University of Notre Dame, Notre Dame, IN 46556

    Abstract

    We present a new algorithm for 3D face recognition, and

    compare its performance to that of previous approaches.

    We focus especially on the case of facial expression change

    between gallery and probe images. We first establish per-

    formance comparisons using a PCA (eigenface) algo-

rithm and an ICP (iterative closest point) algorithm similar
to ones reported in the literature. Experimental results show
that the performance of either approach degrades substantially
when expression varies between gallery and probe. Then we
introduce a new algorithm, Adaptive Rigid Multi-region Selection
(ARMS), which independently matches multiple facial regions and
fuses the results. This algorithm is fully automated and uses
no manually selected landmark points. Experimental results show
that our new algorithm substantially improves performance in
the case of varying facial expression. Our experimental results
are based on the largest 3D face dataset to date, with 449
persons, over 4,000 3D images, and substantial time lapse
between gallery and probe images.

    1. Introduction

    Recently, interest in 3D face recognition has grown and a

    great deal of research effort has been devoted to biometric

    sources represented in 3D (e.g. face, hand geometry, ear).

    There is a commonly accepted claim that face recognition

    in 3D is superior to 2D due to the invariance of 3D sensors

    to illumination, facial make-up and pose [1, 2, 3]. This is

    mainly because 3D sensors acquire data based on the shape

    of objects in the scene instead of light reflected from the

    scene. The additional dimension makes a 2.5D shape avail-

    able, however it also requires methods to process such data

in a reasonable and efficient way. This might be trivial in domains dealing with artificial objects in laboratory light-

    ing, but it remains to be demonstrated that it can be robust

    and accurate in typical person identification environments.

Another benefit of shape information in the context of face
recognition is that the shape of the human face changes less
over time than its appearance.

A recent study by Givens et al. [4] reported that among the
hardest factors in face recognition are expression changes,
eyelids open versus closed, and mouth open versus closed.
All of these are related to facial expression to a

    certain extent. The results also coincide with our previous

    work [5]. In that study, we found that different expressions

    between the gallery and probe sets degrade rank-one recog-

    nition rates in 2D face by as much as 15%. Also, a similar

    study performed for 3D face recognition shows that perfor-

    mance degrades by as much as 33% [6].

    One of the conclusions reported by the Face Recognition

    Vendor Test 2002 [7] is that the number of subjects in the

    database and time-lapse between gallery and probe affects

    the overall performance rates: For identification and watch

    list tasks, performance decreases linearly in the logarithm

    of the database size [7]. Average performance decrease

    for 2D face recognition is 15% in identification when the

    time lapse between gallery and probe reaches around 500

days. Note that time lapse implies more than the temporal
aspect. In other words, pose, facial make-up, facial hair,

    and/or lighting condition are factors associated with time

    lapse in the evaluation. These arguments raise a problem

    with currently reported studies in 3D, since most of the 3D

    approaches reviewed in [6] considered only neutral expres-

    sions with a limited number of subjects and time-variations.

This study pursues the idea that there is some subset of
the face that is relatively rigid between two expressions, and
uses multiple regions to allow flexibility across different
expressions. There are at least three general methods that

    one might employ in an attempt to handle the problem of

    varying facial expression. One approach would be to sim-

    ply concentrate on regions of the face whose shape changes

    the least with varying facial expression. For example, one

    might simply ignore the lips and mouth region, since the

shape varies greatly with expression. Of course, there is no large subset of the face that is perfectly shape invariant

    across a broad range of normal expressions, and so this ap-

    proach will not be perfect. Another approach would be to

    enroll a person into the gallery by intentionally sampling

    a set of different facial expressions, and to match a probe

    against the set of shapes representing a person. This ap-

    proach requires some cooperation on the part of the subject

    in order to obtain the set of different facial expressions. This


    approach also runs into the problem that, however large the

    set of facial expressions sampled for enrollment, the probe

    shape may represent an expression other than those sam-

    pled. Thus this approach also does not seem to allow the

    possibility of a perfect solution. A third approach would be

    to have a general model of 3D facial expression that can be

applied to any person's image(s). The search for a match between a gallery and a probe shape could then be done

    over the set of parameters controlling the instantiation of

    expression-specific shape. This approach seems destined to

    also run into problems. There likely is no general model

to predict, for example, how each person's neutral expres-

    sion image is transformed into their smiling image. A smile

means different things to different persons' facial shapes,

    and different things to the same person at different times

    and in different cultural contexts. Given that there does not

    seem to be any single correct approach, the question is

    which approach or combination of approaches can be used

    to achieve the desired level of performance.

In this study, we first document the extent to which facial expression change degrades performance in 3D face recog-

    nition. Then, we address this problem by considering 3D

    face matching only in localized facial regions that show rel-

    atively less variation across expressions. Such regions are

detected automatically using 3D geometrical features, unlike
other facial feature finding methods that use both 2D color
and 3D depth images acquired at the same time [8].

    1.1. Previous Work

    There appear to be three main categories of approach to 3D

    face recognition in the literature. A 3D face can be thought

of as a group of points defined in 3D space and they can be matched using a registration technique [3, 9, 10]. Also, the

    eigenface approach can be extended to accomplish recogni-

    tion by measuring the depth (shape) variations observed in

    range images [11, 1, 12, 13, 14]. Finally, there is a group of

studies that use a set of features computed from the 3D
geometry of the face to measure similarity [15, 16, 17, 18, 19].

    Even though a handful of 3D face recognition studies

    [20, 13, 16] consider expression variations, there is no rig-

    orous evaluation study that explicitly addresses the facial

    expression problems on a large dataset.

1.2. Initial Study for the Baseline Performance

An experiment is conducted to establish a baseline perfor-

    mance obtained by using a whole face. In one approach,

    a whole face is cropped using manually selected points on

    two outer eye tips and a nose tip (See Fig.1). Similarity be-

    tween gallery and probe surfaces is measured using the ICP

    surface registration technique.

    This approach is compared to the PCA-based approach

    to show the difference in recognition accuracy between the

    two different approaches.
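For reference, the PCA-baseline can be sketched as a standard eigenface pipeline applied to range (depth) images. This is a minimal sketch under assumed conventions (flattened, registered depth images and an illustrative number of retained components), not the authors' exact implementation.

import numpy as np

def fit_facespace(train_depths, n_components=100):
    """Build a PCA 'face space' from flattened training depth images.

    train_depths: (N, H*W) array, one row per cropped, registered range image.
    Returns the mean face and the top principal components (eigenfaces).
    """
    mean = train_depths.mean(axis=0)
    centered = train_depths - mean
    # SVD of the centered data; rows of Vt are the eigenfaces.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:n_components]

def match_pca(gallery, probe, mean, eigenfaces):
    """Return gallery indices sorted by distance to the probe in face space."""
    g = (gallery - mean) @ eigenfaces.T      # project gallery depth images
    p = (probe - mean) @ eigenfaces.T        # project the probe depth image
    d = np.linalg.norm(g - p, axis=1)        # Euclidean distance in face space
    return np.argsort(d)                     # rank-one match is the first index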

Figure 1: Sample images (a gallery on the left and a probe on the right in each column) used for the baseline performance with ICP-baseline and PCA-baseline. Nearly the entire face region is considered for both methods. For the ICP-baseline, the probe face covers approximately 10% less area than the gallery face.

    The results shown in Table 1 are obtained with an

    ICP algorithm (denoted as ICP-baseline) matching the

    whole frontal face region, using manually selected land-

    mark points for the initial rotation and translation estimate

given to the ICP algorithm. Successive probe sets have longer elapsed time between acquisition of the gallery im-

    age and the probe image. The same gallery images are used

    with all probe sets, and all gallery images have neutral ex-

pression. There is a significant performance drop when
expression varies between gallery and probe, from an average
of 91.0% down to an average of 61.5%. This clearly shows
the limitation of rigid registration of deformed surfaces.
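The landmark-based initialization of the ICP-baseline can be illustrated with a short sketch: a rigid transform is estimated from the three manually selected landmarks (two outer eye tips and the nose tip) with a least-squares (Kabsch) fit. The use of a Kabsch fit and the example coordinates are assumptions for illustration, not the authors' exact procedure.

import numpy as np

def rigid_transform_from_landmarks(src, dst):
    """Least-squares rigid transform (R, t) mapping src landmarks onto dst.

    src, dst: (N, 3) arrays of corresponding 3D landmarks, e.g. the two
    outer eye tips and the nose tip on probe and gallery scans.
    Returns R (3x3 rotation) and t (3,) such that dst ~= src @ R.T + t.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Kabsch/Procrustes: SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Hypothetical landmark coordinates (probe -> gallery), in millimetres.
probe_marks = np.array([[-32.0, 40.0, 10.0], [33.0, 41.0, 11.0], [0.5, 5.0, 35.0]])
gallery_marks = np.array([[-31.0, 42.0, 12.0], [34.0, 40.0, 13.0], [1.0, 6.0, 36.0]])
R0, t0 = rigid_transform_from_landmarks(probe_marks, gallery_marks)
# R0, t0 would then seed the iterative ICP refinement.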

Table 1: Performance Degradation with Expression Change

Probe sets with neutral expression (rates in %)
  #1    #2    #3    #4    #5    #6    #7    #8    #9
  92.2  87.7  88.7  92.3  91.0  91.7  93.6  89.0  93.5

Probe sets with non-neutral expression (rates in %)
  #1    #2    #3    #4    #5    #6    #7    #8    #9
  43.7  59.5  56.0  64.1  67.3  66.4  65.9  69.1  N/A

Recognition rates when the probes have an expression change are
on average 30% lower than when there is no expression change.

    1.3. Facial Expression Analysis

    Changes due to facial expressions influence accuracy not

    only for 3D shape but also 2D appearance. Even though it is

    extremely hard to generalize about the expressions of a per-

    son, we have visual evidence of different degrees of muscle

movement. For instance, regions around the mouth would not seem to be reliable for matching, since an open mouth due

    to a smile deforms the shape significantly (Fig.2-(A)). This

suggests collecting the sample points considered for facial
surface matching from relatively static regions. Other
expressions, such as surprise, sadness and disgust

    (Fig.2-(B),(C),(D)) contract or expand regions including the

    mouth, forehead and cheek. Specifically, a surprised ex-

pression generates contracted forehead muscles producing
wrinkles, lifted eyebrows and cheeks, and possibly an open

    mouth. In the case of a sad expression, the muscles be-

tween the eyebrows are contracted, the lips are generally de-

    formed, and cheeks are possibly lifted. Therefore, the ideal

    regions for reliable matching based on our qualitative eval-

    uation would be an area around the nose which displays less

movement under expression than other areas. Because only
the nose area is considered for matching and the other points
are eliminated, the number of points used for matching is
greatly reduced (from roughly 10^5 to 10^3 on average).

Figure 2: Different non-neutral expressions in 2D and 3D: (A) happy, (B) surprised, (C) sad, (D) disgusted.

2. Adaptive Rigid Multi-region Selection Method Description

    A new 3D face recognition algorithm, called Adaptive

Rigid Multi-region Selection (ARMS), is proposed to cope
with expression variation between gallery and probe images.
It first finds a relatively rigid region in the high cur-

    vature area on a face. There are two separate ARMS-based

    methods depending on how ROIs are extracted. The first

    method uses manually labeled ground truth points (two

    outer eye tips and a nose tip) to extract ROI regions for

    matching. The second method finds ROIs automatically us-

    ing our facial feature finding methods (described in Sec-

tion 2.2 and Section 2.3). In addition to the ARMS-based
methods, a PCA-based method using the manually labeled points
is included for comparison of recognition accuracy.

    The following subsections describe how our automated

    feature finding method extracts ROIs and how these sur-

    faces are matched to recognize a person.

    2.1. Overall Framework

The problem of varying facial expression can be minimized
by considering sample points chosen in facial regions that
remain relatively static under expression change, such as the
nose region. The following steps are

    involved to accomplish the task of person identification un-

der expression changes.

Figure 3: Overview of the proposed method.

First, a group of skin regions is located by a skin detection
method using the corresponding 2D color image. Pixels in the
color image are transformed into the YCbCr color-space [21].
Pixels are used in the skin detection method only if they have
a valid 3D point.

    A group of 3D points in a skin region specified by a rectan-

gular area will be processed to compute 3D geometrical

    features (See Fig.4). This step removes not only irrelevant

    regions for matching, such as shoulder or hair area, but also

    reduces computing time for later steps.
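The skin-detection step can be illustrated with a short sketch. The conversion below uses the common ITU-R BT.601 YCbCr formulas, and the Cb/Cr bounds are typical skin ranges quoted in the literature; the authors' actual skin model is not specified here, so these bounds are assumptions for illustration.

import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to YCbCr (BT.601 approximation)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, valid_3d, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of likely skin pixels that also carry a valid 3D point.

    valid_3d: (H, W) boolean array marking pixels with a usable range sample.
    The Cb/Cr ranges are illustrative values commonly used for skin colour.
    """
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    mask = (cb >= cb_range[0]) & (cb <= cb_range[1]) \
         & (cr >= cr_range[0]) & (cr <= cr_range[1])
    return mask & valid_3d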

    Valid 3D points found in regions detected by the skin

    detection are subject to 3D geometrical feature computa-

    tion to classify an observed facial surface. Gaussian cur-

    vature (K) and mean curvature (H) are computed and ge-

    ometrical shape can be identified by surface classification

    (See Fig.5). Once 3D surface classification is complete, the

following regions are detected: nose tip (peak region), eye cavities (pit region) and nose bridge (saddle region). Con-

    sidering several different surfaces would provide a chance

    to select the best match among them. For instance, under

    expression changes, one region might result in better accu-

    racy than other regions (See Fig.6).

    The last step involves surface registration to measure the

    similarity of shape between a gallery and a probe surface.

    Probe surfaces are matched against a gallery surface and

    the identification process would rank each individual sur-

face based on the root mean square (RMS) error re-

    ported by ICP. This reflects the amount of difference in 3D

    face shape after alignment by ICP. During the decision pro-

    cess, voting or fusion rules can be considered to determine

    identity (See Fig.7).

    2.2. Skin Region Detection and Preprocessing

    At first, a raw 3D scan is subsampled by 4 in both X and

    Y direction. Then, a group of skin pixels is extracted by

    using 3D data points and our skin model constructed in

    YCbCr color-space for 2D color images as shown in Fig.4.


At the end of this task, a 3D scan contains the skin region,
including the face and/or neck area, which will be used for
further processing. Once skin regions (predominantly face) in
a model are found, outliers including spikes and noise are
suppressed. When the angle between the optical axis and the
surface normal of an observed point is greater than a certain
threshold, the point is declared an outlier and removed from
the model. A Gaussian filter is used for smoothing the data
after outlier removal. Finally, the pose is corrected. Pose
correction is done by aligning an input surface to a generic
3D face model using the ICP method. A rotation about X, Y and
Z as well as a translation (from the given data to the known
model) is produced by ICP. The input data points are then
transformed accordingly.

    Figure 4: Face region extraction
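To make the outlier-suppression step concrete, here is a minimal sketch of the normal-angle test described above; the angle threshold and the assumption that per-point normals have already been estimated are illustrative choices, not the authors' settings.

import numpy as np

def remove_grazing_points(points, normals, optical_axis=(0.0, 0.0, 1.0),
                          max_angle_deg=80.0):
    """Drop points whose surface normal is nearly perpendicular to the optical axis.

    points:  (N, 3) 3D samples from the range scanner.
    normals: (N, 3) unit surface normals estimated at each point.
    Points seen at a grazing angle tend to be noisy spikes, so they are
    treated as outliers (the 80-degree threshold is an assumed value).
    """
    axis = np.asarray(optical_axis, dtype=np.float64)
    axis /= np.linalg.norm(axis)
    cos_angle = np.abs(normals @ axis)          # |cos| of the angle to the optical axis
    keep = cos_angle >= np.cos(np.radians(max_angle_deg))
    return points[keep], normals[keep]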

    2.3. Curvature-based Face Features

    This section describes a methodology that chooses some

face regions that could always be found as the same curvature region type (peak, pit, saddle and so on) regardless

    of face expression. Also, related steps for 3D surface cur-

    vature computation are explained.

    Relative to [8], our approach is distinguished by using

    only the 3D shape information in order to perform feature

    finding, rather than needing to process the 2D image to find

    features that are used to initialize the 3D matching. Given a

    surface, geometric features such as ridges, valleys or peaks

    can be estimated to characterize the surface. By computing

    Gaussian curvature (K) and mean curvature (H), geometri-

    cal shape may be identified by surface classification. During

    geometrical feature computation, the regions of interest are

nose tip (peak), eye cavities (pit) and nose bridge (saddle). Once the locations of the ROIs are identified, sample points are

    extracted around the nose area.

    Coordinate Transformation: This is a preprocessing step

for the curvature estimation. A local coordinate system is
defined by the principal directions of the point set (the N
nearest neighborhood points at every point). The reason why the

    points are being transformed is to fix a reference point with

    its neighborhood and coordinate axis such that every point

    in the new space can be represented as an n-tuple of its co-

    ordinates.

The direction of least variation corresponds to the smallest
eigenvalue. The eigenvector (v_min) of the smallest eigenvalue
is then set to be the new local Z-axis. This approach assumes
that the least variation should be observed along the surface
normal. When the new (local) Z-axis is obtained, the
orientation of the axis needs to be verified. The orientation
is checked against the direction of the averaged surface normal
(n) at the reference point: the angle between these two vectors
(n, v_min) is examined for the validity of the orientation of
v_min. Once the verification is completed, the N points are
transformed into the new local affine coordinates as
[u_i, v_i, z_i]^T = V^T (x_i - x_p), where V = [v_max, v_mid, v_min]
collects the eigenvectors in order of decreasing eigenvalue and
x_i is one of the points around the current point x_p [22].
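A minimal sketch of this local coordinate transformation, assuming the neighborhood points have already been gathered (for example by a k-nearest-neighbour query); the frame construction follows the eigen-decomposition described above.

import numpy as np

def local_frame(neighbors, reference, avg_normal):
    """Express a point's neighborhood in its local (u, v, z) coordinates.

    neighbors:  (N, 3) points around the reference point.
    reference:  (3,) the reference point x_p.
    avg_normal: (3,) averaged surface normal at x_p, used to orient v_min.
    Returns the neighbors in the local frame [v_max, v_mid, v_min].
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    v_min, v_mid, v_max = eigvecs[:, 0], eigvecs[:, 1], eigvecs[:, 2]
    # Flip v_min so it agrees with the averaged surface normal.
    if np.dot(v_min, avg_normal) < 0:
        v_min = -v_min
    V = np.column_stack([v_max, v_mid, v_min])      # local axes as columns
    return (neighbors - reference) @ V              # rows are [u_i, v_i, z_i]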

Least Square Fit: This step obtains the set of coefficients of
the quadratic equation needed for the curvature computation.
The neighborhood points (N_p) observed at the reference point
(P) are now expressed, in the new coordinate system defined by
the eigenvectors (v_max, v_mid, v_min) of the covariance matrix,
as [u_i, v_i, z_i]^T. Finding the six unknown coefficients of
the quadratic requires six or more such points:

    z = f(u, v) = a1*u^2 + a2*uv + a3*v^2 + a4*u + a5*v + a6

Given the coefficients, the partial derivatives can be computed
to obtain K and H.
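A sketch of the quadratic fit, assuming the points are already in the local (u, v, z) frame; the design-matrix formulation below is a standard linear least-squares solution, not necessarily the authors' exact numerical procedure.

import numpy as np

def fit_quadratic_patch(local_pts):
    """Fit z = a1*u^2 + a2*u*v + a3*v^2 + a4*u + a5*v + a6 by least squares.

    local_pts: (N, 3) points [u_i, v_i, z_i] in the local frame, N >= 6.
    Returns the coefficient vector (a1, ..., a6).
    """
    u, v, z = local_pts[:, 0], local_pts[:, 1], local_pts[:, 2]
    A = np.column_stack([u * u, u * v, v * v, u, v, np.ones_like(u)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs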

    Curvature Estimation: A Monge patch technique is

    used to compute the coefficients of the first and the

    second fundamental forms [23]. A Monge patch can

be written as a surface in the explicit form M(u, v) = (u, v, f(u, v)).
The Gaussian curvature (K) and mean curvature (H) are then
obtained. Depending on the signs of K and H, an observed local
surface patch can be classified into one of eight different
shape types [24]. However, when an input surface is deformed,
the intra-class variation increases and a unique, repeatable
representation of the surface may not be possible. Three face surfaces

    of the same subject with different expression (deformed

    surface) color-coded for surface classes are shown in Fig.5.
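The curvature computation and sign-based classification can be sketched as follows. The formulas are the standard Monge-patch expressions for K and H evaluated at the reference point; the tolerance eps used in the sign tests is an assumed value, and the sign conventions depend on how the surface normal is oriented.

import numpy as np

def monge_curvatures(a1, a2, a3, a4, a5, a6):
    """Gaussian (K) and mean (H) curvature of z = f(u, v) at the origin.

    For the fitted quadratic, the partial derivatives at (0, 0) are
    f_u = a4, f_v = a5, f_uu = 2*a1, f_uv = a2, f_vv = 2*a3.
    """
    fu, fv, fuu, fuv, fvv = a4, a5, 2.0 * a1, a2, 2.0 * a3
    g = 1.0 + fu * fu + fv * fv
    K = (fuu * fvv - fuv * fuv) / (g * g)
    H = ((1.0 + fv * fv) * fuu - 2.0 * fu * fv * fuv
         + (1.0 + fu * fu) * fvv) / (2.0 * g ** 1.5)
    return K, H

def hk_class(K, H, eps=1e-3):
    """Classify a local patch into one of the eight HK surface types."""
    if abs(K) < eps:
        if abs(H) < eps:
            return "flat"
        return "ridge" if H < 0 else "valley"
    if K > 0:
        return "peak" if H < 0 else "pit"   # e.g. nose tip -> peak, eye cavity -> pit
    return "minimal" if abs(H) < eps else ("saddle ridge" if H < 0 else "saddle valley")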

    Local Surface Realization: As threshold values are

    tested during the sign test to determine the surface class

types, a nose tip is expected to be a peak region (K > T_K),
the eye cavities to be pit regions, and the nose bridge to be
a saddle region.


Figure 5: Images of a single person with three different expressions (neutral, happy, disgusted) rendered according to surface type. The regions assigned to many surface types change as deformation is introduced; as the cheeks are lifted in the happy and disgusted expressions, peaks are clearly detected on the upper cheeks on both sides and on the lips.

Once the ROI locations are identified, the local surfaces used for matching are extracted around the nose area using a set of predefined implicit functions (see Fig. 6).

    Even though we claimed that the ideal regions for face

    matching under varying expressions are areas around the

    nose, parts of the nose still show certain degrees of mus-

cle movement (nose bridge/nostrils). This problem can be

    resolved by considering multiple local surfaces around the

    general nose area. These are the primary regions of inter-

est during facial feature finding. This method of finding

    the curvature-based face regions is automated and has been

    evaluated on 4,485 3D face images of 449 people with a va-

    riety of facial expressions. The facial landmarks (eye cav-

    ities, nose tip and nose bridge) were successfully found in

    99.4% of the images (4,458 of 4,485).

Figure 6: Local surface realization for a gallery and three probes. In relation to the locations of the identified ROIs, simple geometric (implicit) functions are used to extract the matching regions for gallery and probe.

    2.4. Face Matching in Identification

    Given a pair of surfaces to be matched, the initial regis-

    tration is performed by translating the centroid of the probe

surface to the centroid of the gallery surface. Iterative
alignment based on point differences between the two surfaces
is performed. At the end of each iteration, the RMS difference
between the two surfaces is computed. The iteration halts when

    there is little or no change. Because a probe has 3 local sur-

    faces that need to be matched to a gallery, decision fusion is

    required to combine the three RMS error values for the final

    similarity value (See Fig.7).
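A compact sketch of the point-to-point ICP loop described above, assuming SciPy's cKDTree for nearest-neighbour search; the convergence tolerance and iteration cap are assumed values, and the real system initializes either from landmarks (ARMS-manual) or automatically (ARMS-auto).

import numpy as np
from scipy.spatial import cKDTree

def icp_rms(probe, gallery, max_iter=50, tol=1e-4):
    """Align probe points to gallery points with point-to-point ICP.

    probe, gallery: (N, 3) and (M, 3) arrays of surface samples.
    Returns the final RMS distance after alignment, which serves as the
    probe-to-gallery dissimilarity score.
    """
    tree = cKDTree(gallery)
    # Initial registration: translate the probe centroid onto the gallery centroid.
    src = probe - probe.mean(axis=0) + gallery.mean(axis=0)
    prev_rms, rms = np.inf, np.inf
    for _ in range(max_iter):
        dists, idx = tree.query(src)             # closest gallery point per probe point
        rms = np.sqrt(np.mean(dists ** 2))
        if prev_rms - rms < tol:                 # little or no change: stop iterating
            break
        prev_rms = rms
        # Best rigid transform (Kabsch) for the current correspondences.
        dst = gallery[idx]
        src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = dst.mean(axis=0) - R @ src.mean(axis=0)
        src = src @ R.T + t
    return rms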

    During the decision process of matching each probe to

    one of the gallery entries, some fusion or voting rule must

    be used. We considered the sum rule, minimum rule, and

product rule. The sum rule takes the sum of the RMS
differences for the three regions from the probe image as the
probe-to-gallery match value. The minimum rule takes the
smallest of the RMS difference values. The product rule
takes the product of the three difference values.
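Putting the matching and fusion together, the sketch below ranks gallery subjects for one probe. It assumes the per-region surfaces have already been extracted and reuses the icp_rms helper from the previous sketch.

import numpy as np
# icp_rms is the ICP helper sketched above.

def identify(probe_regions, gallery_db, rule="product"):
    """Rank gallery subjects for one probe by fusing per-region ICP errors.

    probe_regions: list of three (N, 3) point arrays (the local nose surfaces).
    gallery_db:    dict mapping subject id -> (M, 3) gallery surface.
    rule:          'sum', 'min' or 'product' fusion of the three RMS errors.
    Returns subject ids sorted from best (smallest fused error) to worst.
    """
    fuse = {"sum": np.sum, "min": np.min, "product": np.prod}[rule]
    scores = {}
    for subject, gallery_surface in gallery_db.items():
        errors = [icp_rms(region, gallery_surface) for region in probe_regions]
        scores[subject] = fuse(errors)
    return sorted(scores, key=scores.get)   # the rank-one identity is the first entry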

Figure 7: As three local surfaces are matched against a gallery, different fusion strategies may be considered to combine the results at either the metric level or the rank level.

    3. Data Collection

    A total of 546 different subjects participated in one or more

    data acquisition sessions yielding a total of 4,485 3D scans

    used in this study. Among the 546 subjects, 449 partici-

pated in both a gallery acquisition and at least one probe
acquisition. Subjects who have only non-neutral expressions
are dropped, since a gallery image with a neutral expression
is required.

    There are two classes of probes depending on the expres-

sion changes being asked of the subjects at the time of data acquisition. The first class consists of 9 probe sets and each

    probe set contains 3D scans acquired under neutral expres-

    sion collected in different weeks. This class has a gallery

    set of the 449 subjects and a total of 2,798 probe images of

    those 449 subjects.

    The second class consists of 8 probe sets and each probe

set contains 3D scans acquired while subjects were asked to

    have different ones of the human expressions described by

    Ekman [25]. The second class has the same gallery as the

first class and a total of 1,590 probes of 355 subjects acquired
in later weeks; these are a subset of the 449 subjects in the gallery.

    The training set, needed only for the PCA method, con-

tains the 449 gallery images plus an additional 97 images for
subjects for whom good data was not acquired in both the
gallery and probe sessions. Thus, these additional 97 images
are used only to create the face space for the PCA method.

    4. Experiment

    The methods considered in the experiments are (1) ICP-

    baseline, using manually selected landmark points and the


whole face (see Section 1.2), (2) PCA-baseline, which uses
manually selected points and matches the whole face as shown
in Fig. 1, (3) ARMS-auto, which automatically finds ROIs and
extracts the multiple nose-area regions for matching, and
(4) ARMS-manual, which uses manually selected landmark points
to extract the multiple nose regions for matching.

Figure 8: Example images in 2D and 3D with different expressions.

Table 2: Statistics of the Dataset Used in this Study

Neutral-expression sets: 2,798 scans of 449 subjects
  Set:    *    #1   #2   #3   #4   #5   #6   #7   #8   #9
  Scans:  449  449  390  336  286  243  205  172  145  123

Non-neutral-expression sets: 1,590 scans of 355 subjects
  Set:    *    #1   #2   #3   #4   #5   #6   #7   #8
  Scans:  355  355  321  266  209  168  125  91   55

Subjects with 2 or more scans: 449; subjects with only one scan: 97.
Total number of scans: 4,485. (* denotes the gallery set.)

    Four experiments are conducted to evaluate the recogni-

    tion rates in various situations, such as time-lapse and ex-

    pression variations between a gallery and a probe. The first

    experiment investigates how the recognition performance is

    affected by time-variation only, with no expression change.

    The second experiment evaluates the performance of PCA-

    baseline and two ARMS-based methods when both time

    and expression are varied. In the third experiment, the per-

    formance effects of 3D face recognition methods on the

    number of probes are examined. Finally, the probes are col-

lected into one single pool in the fourth experiment. There

    will be one or more probes for a subject who appears in the

    gallery, with each probe being acquired in a different acqui-

    sition session separated by a week or more.

    4.1. Time Variation Effects on Performance

    This experiment evaluates the performance across 9 probe

sets. Probe set #2 has a greater elapsed time between gallery and probe image acquisition than probe set #1, and

    so on. The results are shown in Fig. 9. PCA-baseline

    has an average 77.7% rank-one recognition rate. Both

    ARMS-based methods combining probe #1 and probe #3

    (as shown in Fig. 6) with the product rule performed

    higher than our baseline methods yielding rank-one recog-

    nition rate of 96.6% by ARMS-auto and 96.1% by ARMS-

manual.

Figure 9: Rank-one recognition with the same neutral expression
as the gallery, for probe sets #1 through #9 (ICP-baseline,
ARMS-auto, ARMS-manual). A number in parentheses is the number
of probe images. The sum rule obtained 96.6% by ARMS-auto, and
the minimum rule obtained 96.5%.

These results show that (1) both ARMS-based

    methods outperform the PCA-baseline in neutral expres-

    sion, (2) there is marginal performance difference between

    automated and manual ARMS methods, reflecting that our

    automated 3D facial feature finding method is 99.4% suc-

cessful, (3) the rank-one recognition rates were maintained
surprisingly well as the elapsed time between gallery and
probe increases, and (4) the two algorithms differ only in
that ARMS-manual uses manually selected landmark points to
initialize the ICP matching, while ARMS-auto is totally
automated.

    4.2. Effects of Expression Variation

    This experiment examines our ARMS-based methods and

    PCA-baseline when subjects have different expressions in

    their gallery and probe images. This has the same exper-

    imental design as the previous one except that there are 8

probe sets.

Figure 10: Rank-one results for probes with non-neutral
expressions, for probe sets #1 through #8 (ICP-baseline,
ARMS-auto, ARMS-manual). The sum rule obtained 86.8% by
ARMS-auto, and the minimum rule obtained 82.9%.

As shown in Fig. 10, our proposed ARMS-based

    methods (auto and manual) clearly outperform the PCA-

    baseline method. The average rank-one match rate is 61.3%

    for PCA-baseline, while ARMS-auto achieves 87.1% with

    the product rule on average (88.6% by ARMS-manual). We

    found that (1) our ARMS-based methods also have better

    identification accuracy than ICP-baseline and PCA-baseline


in varying expressions, (2) expression changes do cause
performance to deteriorate in all methods, and (3) the rank-one
rates for 3D shape are more consistent as a function of time
than the 2D rates reported in another study [7]. This is
observed for both neutral and non-neutral expressions.

Figure 11: Rank-one identification rates (ARMS-auto and
ARMS-manual) for probe sets of different sizes: probes with
neutral expression (top; subset sizes 449, 225, 113, 57, 29)
and probes with non-neutral expression (bottom; subset sizes
355, 178, 89, 45, 23). The error bars for each probe set
(except the first) show the standard error of the rank-one
match rates.

    4.3. Scalability of 3D Face Recognition

The results in this experiment show the effect on performance
of the number of subjects enrolled in a probe set. We

    begin with probe set #1 for the neutral expression class and

    the varying expression class. We randomly select one half

    of probe set #1 to generate a reduced-size probe set, and

    do this multiple times. For instance, probe set #1 in the

    neutral-expression class has 449 subjects. The second probe

    set in this experiment has 225 of the 449 subjects. In or-

    der to show performance variance caused by subject pool,

we generated 10 sets of 225 randomly selected subjects, and computed the error rates for each set. Then, the third

    probe set consists of 10 sets of randomly selected 113 of

    the 449 subjects and so on. Each class has five probe sets,

    and rank-one rates are plotted in Fig.11. Our results indi-

cate that there is a tendency toward higher performance rates
as the probe size decreases, which coincides with evaluation
results reported in the FRVT [7]. Also, this performance

    behavior is more prominent when expressions are varied.
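The probe-set subsampling procedure can be expressed as a short sketch. The evaluate_rank_one function is a hypothetical stand-in for running the full identification pipeline on a probe subset, and the stopping size is an assumed value.

import random
import statistics

def scalability_curve(probe_ids, evaluate_rank_one, n_trials=10, seed=0):
    """Rank-one rate (mean, standard error) for successively halved probe sets.

    probe_ids:          list of subject ids in the full probe set (e.g. probe set #1).
    evaluate_rank_one:  callable taking a list of ids and returning a rank-one rate;
                        a placeholder for the actual matching pipeline.
    """
    rng = random.Random(seed)
    results, size = [], len(probe_ids)
    while size >= 20:                          # stop once the subset becomes tiny
        rates = [evaluate_rank_one(rng.sample(probe_ids, size)) for _ in range(n_trials)]
        stderr = statistics.stdev(rates) / len(rates) ** 0.5
        results.append((size, statistics.mean(rates), stderr))
        size //= 2                             # the next probe set has half as many subjects
    return results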

    4.4. Multiple Probe Study

    This experiment models the situation where a person is en-

    rolled in the system with one neutral expression image and

    then attempts are made to recognize the person at various

    later points in time based on the image acquired at that time,

    with possibly varying facial expression. For the multiple

probe study, the gallery images are acquired in the first week
and all the probes acquired in later weeks are collected into
a single pool, yielding 3,939 (2,349 + 1,590) probes (see
Table 2 for dataset statistics). Then, a correct or incorrect
match is recorded for each probe. The performance rate is
reported as the correct-match rate averaged over all subjects
(see Table 3). The ARMS-auto and ARMS-manual methods achieved
similar performance rates and show higher accuracy than the
methods that use the whole face in matching.

    Table 3: Multiple Probe Study

    PCA-baseline ICP-baseline ARMS-auto ARMS-manual

    70.7% 78.1% 91.9% 92.3%

    5. Conclusions and Discussion

    Our baseline experiments involve the evaluation of PCA

    or ICP based algorithms similar to ones recently reported

    in the literature [11, 1, 3, 9]. Using a PCA-based algo-

    rithm and an ICP-baseline method, each with manually-

    selected landmark points, gives results that show better per-

    formance for the ICP-based approach. These algorithms use

    the whole frontal face area in matching.

Our results further show that in the presence of expression change, the recognition performance of these baseline

    algorithms drops dramatically. This is because they treat the

    whole frontal face region as a rigid surface, and of course

    the face as a whole is not rigid over expression change.

The proposed algorithm focuses on the general nose region of
the face, as the area most rigid across expression change. We
actually use three differently shaped nose regions, to allow
for the fact that the whole nose area is not perfectly rigid
across expression change. Later, all

    the three regions are matched independently from probe to

gallery, and the three match values are combined to recognize
the probe identity. Among the fusion rules considered, the
product rule shows slightly but consistently higher accuracy
than the sum rule or the minimum rule.

    We evaluate the new algorithm once using the manually

    selected points to initialize the algorithm, and a second time

    as a fully-automated algorithm not using any manually se-

    lected points. The fully automated version has only slightly

    lower performance than the version using manually selected

    points. This reflects the overall 97.5% accuracy in auto-

    mated point selection. Also, the new algorithm matching


in general nose regions shows a better performance rate than

    PCA-baseline. This shows our method has higher discrimi-

    natory power. When expression is varied in subjects at dif-

ferent times, the recognition rate does not degrade as much
as it does for PCA-baseline or ICP-baseline using the whole
face region.

    One surprising element of this work is that we can

achieve such good performance using only a small portion of
the whole face with the ARMS-based method. However, the
forehead region is sometimes obscured by hair (see Fig. 12)
and also has furrows in some expressions. Also, our 3D images
are frontal views, making the cheekbone regions (located on
the edge-on silhouette) difficult to use. Since the mouth and
lips vary greatly with expressions, the nose area be-

    comes the logical remaining choice for matching.

Figure 12: Gallery image, correctly matched probe, and incorrectly matched probe. One of the cases in which localized face regions are more advantageous for matching than the whole face is face occlusion. This subject was incorrectly identified with the PCA-based method but was successfully matched by our new ARMS-based method. Problems like facial occlusion due to hair or a mustache can be resolved by local region-based matching. (The intensity image is shown for illustration purposes only.)

    Even though the time it takes to complete the ICP surface

matching ranges from 0.4 to 0.7 seconds over about 30 iterations,
the preprocessing steps (data cleanup, skin detection, facial
feature extraction and ROI extraction) take approx-

    imately 2 to 3 seconds to complete. Making 3D process-

    ing computationally efficient might be a challenging task.

    One way to improve face matching is to apply a spatial

    search technique using a specialized data structure for the

    ICP method [26].
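As one concrete (assumed) instance of such a data structure, the nearest-neighbour query inside the ICP loop can be backed by a k-d tree, as in the earlier icp_rms sketch; building the tree once per gallery surface amortizes the query cost across iterations and probe regions.

import numpy as np
from scipy.spatial import cKDTree

# Illustrative data: a gallery surface and a probe region (random placeholders).
gallery_points = np.random.rand(5000, 3)
probe_points = np.random.rand(800, 3)

# Build the k-d tree once per gallery surface; reuse it for every ICP
# iteration and for every probe region matched against that surface.
gallery_tree = cKDTree(gallery_points)
dists, idx = gallery_tree.query(probe_points)   # one nearest neighbour per probe point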

    The dataset used in the experiments reported here will

    be made available to other research groups as a part of the

HID databases. See http://www.nd.edu/~cvrl for additional

    information.

References

[1] C. Hesher, A. Srivastava, and G. Erlebacher, "A novel technique for face recognition using range imaging," Seventh International Symposium on Signal Processing and its Applications, 2003.
[2] G. Gordon, "Face recognition based on depth and curvature features," Computer Vision and Pattern Recognition (CVPR), pp. 108-110, June 1992.
[3] G. Medioni and R. Waupotitsch, "Face modeling and recognition in 3-D," IEEE International Workshop on Analysis and Modeling of Faces and Gestures, pp. 232-233, October 2003.
[4] G. Givens, R. Beveridge, B. Draper, and D. Bolme, "A statistical assessment of subject factors in the PCA recognition of human faces," Workshop on Statistical Analysis in Computer Vision (in CVPR), 2003.
[5] K. Chang, K. Bowyer, and P. Flynn, "An evaluation of multi-modal 2D+3D face biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[6] K. Bowyer, K. Chang, and P. Flynn, "A short survey of 3D and multi-modal 3D+2D face recognition," International Conference on Pattern Recognition, UK, 2004.
[7] J. Phillips, P. Grother, R. Micheals, D. Blackburn, E. Tabassi, and M. Bone, "Face recognition vendor test 2002: Evaluation report," available at http://www.frvt2002.org/FRVT2002/documents.htm.
[8] D. Colbry, X. Lu, A. Jain, and G. Stockman, "3D face feature extraction for recognition," Michigan State Univ. Tech. Report, 2004.
[9] X. Lu, D. Colbry, and A. Jain, "Three-dimensional model based face recognition," International Conference on Pattern Recognition, pp. 362-366, 2004.
[10] T. Papatheodorou and D. Rueckert, "Evaluation of automatic 4D face recognition using surface and texture registration," Sixth International Conference on Automated Face and Gesture Recognition, pp. 321-326, 2004.
[11] K. Chang, K. Bowyer, and P. Flynn, "Face recognition using 2D and 3D facial data," ACM Workshop on Multimodal User Authentication, pp. 25-32, December 2003.
[12] G. Pan, Z. Wu, and Y. Pan, "Automated 3D face verification from range data," International Conference on Acoustics, Speech and Signal Processing, pp. 192-196, 2003.
[13] A. Bronstein, M. Bronstein, and R. Kimmel, "Expression-invariant 3D face recognition," 4th International Conference on Audio- and Video-based Biometric Person Authentication, pp. 62-70, 2003.
[14] F. Tsalakanidou, D. Tzovaras, and M. Strintzis, "Use of depth and colour eigenfaces for face recognition," Pattern Recognition Letters, pp. 1427-1435, 2003.
[15] T. Russ, K. Koch, and C. Little, "3D face recognition: a quantitative analysis," 45th Annual Meeting of the Institute of Nuclear Materials Management, 2004.
[16] A. Moreno, A. Sanchez, J. Velez, and J. Diaz, "Face recognition using 3D surface-extracted descriptors," Irish Machine Vision and Image Processing Conference, September.
[17] Y. Lee, K. Park, J. Shim, and T. Yi, "3D face recognition using multiple features for the local depth information," 16th International Conference on Vision Interface, Halifax, Canada, June 2003.
[18] G. Gordon, "Face recognition based on depth maps and surface curvature," SPIE Geometric Methods in Computer Vision, San Diego, CA, vol. 1570, 1991.
[19] T. Nagamine, T. Uemura, and I. Masuda, "3D facial image analysis for human identification," pp. 324-327, 1992.
[20] C. Chua, F. Han, and Y. Ho, "3D human face recognition using point signature," Intl. Conf. on Automatic Face and Gesture Recognition, pp. 233-238, 2000.
[21] C. Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, New York, 1996.
[22] P.J. Flynn, K.L. Boyer, and R. Srikantiah, "Saliency sequential surface organization for free-form object recognition," Computer Vision and Image Understanding, pp. 152-188, 2002.
[23] P.J. Flynn and A.K. Jain, "3D object recognition using invariant feature indexing of interpretation tables," Computer Vision, Graphics, and Image Processing, no. 55, pp. 119-129, 1992.
[24] P.J. Besl and N.D. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, Feb. 1992.
[25] P. Ekman, "Basic emotions," Handbook of Cognition and Emotion, pp. 45-60, 1999.
[26] M. Greenspan and G. Godin, "A nearest neighbor method for efficient ICP," Third International Conference on 3-D Digital Imaging and Modeling, pp. 161-168, 2001.
