
    Adaptive Rigid Multi-region Selection for Handling Expression Variation in 3D

    Face Recognition

Kyong I. Chang, Kevin W. Bowyer, Patrick J. Flynn
Computer Science & Engineering Department
University of Notre Dame, Notre Dame, IN 46556

    Abstract

    We present a new algorithm for 3D face recognition, and

    compare its performance to that of previous approaches.

    We focus especially on the case of facial expression change

    between gallery and probe images. We first establish per-

    formance comparisons using a PCA (eigenface) algo-

rithm and an ICP (iterative closest point) algorithm similar
to ones reported in the literature. Experimental results show
that the performance of either approach degrades substantially
when expression varies between gallery and probe. Then we
introduce a new algorithm, Adaptive Rigid Multi-region Selection
(ARMS), which independently matches multiple facial regions and
fuses the results. This algorithm is fully automated and uses
no manually selected landmark points. Experimental results show
that our new algorithm substantially improves performance in
the case of varying facial expression. Our experimental results
are based on the largest 3D face dataset to date, with 449
persons, over 4,000 3D images, and substantial time lapse
between gallery and probe images.

    1. Introduction

    Recently, interest in 3D face recognition has grown and a

    great deal of research effort has been devoted to biometric

    sources represented in 3D (e.g. face, hand geometry, ear).

    There is a commonly accepted claim that face recognition

    in 3D is superior to 2D due to the invariance of 3D sensors

    to illumination, facial make-up and pose [1, 2, 3]. This is

    mainly because 3D sensors acquire data based on the shape

    of objects in the scene instead of light reflected from the

    scene. The additional dimension makes a 2.5D shape avail-

    able, however it also requires methods to process such data

in a reasonable and efficient way. This might be trivial in domains dealing with artificial objects in laboratory light-

    ing, but it remains to be demonstrated that it can be robust

    and accurate in typical person identification environments.

Another benefit of shape information in the context of face
recognition is that the shape of the human face changes less
over time than its appearance.

A recent study by Givens et al. [4] reported that among the
hardest factors in face recognition are expression changes,
eyelids open versus closed, and mouth open versus closed.
All of these are related to facial expression to a

    certain extent. The results also coincide with our previous

    work [5]. In that study, we found that different expressions

    between the gallery and probe sets degrade rank-one recog-

    nition rates in 2D face by as much as 15%. Also, a similar

    study performed for 3D face recognition shows that perfor-

    mance degrades by as much as 33% [6].

    One of the conclusions reported by the Face Recognition

    Vendor Test 2002 [7] is that the number of subjects in the

    database and time-lapse between gallery and probe affects

    the overall performance rates: For identification and watch

    list tasks, performance decreases linearly in the logarithm

    of the database size [7]. Average performance decrease

    for 2D face recognition is 15% in identification when the

    time lapse between gallery and probe reaches around 500

days. Note that time lapse implies more than the temporal
aspect. In other words, pose, facial make-up, facial hair,

    and/or lighting condition are factors associated with time

    lapse in the evaluation. These arguments raise a problem

    with currently reported studies in 3D, since most of the 3D

    approaches reviewed in [6] considered only neutral expres-

    sions with a limited number of subjects and time-variations.

This study pursues the idea that there is some subset of
the face that is relatively rigid between two expressions, and
uses multiple regions to allow flexibility across different
expressions. There are at least three general methods that

    one might employ in an attempt to handle the problem of

    varying facial expression. One approach would be to sim-

    ply concentrate on regions of the face whose shape changes

    the least with varying facial expression. For example, one

    might simply ignore the lips and mouth region, since the

shape varies greatly with expression. Of course, there is no large subset of the face that is perfectly shape invariant

    across a broad range of normal expressions, and so this ap-

    proach will not be perfect. Another approach would be to

    enroll a person into the gallery by intentionally sampling

    a set of different facial expressions, and to match a probe

    against the set of shapes representing a person. This ap-

    proach requires some cooperation on the part of the subject

    in order to obtain the set of different facial expressions. This


    approach also runs into the problem that, however large the

    set of facial expressions sampled for enrollment, the probe

    shape may represent an expression other than those sam-

    pled. Thus this approach also does not seem to allow the

    possibility of a perfect solution. A third approach would be

    to have a general model of 3D facial expression that can be

applied to any person's image(s). The search for a match between a gallery and a probe shape could then be done

    over the set of parameters controlling the instantiation of

    expression-specific shape. This approach seems destined to

    also run into problems. There likely is no general model

to predict, for example, how each person's neutral expres-

    sion image is transformed into their smiling image. A smile

means different things to different persons' facial shapes,

    and different things to the same person at different times

    and in different cultural contexts. Given that there does not

    seem to be any single correct approach, the question is

    which approach or combination of approaches can be used

    to achieve the desired level of performance.

In this study, we first document the extent to which facial expression change degrades performance in 3D face recog-

    nition. Then, we address this problem by considering 3D

    face matching only in localized facial regions that show rel-

    atively less variation across expressions. Such regions are

detected automatically using 3D geometrical features, unlike
other facial feature finding methods that use both 2D color
and 3D depth images acquired at the same time [8].

    1.1. Previous Work

    There appear to be three main categories of approach to 3D

    face recognition in the literature. A 3D face can be thought

of as a group of points defined in 3D space and they can be matched using a registration technique [3, 9, 10]. Also, the

    eigenface approach can be extended to accomplish recogni-

    tion by measuring the depth (shape) variations observed in

    range images [11, 1, 12, 13, 14]. Finally, there is a group of

studies that use a set of features computed from the 3D
geometry of the face to measure similarity [15, 16, 17, 18, 19].

    Even though a handful of 3D face recognition studies

    [20, 13, 16] consider expression variations, there is no rig-

    orous evaluation study that explicitly addresses the facial

    expression problems on a large dataset.

1.2. Initial Study for the Baseline Performance

An experiment is conducted to establish a baseline perfor-

    mance obtained by using a whole face. In one approach,

    a whole face is cropped using manually selected points on

    two outer eye tips and a nose tip (See Fig.1). Similarity be-

    tween gallery and probe surfaces is measured using the ICP

    surface registration technique.

    This approach is compared to the PCA-based approach

    to show the difference in recognition accuracy between the

    two different approaches.
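For reference, the PCA-baseline can be sketched as a standard eigenface pipeline applied to range (depth) images. This is a minimal sketch under assumed conventions (flattened, registered depth images and an illustrative number of retained components), not the authors' exact implementation.

import numpy as np

def fit_facespace(train_depths, n_components=100):
    """Build a PCA 'face space' from flattened training depth images.

    train_depths: (N, H*W) array, one row per cropped, registered range image.
    Returns the mean face and the top principal components (eigenfaces).
    """
    mean = train_depths.mean(axis=0)
    centered = train_depths - mean
    # SVD of the centered data; rows of Vt are the eigenfaces.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:n_components]

def match_pca(gallery, probe, mean, eigenfaces):
    """Return gallery indices sorted by distance to the probe in face space."""
    g = (gallery - mean) @ eigenfaces.T      # project gallery depth images
    p = (probe - mean) @ eigenfaces.T        # project the probe depth image
    d = np.linalg.norm(g - p, axis=1)        # Euclidean distance in face space
    return np.argsort(d)                     # rank-one match is the first index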

Figure 1: Sample images (a gallery on the left and a probe on the right in each column) used for the baseline performance with ICP-baseline and PCA-baseline. Nearly the entire face region is considered for both methods. For the ICP-baseline, the probe face covers approximately 10% less area than the gallery face.

    The results shown in Table 1 are obtained with an

    ICP algorithm (denoted as ICP-baseline) matching the

    whole frontal face region, using manually selected land-

    mark points for the initial rotation and translation estimate

given to the ICP algorithm. Successive probe sets have longer elapsed time between acquisition of the gallery im-

    age and the probe image. The same gallery images are used

    with all probe sets, and all gallery images have neutral ex-

pression. There is a significant performance drop when
expression varies between gallery and probe, from an average
of 91.0% down to an average of 61.5%. This clearly shows
the limitation of rigid registration of deformed surfaces.
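The landmark-based initialization of the ICP-baseline can be illustrated with a short sketch: a rigid transform is estimated from the three manually selected landmarks (two outer eye tips and the nose tip) with a least-squares (Kabsch) fit. The use of a Kabsch fit and the example coordinates are assumptions for illustration, not the authors' exact procedure.

import numpy as np

def rigid_transform_from_landmarks(src, dst):
    """Least-squares rigid transform (R, t) mapping src landmarks onto dst.

    src, dst: (N, 3) arrays of corresponding 3D landmarks, e.g. the two
    outer eye tips and the nose tip on probe and gallery scans.
    Returns R (3x3 rotation) and t (3,) such that dst ~= src @ R.T + t.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Kabsch/Procrustes: SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Hypothetical landmark coordinates (probe -> gallery), in millimetres.
probe_marks = np.array([[-32.0, 40.0, 10.0], [33.0, 41.0, 11.0], [0.5, 5.0, 35.0]])
gallery_marks = np.array([[-31.0, 42.0, 12.0], [34.0, 40.0, 13.0], [1.0, 6.0, 36.0]])
R0, t0 = rigid_transform_from_landmarks(probe_marks, gallery_marks)
# R0, t0 would then seed the iterative ICP refinement.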

Table 1: Performance Degradation with Expression Change

Probe sets with neutral expression (rates in %)
  #1    #2    #3    #4    #5    #6    #7    #8    #9
  92.2  87.7  88.7  92.3  91.0  91.7  93.6  89.0  93.5

Probe sets with non-neutral expression (rates in %)
  #1    #2    #3    #4    #5    #6    #7    #8    #9
  43.7  59.5  56.0  64.1  67.3  66.4  65.9  69.1  N/A

Recognition rates when the probes have an expression change are
on average 30% lower than when there is no expression change.

    1.3. Facial Expression Analysis

    Changes due to facial expressions influence accuracy not

    only for 3D shape but also 2D appearance. Even though it is

    extremely hard to generalize about the expressions of a per-

    son, we have visual evidence of different degrees of muscle

movement. For instance, regions around the mouth would not seem to be reliable for matching, since an open mouth due

    to a smile deforms the shape significantly (Fig.2-(A)). This

suggests collecting the sample points considered for facial
surface matching from relatively static regions. Other
expressions, such as surprise, sadness and disgust

    (Fig.2-(B),(C),(D)) contract or expand regions including the

    mouth, forehead and cheek. Specifically, a surprised ex-

pression generates contracted forehead muscles producing
wrinkles, lifted eyebrows and cheeks, and possibly an open

    mouth. In the case of a sad expression, the muscles be-

tween the eyebrows are contracted, the lips are generally de-

    formed, and cheeks are possibly lifted. Therefore, the ideal

    regions for reliable matching based on our qualitative eval-

    uation would be an area around the nose which displays less

movement under expression than other areas. Because only
the nose area is considered for matching and the other points
are eliminated, the number of points used for matching is
greatly reduced (from roughly 10^5 to 10^3 on average).

Figure 2: Different non-neutral expressions in 2D and 3D: (A) happy, (B) surprised, (C) sad, (D) disgusted.

2. Adaptive Rigid Multi-region Selection Method Description

    A new 3D face recognition algorithm, called Adaptive

Rigid Multi-region Selection (ARMS), is proposed to cope
with expression variation between gallery and probe images.
It first finds a relatively rigid region in the high cur-

    vature area on a face. There are two separate ARMS-based

    methods depending on how ROIs are extracted. The first

    method uses manually labeled ground truth points (two

    outer eye tips and a nose tip) to extract ROI regions for

    matching. The second method finds ROIs automatically us-

    ing our facial feature finding methods (described in Sec-

tion 2.2 and Section 2.3). In addition to the ARMS-based
methods, a PCA-based method using the manually labeled points
is included for comparison of recognition accuracy.

    The following subsections describe how our automated

    feature finding method extracts ROIs and how these sur-

    faces are matched to recognize a person.

    2.1. Overall Framework

The problem of varying facial expression can be minimized
by considering sample points chosen in facial regions that
remain relatively static under expression change, such as the
nose region. The following steps are

    involved to accomplish the task of person identification un-

der expression changes.

Figure 3: Overview of the proposed method.

First, a group of skin regions is located by a skin detection
method using the corresponding 2D color image. Pixels in the
color image are transformed into the YCbCr color-space [21].
Pixels are used in the skin detection method only if they have
a valid 3D point.

    A group of 3D points in a skin region specified by a rectan-

gular area will be processed to compute 3D geometrical

    features (See Fig.4). This step removes not only irrelevant

    regions for matching, such as shoulder or hair area, but also

    reduces computing time for later steps.
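The skin-detection step can be illustrated with a short sketch. The conversion below uses the common ITU-R BT.601 YCbCr formulas, and the Cb/Cr bounds are typical skin ranges quoted in the literature; the authors' actual skin model is not specified here, so these bounds are assumptions for illustration.

import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to YCbCr (BT.601 approximation)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, valid_3d, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of likely skin pixels that also carry a valid 3D point.

    valid_3d: (H, W) boolean array marking pixels with a usable range sample.
    The Cb/Cr ranges are illustrative values commonly used for skin colour.
    """
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    mask = (cb >= cb_range[0]) & (cb <= cb_range[1]) \
         & (cr >= cr_range[0]) & (cr <= cr_range[1])
    return mask & valid_3d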

    Valid 3D points found in regions detected by the skin

    detection are subject to 3D geometrical feature computa-

    tion to classify an observed facial surface. Gaussian cur-

    vature (K) and mean curvature (H) are computed and ge-

    ometrical shape can be identified by surface classification

    (See Fig.5). Once 3D surface classification is complete, the

following regions are detected: nose tip (peak region), eye cavities (pit region) and nose bridge (saddle region). Con-

    sidering several different surfaces would provide a chance

    to select the best match among them. For instance, under

    expression changes, one region might result in better accu-

    racy than other regions (See Fig.6).

    The last step involves surface registration to measure the

    similarity of shape between a gallery and a probe surface.

    Probe surfaces are matched against a gallery surface and

    the identification process would rank each individual sur-

face based on the root mean square (RMS) error re-

    ported by ICP. This reflects the amount of difference in 3D

    face shape after alignment by ICP. During the decision pro-

    cess, voting or fusion rules can be considered to determine

    identity (See Fig.7).

    2.2. Skin Region Detection and Preprocessing

    At first, a raw 3D scan is subsampled by 4 in both X and

    Y direction. Then, a group of skin pixels is extracted by

    using 3D data points and our skin model constructed in

    YCbCr color-space for 2D color images as shown in Fig.4.


At the end of this task, a 3D scan contains the skin region,
including the face and/or neck area, which will be used for
further processing. Once skin regions (predominantly face) in
a model are found, outliers including spikes and noise are
suppressed. When the angle between the optical axis and the
surface normal of an observed point is greater than a certain
threshold, the point is declared an outlier and removed from
the model. A Gaussian filter is used for smoothing the data
after outlier removal. Finally, the pose is corrected. Pose
correction is done by aligning an input surface to a generic
3D face model using the ICP method. A rotation about X, Y and
Z as well as a translation (from the given data to the known
model) is produced by ICP. The input data points are then
transformed accordingly.

    Figure 4: Face region extraction
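To make the outlier-suppression step concrete, here is a minimal sketch of the normal-angle test described above; the angle threshold and the assumption that per-point normals have already been estimated are illustrative choices, not the authors' settings.

import numpy as np

def remove_grazing_points(points, normals, optical_axis=(0.0, 0.0, 1.0),
                          max_angle_deg=80.0):
    """Drop points whose surface normal is nearly perpendicular to the optical axis.

    points:  (N, 3) 3D samples from the range scanner.
    normals: (N, 3) unit surface normals estimated at each point.
    Points seen at a grazing angle tend to be noisy spikes, so they are
    treated as outliers (the 80-degree threshold is an assumed value).
    """
    axis = np.asarray(optical_axis, dtype=np.float64)
    axis /= np.linalg.norm(axis)
    cos_angle = np.abs(normals @ axis)          # |cos| of the angle to the optical axis
    keep = cos_angle >= np.cos(np.radians(max_angle_deg))
    return points[keep], normals[keep]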

    2.3. Curvature-based Face Features

    This section describes a methodology that chooses some

face regions that could always be found as the same curvature region type (peak, pit, saddle and so on) regardless

    of face expression. Also, related steps for 3D surface cur-

    vature computation are explained.

    Relative to [8], our approach is distinguished by using

    only the 3D shape information in order to perform feature

    finding, rather than needing to process the 2D image to find

    features that are used to initialize the 3D matching. Given a

    surface, geometric features such as ridges, valleys or peaks

    can be estimated to characterize the surface. By computing

    Gaussian curvature (K) and mean curvature (H), geometri-

    cal shape may be identified by surface classification. During

    geometrical feature computation, the regions of interest are

nose tip (peak), eye cavities (pit) and nose bridge (saddle). Once the locations of the ROIs are identified, sample points are

    extracted around the nose area.

    Coordinate Transformation: This is a preprocessing step

for the curvature estimation. A local coordinate system is
defined by the principal directions of the point set (the N
nearest neighborhood points at every point). The reason why the

    points are being transformed is to fix a reference point with

    its neighborhood and coordinate axis such that every point

    in the new space can be represented as an n-tuple of its co-

    ordinates.

The direction of least variation corresponds to the smallest
eigenvalue. The eigenvector (v_min) of the smallest eigenvalue
is then set to be the new local Z-axis. This approach assumes
that the least variation should be observed along the surface
normal. When the new (local) Z-axis is obtained, the
orientation of the axis needs to be verified. The orientation
is checked against the direction of the averaged surface normal
(n) at the reference point: the angle between these two vectors
(n, v_min) is examined for the validity of the orientation of
v_min. Once the verification is completed, the N points are
transformed into the new local affine coordinates as
[u_i, v_i, z_i]^T = V^T (x_i - x_p), where V = [v_max, v_mid, v_min]
collects the eigenvectors in order of decreasing eigenvalue and
x_i is one of the points around the current point x_p [22].
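A minimal sketch of this local coordinate transformation, assuming the neighborhood points have already been gathered (for example by a k-nearest-neighbour query); the frame construction follows the eigen-decomposition described above.

import numpy as np

def local_frame(neighbors, reference, avg_normal):
    """Express a point's neighborhood in its local (u, v, z) coordinates.

    neighbors:  (N, 3) points around the reference point.
    reference:  (3,) the reference point x_p.
    avg_normal: (3,) averaged surface normal at x_p, used to orient v_min.
    Returns the neighbors in the local frame [v_max, v_mid, v_min].
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    v_min, v_mid, v_max = eigvecs[:, 0], eigvecs[:, 1], eigvecs[:, 2]
    # Flip v_min so it agrees with the averaged surface normal.
    if np.dot(v_min, avg_normal) < 0:
        v_min = -v_min
    V = np.column_stack([v_max, v_mid, v_min])      # local axes as columns
    return (neighbors - reference) @ V              # rows are [u_i, v_i, z_i]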

Least Square Fit: This step obtains the set of coefficients of
the quadratic equation needed for the curvature computation.
The neighborhood points (N_p) observed at the reference point
(P) are now expressed, in the new coordinate system defined by
the eigenvectors (v_max, v_mid, v_min) of the covariance matrix,
as [u_i, v_i, z_i]^T. Finding the six unknown coefficients of
the quadratic requires six or more such points:

    z = f(u, v) = a1*u^2 + a2*uv + a3*v^2 + a4*u + a5*v + a6

Given the coefficients, the partial derivatives can be computed
to obtain K and H.
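A sketch of the quadratic fit, assuming the points are already in the local (u, v, z) frame; the design-matrix formulation below is a standard linear least-squares solution, not necessarily the authors' exact numerical procedure.

import numpy as np

def fit_quadratic_patch(local_pts):
    """Fit z = a1*u^2 + a2*u*v + a3*v^2 + a4*u + a5*v + a6 by least squares.

    local_pts: (N, 3) points [u_i, v_i, z_i] in the local frame, N >= 6.
    Returns the coefficient vector (a1, ..., a6).
    """
    u, v, z = local_pts[:, 0], local_pts[:, 1], local_pts[:, 2]
    A = np.column_stack([u * u, u * v, v * v, u, v, np.ones_like(u)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs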

    Curvature Estimation: A Monge patch technique is

    used to compute the coefficients of the first and the

    second fundamental forms [23]. A Monge patch can

be written as a surface in the explicit form M(u, v) = (u, v, f(u, v)).
The Gaussian curvature (K) and mean curvature (H) are then
obtained. Depending on the signs of K and H, an observed local
surface patch can be classified into one of eight different
shape types [24]. However, when an input surface is deformed,
the intra-class variation increases and a unique, repeatable
representation of the surface may not be possible. Three face surfaces

    of the same subject with different expression (deformed

    surface) color-coded for surface classes are shown in Fig.5.
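The curvature computation and sign-based classification can be sketched as follows. The formulas are the standard Monge-patch expressions for K and H evaluated at the reference point; the tolerance eps used in the sign tests is an assumed value, and the sign conventions depend on how the surface normal is oriented.

import numpy as np

def monge_curvatures(a1, a2, a3, a4, a5, a6):
    """Gaussian (K) and mean (H) curvature of z = f(u, v) at the origin.

    For the fitted quadratic, the partial derivatives at (0, 0) are
    f_u = a4, f_v = a5, f_uu = 2*a1, f_uv = a2, f_vv = 2*a3.
    """
    fu, fv, fuu, fuv, fvv = a4, a5, 2.0 * a1, a2, 2.0 * a3
    g = 1.0 + fu * fu + fv * fv
    K = (fuu * fvv - fuv * fuv) / (g * g)
    H = ((1.0 + fv * fv) * fuu - 2.0 * fu * fv * fuv
         + (1.0 + fu * fu) * fvv) / (2.0 * g ** 1.5)
    return K, H

def hk_class(K, H, eps=1e-3):
    """Classify a local patch into one of the eight HK surface types."""
    if abs(K) < eps:
        if abs(H) < eps:
            return "flat"
        return "ridge" if H < 0 else "valley"
    if K > 0:
        return "peak" if H < 0 else "pit"   # e.g. nose tip -> peak, eye cavity -> pit
    return "minimal" if abs(H) < eps else ("saddle ridge" if H < 0 else "saddle valley")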

    Local Surface Realization: As threshold values are

    tested during the sign test to determine the surface class

types, a nose tip is expected to be a peak region (K > T_K),
the eye cavities to be pit regions, and the nose bridge to be
a saddle region.


Figure 5: Images of a single person with three different expressions (neutral, happy, disgusted) rendered according to surface type. The regions assigned to many surface types change as deformation is introduced; as the cheeks are lifted in the happy and disgusted expressions, peaks are clearly detected on the upper cheeks on both sides and on the lips.

Once the ROI locations are identified, the local surfaces used for matching are extracted around the nose area using a set of predefined implicit functions (see Fig. 6).

    Even though we claimed that the ideal regions for face

    matching under varying expressions are areas around the

    nose, parts of the nose still show certain degrees of mus-

cle movement (nose bridge/nostrils). This problem can be

    resolved by considering multiple local surfaces around the

    general nose area. These are the primary regions of inter-

est during facial feature finding. This method of finding

    the curvature-based face regions is automated and has been

    evaluated on 4,485 3D face images of 449 people with a va-

    riety of facial expressions. The facial landmarks (eye cav-

    ities, nose tip and nose bridge) were successfully found in

    99.4% of the images (4,458 of 4,485).

Figure 6: Local surface realization for a gallery and three probes. In relation to the locations of the identified ROIs, simple geometric (implicit) functions are used to extract the matching regions for gallery and probe.

    2.4. Face Matching in Identification

    Given a pair of surfaces to be matched, the initial regis-

    tration is performed by translating the centroid of the probe

surface to the centroid of the gallery surface. Iterative
alignment based on point differences between the two surfaces
is performed. At the end of each iteration, the RMS difference
between the two surfaces is computed. The iteration halts when

    there is little or no change. Because a probe has 3 local sur-

    faces that need to be matched to a gallery, decision fusion is

    required to combine the three RMS error values for the final

    similarity value (See Fig.7).
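A compact sketch of the point-to-point ICP loop described above, assuming SciPy's cKDTree for nearest-neighbour search; the convergence tolerance and iteration cap are assumed values, and the real system initializes either from landmarks (ARMS-manual) or automatically (ARMS-auto).

import numpy as np
from scipy.spatial import cKDTree

def icp_rms(probe, gallery, max_iter=50, tol=1e-4):
    """Align probe points to gallery points with point-to-point ICP.

    probe, gallery: (N, 3) and (M, 3) arrays of surface samples.
    Returns the final RMS distance after alignment, which serves as the
    probe-to-gallery dissimilarity score.
    """
    tree = cKDTree(gallery)
    # Initial registration: translate the probe centroid onto the gallery centroid.
    src = probe - probe.mean(axis=0) + gallery.mean(axis=0)
    prev_rms, rms = np.inf, np.inf
    for _ in range(max_iter):
        dists, idx = tree.query(src)             # closest gallery point per probe point
        rms = np.sqrt(np.mean(dists ** 2))
        if prev_rms - rms < tol:                 # little or no change: stop iterating
            break
        prev_rms = rms
        # Best rigid transform (Kabsch) for the current correspondences.
        dst = gallery[idx]
        src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = dst.mean(axis=0) - R @ src.mean(axis=0)
        src = src @ R.T + t
    return rms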

    During the decision process of matching each probe to

    one of the gallery entries, some fusion or voting rule must

    be used. We considered the sum rule, minimum rule, and

product rule. The sum rule takes the sum of the RMS
differences for the three regions from the probe image as the
probe-to-gallery match value. The minimum rule takes the
smallest of the RMS difference values. The product rule
takes the product of the three difference values.
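Putting the matching and fusion together, the sketch below ranks gallery subjects for one probe. It assumes the per-region surfaces have already been extracted and reuses the icp_rms helper from the previous sketch.

import numpy as np
# icp_rms is the ICP helper sketched above.

def identify(probe_regions, gallery_db, rule="product"):
    """Rank gallery subjects for one probe by fusing per-region ICP errors.

    probe_regions: list of three (N, 3) point arrays (the local nose surfaces).
    gallery_db:    dict mapping subject id -> (M, 3) gallery surface.
    rule:          'sum', 'min' or 'product' fusion of the three RMS errors.
    Returns subject ids sorted from best (smallest fused error) to worst.
    """
    fuse = {"sum": np.sum, "min": np.min, "product": np.prod}[rule]
    scores = {}
    for subject, gallery_surface in gallery_db.items():
        errors = [icp_rms(region, gallery_surface) for region in probe_regions]
        scores[subject] = fuse(errors)
    return sorted(scores, key=scores.get)   # the rank-one identity is the first entry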

Figure 7: As three local surfaces are matched against a gallery, different fusion strategies may be considered to combine the results at either the metric level or the rank level.

    3. Data Collection

    A total of 546 different subjects participated in one or more

    data acquisition sessions yielding a total of 4,485 3D scans

    used in this study. Among the 546 subjects, 449 partici-

pated in both a gallery acquisition and at least one probe
acquisition. Subjects who have only non-neutral expressions
are dropped, since a gallery image with a neutral expression
is required.

    There are two classes of probes depending on the expres-

sion changes being asked of the subjects at the time of data acquisition. The first class consists of 9 probe sets and each

    probe set contains 3D scans acquired under neutral expres-

    sion collected in different weeks. This class has a gallery

    set of the 449 subjects and a total of 2,798 probe images of

    those 449 subjects.

    The second class consists of 8 probe sets and each probe

set contains 3D scans acquired while subjects were asked to

    have different ones of the human expressions described by

    Ekman [25]. The second class has the same gallery as the

first class and a total of 1,590 probes of 355 subjects acquired
in later weeks; these are a subset of the 449 subjects in the gallery.

    The training set, needed only for the PCA method, con-

tains the 449 gallery images plus an additional 97 images for
subjects for whom good data was not acquired in both the
gallery and probe sessions. Thus, these additional 97 images
are used only to create the face space for the PCA method.

    4. Experiment

    The methods considered in the experiments are (1) ICP-

    baseline, using manually selected landmark points and the


whole face (see Section 1.2), (2) PCA-baseline, which uses
manually selected points and matches the whole face as shown
in Fig. 1, (3) ARMS-auto, which automatically finds ROIs and
extracts the multiple nose-area regions for matching, and
(4) ARMS-manual, which uses manually selected landmark points
to extract the multiple nose regions for matching.

Figure 8: Example images in 2D and 3D with different expressions.

Table 2: Statistics of the Dataset Used in this Study

Neutral-expression sets: 2,798 scans of 449 subjects
  Set:    *    #1   #2   #3   #4   #5   #6   #7   #8   #9
  Scans:  449  449  390  336  286  243  205  172  145  123

Non-neutral-expression sets: 1,590 scans of 355 subjects
  Set:    *    #1   #2   #3   #4   #5   #6   #7   #8
  Scans:  355  355  321  266  209  168  125  91   55

Subjects with 2 or more scans: 449; subjects with only one scan: 97.
Total number of scans: 4,485. (* denotes the gallery set.)

    Four experiments are conducted to evaluate the recogni-

    tion rates in various situations, such as time-lapse and ex-

    pression variations between a gallery and a probe. The first

    experiment investigates how the recognition performance is

    affected by time-variation only, with no expression change.

    The second experiment evaluates the performance of PCA-

    baseline and two ARMS-based methods when both time

    and expression are varied. In the third experiment, the per-

    formance effects of 3D face recognition methods on the

    number of probes are examined. Finally, the probes are col-

lected into one single pool in the fourth experiment. There

    will be one or more probes for a subject who appears in the

    gallery, with each probe being acquired in a different acqui-

    sition session separated by a week or more.

    4.1. Time Variation Effects on Performance

    This experiment evaluates the performance across 9 probe

sets. Probe set #2 has a greater elapsed time between gallery and probe image acquisition than probe set #1, and

    so on. The results are shown in Fig. 9. PCA-baseline

    has an average 77.7% rank-one recognition rate. Both

    ARMS-based methods combining probe #1 and probe #3

    (as shown in Fig. 6) with the product rule performed

    higher than our baseline methods yielding rank-one recog-

    nition rate of 96.6% by ARMS-auto and 96.1% by ARMS-

manual.

Figure 9: Rank-one recognition with the same neutral expression
as the gallery, for probe sets #1 through #9 (ICP-baseline,
ARMS-auto, ARMS-manual). A number in parentheses is the number
of probe images. The sum rule obtained 96.6% by ARMS-auto, and
the minimum rule obtained 96.5%.

These results show that (1) both ARMS-based

    methods outperform the PCA-baseline in neutral expres-

    sion, (2) there is marginal performance difference between

    automated and manual ARMS methods, reflecting that our

    automated 3D facial feature finding method is 99.4% suc-

cessful, (3) the rank-one recognition rates were maintained
surprisingly well as the elapsed time between gallery and
probe increases, and (4) the two algorithms differ only in
that ARMS-manual uses manually selected landmark points to
initialize the ICP matching, while ARMS-auto is totally
automated.

    4.2. Effects of Expression Variation

    This experiment examines our ARMS-based methods and

    PCA-baseline when subjects have different expressions in

    their gallery and probe images. This has the same exper-

    imental design as the previous one except that there are 8

probe sets.

Figure 10: Rank-one results for probes with non-neutral
expressions, for probe sets #1 through #8 (ICP-baseline,
ARMS-auto, ARMS-manual). The sum rule obtained 86.8% by
ARMS-auto, and the minimum rule obtained 82.9%.

As shown in Fig. 10, our proposed ARMS-based

    methods (auto and manual) clearly outperform the PCA-

    baseline method. The average rank-one match rate is 61.3%

    for PCA-baseline, while ARMS-auto achieves 87.1% with

    the product rule on average (88.6% by ARMS-manual). We

    found that (1) our ARMS-based methods also have better

    identification accuracy than ICP-baseline and PCA-baseline


in varying expressions, (2) expression changes do cause
performance to deteriorate in all methods, and (3) the rank-one
rates for 3D shape are more consistent as a function of time
than the 2D rates reported in another study [7]. This is
observed for both neutral and non-neutral expressions.

Figure 11: Rank-one identification rates (ARMS-auto and
ARMS-manual) for probe sets of different sizes: probes with
neutral expression (top; subset sizes 449, 225, 113, 57, 29)
and probes with non-neutral expression (bottom; subset sizes
355, 178, 89, 45, 23). The error bars for each probe set
(except the first) show the standard error of the rank-one
match rates.

    4.3. Scalability of 3D Face Recognition

The results in this experiment show the effect on performance
of the number of subjects enrolled in a probe set. We

    begin with probe set #1 for the neutral expression class and

    the varying expression class. We randomly select one half

    of probe set #1 to generate a reduced-size probe set, and

    do this multiple times. For instance, probe set #1 in the

    neutral-expression class has 449 subjects. The second probe

    set in this experiment has 225 of the 449 subjects. In or-

    der to show performance variance caused by subject pool,

we generated 10 sets of 225 randomly selected subjects, and computed the error rates for each set. Then, the third

    probe set consists of 10 sets of randomly selected 113 of

    the 449 subjects and so on. Each class has five probe sets,

    and rank-one rates are plotted in Fig.11. Our results indi-

cate that there is a tendency toward higher performance rates
as the probe size decreases, which coincides with evaluation
results reported in the FRVT [7]. Also, this performance

    behavior is more prominent when expressions are varied.
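The probe-set subsampling procedure can be expressed as a short sketch. The evaluate_rank_one function is a hypothetical stand-in for running the full identification pipeline on a probe subset, and the stopping size is an assumed value.

import random
import statistics

def scalability_curve(probe_ids, evaluate_rank_one, n_trials=10, seed=0):
    """Rank-one rate (mean, standard error) for successively halved probe sets.

    probe_ids:          list of subject ids in the full probe set (e.g. probe set #1).
    evaluate_rank_one:  callable taking a list of ids and returning a rank-one rate;
                        a placeholder for the actual matching pipeline.
    """
    rng = random.Random(seed)
    results, size = [], len(probe_ids)
    while size >= 20:                          # stop once the subset becomes tiny
        rates = [evaluate_rank_one(rng.sample(probe_ids, size)) for _ in range(n_trials)]
        stderr = statistics.stdev(rates) / len(rates) ** 0.5
        results.append((size, statistics.mean(rates), stderr))
        size //= 2                             # the next probe set has half as many subjects
    return results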

    4.4. Multiple Probe Study

    This experiment models the situation where a person is en-

    rolled in the system with one neutral expression image and

    then attempts are made to recognize the person at various

    later points in time based on the image acquired at that time,

    with possibly varying facial expression. For the multiple

probe study, the gallery images are acquired in the first week
and all the probes acquired in later weeks are collected into
a single pool, yielding 3,939 (2,349 + 1,590) probes (see
Table 2 for dataset statistics). Then, a correct or incorrect
match is recorded for each probe. The performance rate is
reported as the correct-match rate averaged over all subjects
(see Table 3). The ARMS-auto and ARMS-manual methods achieved
similar performance rates and show higher accuracy than the
methods that use the whole face in matching.

    Table 3: Multiple Probe Study

    PCA-baseline ICP-baseline ARMS-auto ARMS-manual

    70.7% 78.1% 91.9% 92.3%

    5. Conclusions and Discussion

    Our baseline experiments involve the evaluation of PCA

    or ICP based algorithms similar to ones recently reported

    in the literature [11, 1, 3, 9]. Using a PCA-based algo-

    rithm and an ICP-baseline method, each with manually-

    selected landmark points, gives results that show better per-

    formance for the ICP-based approach. These algorithms use

    the whole frontal face area in matching.

Our results further show that in the presence of expression change, the recognition performance of these baseline

    algorithms drops dramatically. This is because they treat the

    whole frontal face region as a rigid surface, and of course

    the face as a whole is not rigid over expression change.

The proposed algorithm focuses on the general nose region of
the face, as the area most rigid across expression change. We
actually use three differently shaped nose regions, to allow
for the fact that the whole nose area is not perfectly rigid
across expression change. Later, all

    the three regions are matched independently from probe to

gallery, and the three match values are combined to recognize
the probe identity. Among the fusion rules considered, the
product rule shows slightly but consistently higher accuracy
than the sum rule or the minimum rule.

    We evaluate the new algorithm once using the manually

    selected points to initialize the algorithm, and a second time

    as a fully-automated algorithm not using any manually se-

    lected points. The fully automated version has only slightly

    lower performance than the version using manually selected

    points. This reflects the overall 97.5% accuracy in auto-

    mated point selection. Also, the new algorithm matching


in general nose regions shows a better performance rate than

    PCA-baseline. This shows our method has higher discrimi-

    natory power. When expression is varied in subjects at dif-

ferent times, the recognition rate does not degrade as much
as it does for PCA-baseline or ICP-baseline using the whole
face region.

    One surprising element of this work is that we can

achieve such good performance using only a small portion of
the whole face with the ARMS-based method. However, the
forehead region is sometimes obscured by hair (see Fig. 12)
and also has furrows in some expressions. Also, our 3D images
are frontal views, making the cheekbone regions (located on
the edge-on silhouette) difficult to use. Since the mouth and
lips vary greatly with expressions, the nose area be-

    comes the logical remaining choice for matching.

Figure 12: Gallery image, correctly matched probe, and incorrectly matched probe. One of the cases in which localized face regions are more advantageous for matching than the whole face is face occlusion. This subject was incorrectly identified with the PCA-based method but was successfully matched by our new ARMS-based method. Problems like facial occlusion due to hair or a mustache can be resolved by local region-based matching. (The intensity image is shown for illustration purposes only.)

    Even though the time it takes to complete the ICP surface

matching ranges from 0.4 to 0.7 seconds over about 30 iterations,
the preprocessing steps (data cleanup, skin detection, facial
feature extraction and ROI extraction) take approx-

    imately 2 to 3 seconds to complete. Making 3D process-

    ing computationally efficient might be a challenging task.

    One way to improve face matching is to apply a spatial

    search technique using a specialized data structure for the

    ICP method [26].
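As one concrete (assumed) instance of such a data structure, the nearest-neighbour query inside the ICP loop can be backed by a k-d tree, as in the earlier icp_rms sketch; building the tree once per gallery surface amortizes the query cost across iterations and probe regions.

import numpy as np
from scipy.spatial import cKDTree

# Illustrative data: a gallery surface and a probe region (random placeholders).
gallery_points = np.random.rand(5000, 3)
probe_points = np.random.rand(800, 3)

# Build the k-d tree once per gallery surface; reuse it for every ICP
# iteration and for every probe region matched against that surface.
gallery_tree = cKDTree(gallery_points)
dists, idx = gallery_tree.query(probe_points)   # one nearest neighbour per probe point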

    The dataset used in the experiments reported here will

    be made available to other research groups as a part of the

HID databases. See http://www.nd.edu/~cvrl for additional

    information.

References

[1] C. Hesher, A. Srivastava, and G. Erlebacher, "A novel technique for face recognition using range imaging," Seventh International Symposium on Signal Processing and its Applications, 2003.
[2] G. Gordon, "Face recognition based on depth and curvature features," Computer Vision and Pattern Recognition (CVPR), pp. 108-110, June 1992.
[3] G. Medioni and R. Waupotitsch, "Face modeling and recognition in 3-D," IEEE International Workshop on Analysis and Modeling of Faces and Gestures, pp. 232-233, October 2003.
[4] G. Givens, R. Beveridge, B. Draper, and D. Bolme, "A statistical assessment of subject factors in the PCA recognition of human faces," Workshop on Statistical Analysis in Computer Vision (in CVPR), 2003.
[5] K. Chang, K. Bowyer, and P. Flynn, "An evaluation of multi-modal 2D+3D face biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[6] K. Bowyer, K. Chang, and P. Flynn, "A short survey of 3D and multi-modal 3D+2D face recognition," International Conference on Pattern Recognition, UK, 2004.
[7] J. Phillips, P. Grother, R. Micheals, D. Blackburn, E. Tabassi, and M. Bone, "Face recognition vendor test 2002: Evaluation report," available at http://www.frvt2002.org/FRVT2002/documents.htm.
[8] D. Colbry, X. Lu, A. Jain, and G. Stockman, "3D face feature extraction for recognition," Michigan State Univ. Tech. Report, 2004.
[9] X. Lu, D. Colbry, and A. Jain, "Three-dimensional model based face recognition," International Conference on Pattern Recognition, pp. 362-366, 2004.
[10] T. Papatheodorou and D. Rueckert, "Evaluation of automatic 4D face recognition using surface and texture registration," Sixth International Conference on Automated Face and Gesture Recognition, pp. 321-326, 2004.
[11] K. Chang, K. Bowyer, and P. Flynn, "Face recognition using 2D and 3D facial data," ACM Workshop on Multimodal User Authentication, pp. 25-32, December 2003.
[12] G. Pan, Z. Wu, and Y. Pan, "Automated 3D face verification from range data," International Conference on Acoustics, Speech and Signal Processing, pp. 192-196, 2003.
[13] A. Bronstein, M. Bronstein, and R. Kimmel, "Expression-invariant 3D face recognition," 4th International Conference on Audio- and Video-based Biometric Person Authentication, pp. 62-70, 2003.
[14] F. Tsalakanidou, D. Tzovaras, and M. Strintzis, "Use of depth and colour eigenfaces for face recognition," Pattern Recognition Letters, pp. 1427-1435, 2003.
[15] T. Russ, K. Koch, and C. Little, "3D face recognition: a quantitative analysis," 45th Annual Meeting of the Institute of Nuclear Materials Management, 2004.
[16] A. Moreno, A. Sanchez, J. Velez, and J. Diaz, "Face recognition using 3D surface-extracted descriptors," Irish Machine Vision and Image Processing Conference, September.
[17] Y. Lee, K. Park, J. Shim, and T. Yi, "3D face recognition using multiple features for the local depth information," 16th International Conference on Vision Interface, Halifax, Canada, June 2003.
[18] G. Gordon, "Face recognition based on depth maps and surface curvature," SPIE Geometric Methods in Computer Vision, San Diego, CA, vol. 1570, 1991.
[19] T. Nagamine, T. Uemura, and I. Masuda, "3D facial image analysis for human identification," pp. 324-327, 1992.
[20] C. Chua, F. Han, and Y. Ho, "3D human face recognition using point signature," Intl. Conf. on Automatic Face and Gesture Recognition, pp. 233-238, 2000.
[21] C. Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, New York, 1996.
[22] P.J. Flynn, K.L. Boyer, and R. Srikantiah, "Saliency sequential surface organization for free-form object recognition," Computer Vision and Image Understanding, pp. 152-188, 2002.
[23] P.J. Flynn and A.K. Jain, "3D object recognition using invariant feature indexing of interpretation tables," Computer Vision, Graphics, and Image Processing, no. 55, pp. 119-129, 1992.
[24] P.J. Besl and N.D. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, Feb. 1992.
[25] P. Ekman, "Basic emotions," Handbook of Cognition and Emotion, pp. 45-60, 1999.
[26] M. Greenspan and G. Godin, "A nearest neighbor method for efficient ICP," Third International Conference on 3-D Digital Imaging and Modeling, pp. 161-168, 2001.
