Facial Expression Analysis - F. De la Torre / J. Cohn, Looking @ People (CVPR-12)
Facial Expression Analysis
Jeff Cohn
Fernando De la Torre
Tutorial Looking @ People
June 2012
Human Sensing Laboratory
Outline
• Introduction
• Facial Action Coding System (FACS)
– Discrete vs. dimensional approaches
• Applications of FEA
• Databases
• Algorithms
– Supervised
– Unsupervised
• Conclusions and open problems
Supervised Facial Expression Analysis (FEA)
• Most work on FEA has been supervised, using different registration methods, features, and classifiers.
Supervised FEA (II)
Pipeline: 2D/3D face tracking (AAM) → Registration (remove 3D rigid motion) → Features (robust to illumination and identity) → Classifiers (discriminate AUs) → AU present?
• Generative (Parameterized Appearance Models)
  – Active Appearance Models (e.g., Cootes et al. 98, Romdhani et al. 99, De la Torre 00, Matthews & Baker 05, De la Torre & Nguyen 08, Gong et al. 00)
  – Eigentracking (e.g., Black & Jepson 98)
  – Morphable models (e.g., Jones & Poggio 98, Blanz & Vetter 99)
• Discriminative
  – Regression:
    • Classifier fitting (e.g., Liu 09)
    • Continuous regression (e.g., Sauer et al. 11, Saragih 11)
    • Cascaded regression (e.g., Dollar et al. 10, Cao et al. 12)
  – Local models:
    • Constrained Local Models (e.g., Cristinacce & Cootes 08, Lucey et al. 09, Saragih et al. 10)
    • Part-based models (Zhu & Ramanan 12)
Facial feature detection
[Diagram: hand-labeled training data → Procrustes alignment → shape modes; shape-normalised images → appearance modes B0, B1, B2]
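The training pipeline in the diagram (Procrustes alignment of hand-labeled landmarks, then PCA for shape modes) can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's actual training code; the reference-shape choice and the number of modes are arbitrary:

```python
import numpy as np

def procrustes_align(shape, ref):
    """Similarity-align `shape` (n x 2 landmarks) to `ref` (n x 2),
    removing translation, scale, and rotation."""
    mu_s, mu_r = shape.mean(0), ref.mean(0)
    s, r = shape - mu_s, ref - mu_r
    # Optimal rotation and scale via SVD of the cross-covariance
    # (orthogonal Procrustes problem).
    U, sv, Vt = np.linalg.svd(s.T @ r)
    R = U @ Vt
    scale = sv.sum() / (s ** 2).sum()
    return scale * s @ R + mu_r

def shape_modes(shapes, n_modes=2):
    """PCA on Procrustes-aligned shapes -> mean shape and shape basis."""
    ref = shapes[0]
    aligned = np.array([procrustes_align(s, ref) for s in shapes])
    mean = aligned.mean(0)
    X = (aligned - mean).reshape(len(shapes), -1)   # one row per shape
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:n_modes]                       # rows are shape modes
```

Appearance modes (B0, B1, B2 in the diagram) would be obtained the same way, by PCA on the shape-normalised images.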
Parameterized Appearance Models
Detection as an Optimization Problem
• The shape model s_0(f(x, a)) + ... + s_n(f(x, a)) is learned off-line; a collects the rigid parameters (translation, rotation, scale) and the non-rigid parameters.
• Fitting minimizes the appearance error || d - B c ||_2^2 over the appearance parameters c (with B = [B0, B1, B2] the appearance modes, e.g., c = (0.7, 0.2)) and over the shape parameters a.
Problems:
• Prone to local minima
• Does not generalize well (e.g., to different people)
(Nguyen & De la Torre 10)
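To make the objective concrete: for a fixed warp, the optimal appearance coefficients c in || d - B c ||_2^2 follow from linear least squares. A toy NumPy sketch (B and d are random stand-ins, not a trained model); the outer minimization over the warp parameters stays non-convex, which is where the local minima above come from:

```python
import numpy as np

# Appearance modes B (pixels x modes) would come from PCA on
# shape-normalised training images; random stand-ins here.
rng = np.random.default_rng(0)
n_pixels, n_modes = 100, 3
B = rng.standard_normal((n_pixels, n_modes))

# d: image pixels sampled at the warped locations f(x, a).
c_true = np.array([0.7, 0.2, -0.4])
d = B @ c_true

# Inner step of the fit: for fixed warp parameters, the optimal
# appearance coefficients minimize ||d - B c||^2 (linear least squares).
c_hat, *_ = np.linalg.lstsq(B, d, rcond=None)
```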
Discriminative models
• Learn regression matrices that map image features at the current estimate to updates of the rigid parameters a and non-rigid parameters c:
  [Δa_1; Δc_1] = S_1 f(a_0 + Δa_0, c_0 + Δc_0)
  [Δa_2; Δc_2] = S_2 f(a_0 + Δa_1, c_0 + Δc_1)
  applying S_1, S_2, ... in cascade.
• In general this improves generalization (e.g., Liu 09, Sauer et al. 11, Saragih 11, Dollar et al. 2010, Cao et al. 2012)
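The cascaded-regression idea can be sketched in NumPy: train a short sequence of linear regressors, each mapping features extracted at the current parameter estimate to a parameter update. Everything below is a synthetic stand-in (the `render` function plays the role of warped-image features; it is not any published model):

```python
import numpy as np

rng = np.random.default_rng(1)

def render(p):
    # Invented stand-in for image features sampled at the shape given by
    # parameters p (mildly nonlinear, so one linear step is not exact).
    return p + 0.1 * np.sin(3 * p)

# Synthetic training set: ground-truth parameters and perturbed starts.
n_train, dim, n_stages = 2000, 4, 3
p_true = rng.standard_normal((n_train, dim))
p_cur = p_true + 0.5 * rng.standard_normal((n_train, dim))

# Train the cascade: each stage regresses parameter updates from the
# feature residual at the current estimate (linear least squares).
stages = []
for _ in range(n_stages):
    F = render(p_true) - render(p_cur)      # features at current estimate
    S, *_ = np.linalg.lstsq(F, p_true - p_cur, rcond=None)
    stages.append(S)
    p_cur = p_cur + F @ S                   # apply the update

def fit(p_init, p_observed):
    """Run the learned cascade from an initial parameter guess."""
    p = p_init.copy()
    for S in stages:
        p = p + (render(p_observed) - render(p)) @ S
    return p
```

Because each stage is trained on the residuals left by the previous ones, later stages correct the errors of earlier ones, which is what gives cascades their robustness relative to a single regressor.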
Discriminative models (II)
• Local discriminative models
  – Constrained Local Models (e.g., Cristinacce & Cootes 08, Lucey et al. 09, Saragih et al. 10)
  – Part-based models (Zhu & Ramanan 2012)
(Thanks to Saragih/Lucey)
Face registration
• What are the three most important aspects of face recognition? "Registration, registration, registration" (Takeo Kanade '90)
• Similarity registration: rotate and scale (e.g., Bartlett et al. 05, Whitehill et al. 11)
• Piece-wise warping (e.g., Cootes et al. 98, Gong et al. 00, Tong et al. 07, De la Torre & Nguyen 08, Jones & Poggio 98, Lucey et al. 09, Saragih et al. 10)
  Benefits: subtle AUs; out-of-plane rotation (3D models)
• 3D registration (thanks to Laszlo Jeni)
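A piece-wise warp applies one affine transform per triangle of a landmark mesh. A minimal single-triangle version in NumPy (the triangle coordinates below are made up, and a full warp would also rasterize the pixels inside each triangle):

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Affine map (A, t) with A @ s + t = d for 3 matched vertices."""
    # Solve the 6 affine parameters from the 3 point correspondences.
    src_h = np.hstack([src, np.ones((3, 1))])   # 3 x 3 homogeneous coords
    M = np.linalg.solve(src_h, dst)             # rows: [A columns; t]
    return M[:2].T, M[2]

def warp_points(pts, src_tri, dst_tri):
    """Apply the triangle's affine map to points inside it; piece-wise
    warping repeats this for every triangle of the mesh."""
    A, t = affine_from_triangles(src_tri, dst_tri)
    return pts @ A.T + t
```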
Features
• Three types: (1) shape, (2) appearance, (3) temporal features
• Shape features (e.g., Sebe et al. 07, Asthana et al. 09, Lucey et al. 07, Chew et al. 11, Zhou et al. 10, Valstar et al. 12; illustration from Zhou et al. 2010)
Appearance features
• Raw pixels (e.g., Kanade et al. 2000)
• Gabor bank (e.g., Donato et al. 99, Bartlett 04, Littlewort et al. 2006, Whitehill et al. 11)
• Box filters (e.g., Whitehill & Omlin, 2006)
• SIFT/HOG (e.g., Zhu et al. 2011, Simon et al. 2010, Dhall et al. 11)
• Local binary patterns (e.g., Shan et al. 09, Zhao et al. 10, Jiang et al. 11)
• NMF (e.g., Zhi et al. 11, Zafeiriou and Petrou 10)
• Warning!!: Appearance features typically need
dimensionality reduction and/or feature selection
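Of the features above, local binary patterns are simple enough to sketch directly. A minimal 8-neighbour LBP in NumPy (the basic variant; published AU systems typically use uniform patterns and per-cell histograms concatenated over a grid, then reduce dimensionality as the warning says):

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour local binary pattern for an H x W grayscale
    image; returns one code per interior pixel."""
    c = img[1:-1, 1:-1]
    # Clockwise neighbours starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)   # one bit per neighbour
    return code

def lbp_histogram(img, n_bins=256):
    """Appearance feature: normalized histogram of LBP codes."""
    h, _ = np.histogram(lbp_8(img), bins=n_bins, range=(0, n_bins))
    return h / h.sum()
```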
Temporal features
• Motion units/trajectories (e.g., Cohen et al. 02, Li et al. 01)
• Optical flow (e.g., Essa and Pentland 97, Gunes and Piccardi 05)
• Motion history (e.g., Valstar et al. 04, Koelstra et al. 10)
• Bag of temporal words (e.g., Simon et al. 10)
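Motion history images are the easiest of these to sketch: each pixel stores how recently it moved, giving a single temporal template per clip. A minimal NumPy version (the decay constant and threshold are illustrative, not values from the cited papers):

```python
import numpy as np

def motion_history(frames, tau=5, thresh=0.1):
    """Motion History Image: pixels with recent motion get value tau;
    older motion decays by 1 per frame, down to 0."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur - prev) > thresh      # frame-difference mask
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi
```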
Classifiers
• Static
  – Exemplar + GMM (Wen and Huang, 2003)
  – Neural Network (Kapoor and Picard, 2005)
  – SVM/AdaBoost (Bartlett et al., 2005)
  – Linear Discriminant Classifiers (Wang et al., 2006)
  – Gaussian Process (Chen et al., 2009)
  – Boosting (Shan et al. 2006, Zhu et al. 2010)
• Dynamic
  – Hidden Markov models (Lien et al., 2000)
  – Dynamic Bayesian Network (Tong et al., 2007)
  – Conditional random field (Chang and Liu, 2009)
  – Temporal Bag of Words (Simon et al. 2010)
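As a concrete stand-in for the static SVM detectors listed above, here is a Pegasos-style sub-gradient trainer for a linear SVM in NumPy (the cited systems use mature solvers and real AU features; this toy version just shows the mechanics of margin-based training):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: n x d features (e.g., pooled appearance features), y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (X[i] @ w + b) < 1:          # margin violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                                  # only regularize
                w = (1 - eta * lam) * w
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)
```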
The million $ question
• Which are the best features and the best classifier?
• Data
  – Have access to reliable and well-annotated data
  – The more data the better
• Features
  – The best feature is AU dependent
  – In general, feature fusion works best (e.g., multiple kernel learning)
• Classifier
  – Depends on the amount of training data
  – And on how familiar you are with the classifier
Sample selection
[Diagram: AU intensity over time with onset, peak, and offset marked; frames near the peak serve as positive (+) samples and frames outside the event as negative (-) samples]
• Make good use of the data!!! (Zhu et al. 11, Simon et al. 10)
• Results for AU4 and AU12: the first number between bars | denotes the area under the ROC curve; the second is the number of positive samples in the testing dataset and, separated by /, the number of negative samples; the third denotes the number of positive samples in the training working sets and, separated by /, the total frames of the target AU in the training datasets.
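The onset/peak/offset picture above suggests a simple selection rule: take frames near the peak as positives and frames well clear of the event as negatives, discarding the ambiguous boundary frames. A toy NumPy version (the thresholds are illustrative, not the values used by Zhu et al. or Simon et al.):

```python
import numpy as np

def select_samples(intensity, pos_frac=0.8, neg_margin=3):
    """Intensity-guided sample selection for one AU event.
    `intensity` is a per-frame AU intensity track."""
    intensity = np.asarray(intensity, dtype=float)
    act_idx = np.where(intensity > 0)[0]           # frames inside the event
    peak = intensity.max()
    # Positives: frames close to peak intensity.
    positives = np.where(intensity >= pos_frac * peak)[0]
    # Negatives: frames at least `neg_margin` frames from any active frame.
    dist = np.full(len(intensity), np.inf)
    if act_idx.size:
        for i in range(len(intensity)):
            dist[i] = np.abs(act_idx - i).min()
    negatives = np.where(dist >= neg_margin)[0]
    return positives, negatives
```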
Bayesian networks
• Bayesian networks model spatial and temporal relationships among different AUs (Tong et al. 05, Shang et al. 07).
Outline
• Introduction
• Facial Action Coding System (FACS)
• Databases
• Applications of FEA
• Algorithms
  – Supervised
  – Unsupervised
• Conclusions and future work
Motivation
• Mining facial expression for one subject
• Summarization
• Visualization / embedding
• Indexing
[Video frames labeled: looking up, sleeping, smiling, looking forward, waking up]
Mining facial expression across subjects
RU-FACS database (Bartlett et al. ’06)
Aligned Cluster Analysis (Zhou et al. '10)
[Diagram: a sequence is cut at boundaries h1, h2, ..., hm, hm+1 into segments; a label matrix G assigns each segment to a cluster]
Kernel k-means and spectral clustering (Ding et al. '02, Dhillon et al. '04, Zass and Shashua '05, De la Torre '06)
• k-means: J(M, G) = || X - M G ||_F^2, where the columns of X are the data points, M holds the cluster means, and G is the binary assignment matrix
  [Illustration: ten 2-D points (x, y) collected in X, with assignments G and means M]
• Eliminating M and replacing inner products with a kernel K = φ(X)^T φ(X) gives
  J(G) = tr( K (I_n - G (G^T G)^{-1} G^T) )
Problem formulation for ACA
• Kernelized energy: J_aca(M, G, X) = || φ(X) - M G ||_F^2
• The sequence is cut into segments X_[h1,h2), X_[h2,h3), ..., X_[hm,hm+1); h holds the start and end of the segments and G their cluster labels
• Segments of different lengths are compared with the Dynamic Time Alignment Kernel (Shimodaira et al. 01)
Matrix formulation for ACA
• Kernel k-means: J_kkm(M, G, X) = || φ(X) - M G ||_F^2, which reduces to J_kkm(G) = tr(K L) with L = I_n - G (G^T G)^{-1} G^T and K = φ(X)^T φ(X)
• ACA: J_aca(G, H) = tr( (K ∘ W) L ) with L = I_n - H G (G^T H^T H G)^{-1} G^T H^T, where H assigns samples to segments, G assigns segments to clusters, and W holds the Dynamic Time Alignment Kernel weights (Shimodaira et al. 01)
• Toy dimensions on the slide: 23 frames, 7 segments, 3 clusters (H ∈ {0,1}^(23×7), G ∈ {0,1}^(7×3), W ∈ R^(23×23))
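The kernel k-means energy tr(K L) underlying this formulation can be minimized with only the kernel matrix, by repeatedly assigning each point to the cluster whose feature-space mean is nearest. A minimal NumPy sketch (plain kernel k-means, without ACA's segment boundaries or the DTAK weighting):

```python
import numpy as np

def kernel_kmeans(K, k, iters=50, seed=0):
    """Kernel k-means: decreases tr(K L), L = I - G (G^T G)^{-1} G^T,
    by iterative reassignment, using only the kernel matrix K (n x n)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, n)
    for _ in range(iters):
        D = np.zeros((n, k))
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size == 0:
                D[:, c] = np.inf              # empty cluster: never chosen
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            D[:, c] = (np.diag(K)
                       - 2 * K[:, idx].mean(axis=1)
                       + K[np.ix_(idx, idx)].mean())
        new = D.argmin(axis=1)
        if (new == labels).all():
            break
        labels = new
    return labels
```

ACA additionally optimizes over the segment boundaries h (via dynamic programming) and replaces the frame kernel with DTAK segment similarities.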
Facial image features
• Appearance: Active Appearance Models (Baker and Matthews '04); features computed separately for the upper face and the lower face
• Shape features
Facial event discovery across subjects
• Cohn-Kanade: 30 people and five different expressions (surprise, joy, sadness, fear, anger)
• Clustering accuracy: ACA 0.87 (.05) vs. Spectral Clustering (SC) 0.56 (.04)
Unsupervised facial event discovery
• 10 sets of 30 people
• FACS coding: Outer Brow Raiser (AU2), Upper Lid Raiser (AU5), Nose Wrinkler (AU9), Lip Tightener (AU23)
• Clustering accuracy:
               ACA          Spectral Clustering (SC)
  Lower face   0.53 (.09)   0.39 (0.14)
  Upper face   0.69 (.12)   0.47 (0.12)
Conclusions and open problems
• Supervised and unsupervised algorithms for FEA
• Tracking/registration
– Registration, registration, registration… changes in pose (3D models)
– Robustness to occlusion
• Features
– Subtle facial expressions
– Dynamics (e.g., temporal envelope)
• Classifiers
– Expression intensity
– Individual differences
– Predicting onset/offset
– Truly multi-class AU detection
Conclusions and open problems (II)
• Data
– Attention to reliability of ground truth
– Shared, well-annotated video
– Innovative ways to use video that cannot be shared
• Segmentation and timing
– Intra-personal
– Interpersonal
• User-in-loop approaches
– User-assisted coding, e.g., Fast FACS (e.g., Simon et al., 2011)
– Combining manual and automated measurement (e.g., Ambadar et al. 2009)
– Person-dependent classifiers
• Other issues
– Multimodal
– Context
Questions?