Facial Expression Analysis - F. De la Torre / J. Cohn, Looking @ People (CVPR-12)
Facial Expression Analysis
Jeff Cohn
Fernando De la Torre
Tutorial Looking @ People
June 2012
Human Sensing Laboratory
Outline
• Introduction
• Facial Action Coding System (FACS)
– Discrete vs. dimensional approaches
• Applications of FEA
• Databases
• Algorithms
– Supervised
– Unsupervised
• Conclusions and open problems
Supervised Facial Expression Analysis (FEA)
• Most work on FEA has been supervised, using different registration methods, features, and classifiers.
Supervised FEA (II)
Pipeline: 2D/3D face tracking (AAM) → Registration (remove 3D rigid motion) → Features (robust to illumination and identity) → Classifiers (discriminate AUs) → AU present?
• Generative (Parameterized Appearance Models)
  – Active Appearance Models (e.g., Cootes et al. 98, Romdhani et al. 99, De la Torre 00, Matthews & Baker 05, De la Torre & Nguyen 08, Gong et al. 00)
  – Eigentracking (e.g., Black & Jepson 98)
  – Morphable models (e.g., Jones & Poggio 98, Blanz & Vetter 99)
• Discriminative
  – Regression:
    • Classifier fitting (e.g., Liu 09)
    • Continuous regression (e.g., Sauer et al. 11, Saragih 11)
    • Cascaded regression (e.g., Dollar et al. 10, Cao et al. 12)
  – Local models:
    • Constrained Local Models (e.g., Cristinacce & Cootes 08, Lucey et al. 09, Saragih et al. 10)
    • Part-based models (Zhu & Ramanan 12)
Facial feature detection
[Diagram: hand-labeled training data → Procrustes alignment → shape modes; shape-normalised images → appearance modes B0, B1, B2]
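The training pipeline in the diagram (Procrustes alignment of hand-labeled landmarks, then PCA for shape modes) can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's actual training code; the reference-shape choice and the number of modes are arbitrary:

```python
import numpy as np

def procrustes_align(shape, ref):
    """Similarity-align `shape` (n x 2 landmarks) to `ref` (n x 2),
    removing translation, scale, and rotation."""
    mu_s, mu_r = shape.mean(0), ref.mean(0)
    s, r = shape - mu_s, ref - mu_r
    # Optimal rotation and scale via SVD of the cross-covariance
    # (orthogonal Procrustes problem).
    U, sv, Vt = np.linalg.svd(s.T @ r)
    R = U @ Vt
    scale = sv.sum() / (s ** 2).sum()
    return scale * s @ R + mu_r

def shape_modes(shapes, n_modes=2):
    """PCA on Procrustes-aligned shapes -> mean shape and shape basis."""
    ref = shapes[0]
    aligned = np.array([procrustes_align(s, ref) for s in shapes])
    mean = aligned.mean(0)
    X = (aligned - mean).reshape(len(shapes), -1)   # one row per shape
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:n_modes]                       # rows are shape modes
```

Appearance modes (B0, B1, B2 in the diagram) would be obtained the same way, by PCA on the shape-normalised images.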
Parameterized Appearance Models
Detection as an Optimization Problem
• The shape model s_0(f(x, a)) + ... + s_n(f(x, a)) is learned off-line; a collects the rigid parameters (translation, rotation, scale) and the non-rigid parameters.
• Fitting minimizes the appearance error || d - B c ||_2^2 over the appearance parameters c (with B = [B0, B1, B2] the appearance modes, e.g., c = (0.7, 0.2)) and over the shape parameters a.
Problems:
• Prone to local minima
• Does not generalize well (e.g., to different people)
(Nguyen & De la Torre 10)
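To make the objective concrete: for a fixed warp, the optimal appearance coefficients c in || d - B c ||_2^2 follow from linear least squares. A toy NumPy sketch (B and d are random stand-ins, not a trained model); the outer minimization over the warp parameters stays non-convex, which is where the local minima above come from:

```python
import numpy as np

# Appearance modes B (pixels x modes) would come from PCA on
# shape-normalised training images; random stand-ins here.
rng = np.random.default_rng(0)
n_pixels, n_modes = 100, 3
B = rng.standard_normal((n_pixels, n_modes))

# d: image pixels sampled at the warped locations f(x, a).
c_true = np.array([0.7, 0.2, -0.4])
d = B @ c_true

# Inner step of the fit: for fixed warp parameters, the optimal
# appearance coefficients minimize ||d - B c||^2 (linear least squares).
c_hat, *_ = np.linalg.lstsq(B, d, rcond=None)
```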
Discriminative models
• Learn regression matrices that map image features at the current estimate to updates of the rigid parameters a and non-rigid parameters c:
  [Δa_1; Δc_1] = S_1 f(a_0 + Δa_0, c_0 + Δc_0)
  [Δa_2; Δc_2] = S_2 f(a_0 + Δa_1, c_0 + Δc_1)
  applying S_1, S_2, ... in cascade.
• In general this improves generalization (e.g., Liu 09, Sauer et al. 11, Saragih 11, Dollar et al. 2010, Cao et al. 2012)
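The cascaded-regression idea can be sketched in NumPy: train a short sequence of linear regressors, each mapping features extracted at the current parameter estimate to a parameter update. Everything below is a synthetic stand-in (the `render` function plays the role of warped-image features; it is not any published model):

```python
import numpy as np

rng = np.random.default_rng(1)

def render(p):
    # Invented stand-in for image features sampled at the shape given by
    # parameters p (mildly nonlinear, so one linear step is not exact).
    return p + 0.1 * np.sin(3 * p)

# Synthetic training set: ground-truth parameters and perturbed starts.
n_train, dim, n_stages = 2000, 4, 3
p_true = rng.standard_normal((n_train, dim))
p_cur = p_true + 0.5 * rng.standard_normal((n_train, dim))

# Train the cascade: each stage regresses parameter updates from the
# feature residual at the current estimate (linear least squares).
stages = []
for _ in range(n_stages):
    F = render(p_true) - render(p_cur)      # features at current estimate
    S, *_ = np.linalg.lstsq(F, p_true - p_cur, rcond=None)
    stages.append(S)
    p_cur = p_cur + F @ S                   # apply the update

def fit(p_init, p_observed):
    """Run the learned cascade from an initial parameter guess."""
    p = p_init.copy()
    for S in stages:
        p = p + (render(p_observed) - render(p)) @ S
    return p
```

Because each stage is trained on the residuals left by the previous ones, later stages correct the errors of earlier ones, which is what gives cascades their robustness relative to a single regressor.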
Discriminative models (II)
• Local discriminative models
  – Constrained Local Models (e.g., Cristinacce & Cootes 08, Lucey et al. 09, Saragih et al. 10)
  – Part-based models (Zhu & Ramanan 2012)
(Thanks to Saragih/Lucey)
Face registration
• What are the three most important aspects of face recognition? "Registration, registration, registration" (Takeo Kanade '90)
• Similarity registration: rotate and scale (e.g., Bartlett et al. 05, Whitehill et al. 11)
• Piece-wise warping (e.g., Cootes et al. 98, Gong et al. 00, Tong et al. 07, De la Torre & Nguyen 08, Jones & Poggio 98, Lucey et al. 09, Saragih et al. 10)
  Benefits: subtle AUs; out-of-plane rotation (3D models)
• 3D registration (thanks to Laszlo Jeni)
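A piece-wise warp applies one affine transform per triangle of a landmark mesh. A minimal single-triangle version in NumPy (the triangle coordinates below are made up, and a full warp would also rasterize the pixels inside each triangle):

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Affine map (A, t) with A @ s + t = d for 3 matched vertices."""
    # Solve the 6 affine parameters from the 3 point correspondences.
    src_h = np.hstack([src, np.ones((3, 1))])   # 3 x 3 homogeneous coords
    M = np.linalg.solve(src_h, dst)             # rows: [A columns; t]
    return M[:2].T, M[2]

def warp_points(pts, src_tri, dst_tri):
    """Apply the triangle's affine map to points inside it; piece-wise
    warping repeats this for every triangle of the mesh."""
    A, t = affine_from_triangles(src_tri, dst_tri)
    return pts @ A.T + t
```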
Features
• Three types: (1) shape, (2) appearance, (3) temporal features
• Shape features (e.g., Sebe et al. 07, Asthana et al. 09, Lucey et al. 07, Chew et al. 11, Zhou et al. 10, Valstar et al. 12; illustration from Zhou et al. 2010)
Appearance features
• Raw pixels (e.g., Kanade et al. 2000)
• Gabor bank (e.g., Donato et al. 99, Bartlett 04, Littlewort et al. 2006, Whitehill et al. 11)
• Box filters (e.g., Whitehill & Omlin, 2006)
• SIFT/HOG (e.g., Zhu et al. 2011, Simon et al. 2010, Dhall et al. 11)
• Local binary patterns (e.g., Shan et al. 09, Zhao et al. 10, Jiang et al. 11)
• NMF (e.g., Zhi et al. 11, Zafeiriou and Petrou 10)
• Warning!!: Appearance features typically need
dimensionality reduction and/or feature selection
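Of the features above, local binary patterns are simple enough to sketch directly. A minimal 8-neighbour LBP in NumPy (the basic variant; published AU systems typically use uniform patterns and per-cell histograms concatenated over a grid, then reduce dimensionality as the warning says):

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour local binary pattern for an H x W grayscale
    image; returns one code per interior pixel."""
    c = img[1:-1, 1:-1]
    # Clockwise neighbours starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)   # one bit per neighbour
    return code

def lbp_histogram(img, n_bins=256):
    """Appearance feature: normalized histogram of LBP codes."""
    h, _ = np.histogram(lbp_8(img), bins=n_bins, range=(0, n_bins))
    return h / h.sum()
```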
Temporal features
• Motion units/trajectories (e.g., Cohen et al. 02, Li et al. 01)
• Optical flow (e.g., Essa and Pentland 97, Gunes and Piccardi 05)
• Motion history (e.g., Valstar et al. 04, Koelstra et al. 10)
• Bag of temporal words (e.g., Simon et al. 10)
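Motion history images are the easiest of these to sketch: each pixel stores how recently it moved, giving a single temporal template per clip. A minimal NumPy version (the decay constant and threshold are illustrative, not values from the cited papers):

```python
import numpy as np

def motion_history(frames, tau=5, thresh=0.1):
    """Motion History Image: pixels with recent motion get value tau;
    older motion decays by 1 per frame, down to 0."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur - prev) > thresh      # frame-difference mask
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi
```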
Classifiers
• Static
  – Exemplar + GMM (Wen and Huang, 2003)
  – Neural Network (Kapoor and Picard, 2005)
  – SVM/AdaBoost (Bartlett et al., 2005)
  – Linear Discriminant Classifiers (Wang et al., 2006)
  – Gaussian Process (Chen et al., 2009)
  – Boosting (Shan et al. 2006, Zhu et al. 2010)
• Dynamic
  – Hidden Markov models (Lien et al., 2000)
  – Dynamic Bayesian Network (Tong et al., 2007)
  – Conditional random field (Chang and Liu, 2009)
  – Temporal Bag of Words (Simon et al. 2010)
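As a concrete stand-in for the static SVM detectors listed above, here is a Pegasos-style sub-gradient trainer for a linear SVM in NumPy (the cited systems use mature solvers and real AU features; this toy version just shows the mechanics of margin-based training):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: n x d features (e.g., pooled appearance features), y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (X[i] @ w + b) < 1:          # margin violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                                  # only regularize
                w = (1 - eta * lam) * w
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)
```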
The million $ question
• Which are the best features and the best classifier?
• Data
  – Have access to reliable and well-annotated data
  – The more data the better
• Features
  – The best feature is AU dependent
  – In general, feature fusion works best (e.g., multiple kernel learning)
• Classifier
  – Depends on the amount of training data
  – And on how familiar you are with the classifier
Sample selection
[Diagram: AU intensity over time with onset, peak, and offset marked; frames near the peak serve as positive (+) samples and frames outside the event as negative (-) samples]
• Make good use of the data!!! (Zhu et al. 11, Simon et al. 10)
• Results for AU4 and AU12: the first number between bars | denotes the area under the ROC curve; the second is the number of positive samples in the testing dataset and, separated by /, the number of negative samples; the third denotes the number of positive samples in the training working sets and, separated by /, the total frames of the target AU in the training datasets.
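The onset/peak/offset picture above suggests a simple selection rule: take frames near the peak as positives and frames well clear of the event as negatives, discarding the ambiguous boundary frames. A toy NumPy version (the thresholds are illustrative, not the values used by Zhu et al. or Simon et al.):

```python
import numpy as np

def select_samples(intensity, pos_frac=0.8, neg_margin=3):
    """Intensity-guided sample selection for one AU event.
    `intensity` is a per-frame AU intensity track."""
    intensity = np.asarray(intensity, dtype=float)
    act_idx = np.where(intensity > 0)[0]           # frames inside the event
    peak = intensity.max()
    # Positives: frames close to peak intensity.
    positives = np.where(intensity >= pos_frac * peak)[0]
    # Negatives: frames at least `neg_margin` frames from any active frame.
    dist = np.full(len(intensity), np.inf)
    if act_idx.size:
        for i in range(len(intensity)):
            dist[i] = np.abs(act_idx - i).min()
    negatives = np.where(dist >= neg_margin)[0]
    return positives, negatives
```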
Bayesian networks
• Bayesian networks model spatial and temporal relationships among different AUs (Tong et al. 05, Shang et al. 07).
Outline
• Introduction
• Facial Action Coding System (FACS)
• Databases
• Applications of FEA
• Algorithms
  – Supervised
  – Unsupervised
• Conclusions and future work
Motivation
• Mining facial expression for one subject
• Summarization
• Visualization / embedding
• Indexing
[Video frames labeled: looking up, sleeping, smiling, looking forward, waking up]
Mining facial expression across subjects
RU-FACS database (Bartlett et al. ’06)
Aligned Cluster Analysis (Zhou et al. '10)
[Diagram: a sequence is cut at boundaries h1, h2, ..., hm, hm+1 into segments; a label matrix G assigns each segment to a cluster]
Kernel k-means and spectral clustering (Ding et al. '02, Dhillon et al. '04, Zass and Shashua '05, De la Torre '06)
• k-means: J(M, G) = || X - M G ||_F^2, where the columns of X are the data points, M holds the cluster means, and G is the binary assignment matrix
  [Illustration: ten 2-D points (x, y) collected in X, with assignments G and means M]
• Eliminating M and replacing inner products with a kernel K = φ(X)^T φ(X) gives
  J(G) = tr( K (I_n - G (G^T G)^{-1} G^T) )
Problem formulation for ACA
• Kernelized energy: J_aca(M, G, X) = || φ(X) - M G ||_F^2
• The sequence is cut into segments X_[h1,h2), X_[h2,h3), ..., X_[hm,hm+1); h holds the start and end of the segments and G their cluster labels
• Segments of different lengths are compared with the Dynamic Time Alignment Kernel (Shimodaira et al. 01)
Matrix formulation for ACA
• Kernel k-means: J_kkm(M, G, X) = || φ(X) - M G ||_F^2, which reduces to J_kkm(G) = tr(K L) with L = I_n - G (G^T G)^{-1} G^T and K = φ(X)^T φ(X)
• ACA: J_aca(G, H) = tr( (K ∘ W) L ) with L = I_n - H G (G^T H^T H G)^{-1} G^T H^T, where H assigns samples to segments, G assigns segments to clusters, and W holds the Dynamic Time Alignment Kernel weights (Shimodaira et al. 01)
• Toy dimensions on the slide: 23 frames, 7 segments, 3 clusters (H ∈ {0,1}^(23×7), G ∈ {0,1}^(7×3), W ∈ R^(23×23))
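The kernel k-means energy tr(K L) underlying this formulation can be minimized with only the kernel matrix, by repeatedly assigning each point to the cluster whose feature-space mean is nearest. A minimal NumPy sketch (plain kernel k-means, without ACA's segment boundaries or the DTAK weighting):

```python
import numpy as np

def kernel_kmeans(K, k, iters=50, seed=0):
    """Kernel k-means: decreases tr(K L), L = I - G (G^T G)^{-1} G^T,
    by iterative reassignment, using only the kernel matrix K (n x n)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, n)
    for _ in range(iters):
        D = np.zeros((n, k))
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size == 0:
                D[:, c] = np.inf              # empty cluster: never chosen
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            D[:, c] = (np.diag(K)
                       - 2 * K[:, idx].mean(axis=1)
                       + K[np.ix_(idx, idx)].mean())
        new = D.argmin(axis=1)
        if (new == labels).all():
            break
        labels = new
    return labels
```

ACA additionally optimizes over the segment boundaries h (via dynamic programming) and replaces the frame kernel with DTAK segment similarities.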
Facial image features
• Appearance: Active Appearance Models (Baker and Matthews '04); features computed separately for the upper face and the lower face
• Shape features
Facial event discovery across subjects
• Cohn-Kanade: 30 people and five different expressions (surprise, joy, sadness, fear, anger)
• Clustering accuracy: ACA 0.87 (.05) vs. Spectral Clustering (SC) 0.56 (.04)
Unsupervised facial event discovery
• 10 sets of 30 people
• FACS coding: Outer Brow Raiser (AU2), Upper Lid Raiser (AU5), Nose Wrinkler (AU9), Lip Tightener (AU23)
• Clustering accuracy:
               ACA          Spectral Clustering (SC)
  Lower face   0.53 (.09)   0.39 (0.14)
  Upper face   0.69 (.12)   0.47 (0.12)
Conclusions and open problems
• Supervised and unsupervised algorithms for FEA
• Tracking/registration
– Registration, registration, registration… changes in pose (3D models)
– Robustness to occlusion
• Features
– Subtle facial expressions
– Dynamics (e.g., temporal envelope)
• Classifiers
– Expression intensity
– Individual differences
– Predicting onset/offset
– Truly multi-class AU detection
Conclusions and open problems (II)
• Data
– Attention to reliability of ground truth
– Shared, well-annotated video
– Innovative ways to use video that cannot be shared
• Segmentation and timing
– Intra-personal
– Interpersonal
• User-in-loop approaches
– User-assisted coding, e.g., Fast FACS (e.g., Simon et al., 2011)
– Combining manual and automated measurement (e.g., Ambadar et al. 2009)
– Person-dependent classifiers
• Other issues
– Multimodal
– Context
Questions?