face alignment by explicit shape regression
DESCRIPTION
Face Alignment by Explicit Shape Regression . Xudong Cao Yichen Wei Fang Wen Jian Sun. Visual Computing Group Microsoft Research Asia. Problem: face shape estimation. Find semantic facial points Crucial for: Recognition Modeling Tracking Animation Editing. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/1.jpg)
Face Alignment by Explicit Shape Regression
Xudong Cao Yichen Wei
Fang Wen Jian Sun
Visual Computing GroupMicrosoft Research Asia
![Page 2: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/2.jpg)
Problem: face shape estimation• Find semantic
facial points
• Crucial for:– Recognition– Modeling– Tracking– Animation– Editing
![Page 3: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/3.jpg)
Desirable properties1. Robust
– complex appearance– rough initialization
2. Accurate– error:
3. Efficient occlusion
pose
lighting
expression
training: minutes / testing: milliseconds
: ground truth shape
![Page 4: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/4.jpg)
Previous approaches
• Active Shape Model (ASM)– detect points from local features– sensitive to noise
• Active Appearance Model (AAM)– sensitive to initialization– fragile to appearance change
[Cootes et. al. 1998][Matthews et. al. 2004]...
[Cootes et. al. 1992][Milborrow et. al. 2008]…
All use a parametric (PCA) shape model
![Page 5: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/5.jpg)
Previous approaches: cont.
• Boosted regression for face alignment– predict model parameters; fast– [Saragih et. al. 2007] (AAM)– [Sauer et. al. 2011] (AAM)– [Cristinacce et. al. 2007] (ASM)
• Cascaded pose regression– [Dollar et. al. 2010]– pose indexed feature– also use parametric pose model
![Page 6: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/6.jpg)
Parametric shape model is dominant
• But, it has drawbacks1. Parameter error alignment error
– minimizing parameter error is suboptimal2. Hard to specify model capacity
– usually heuristic and fixed, e.g., PCA dim– not flexible for an iterative alignment
• strict initially? flexible finally?
![Page 7: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/7.jpg)
Can we discard a parametric model?1. Directly estimate shape by regression?
2. Overcome the challenges?– high-dimensional output– highly non-linear– large variations in facial appearance– large training data and feature space
3. Still preserve the shape constraint?
Yes
Yes
Yes
![Page 8: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/8.jpg)
Our approach: Explicit Shape Regression1. Directly estimate shape by regression?
– boosted (cascade) regression framework– minimize from coarse to fine
2. Overcome the challenges?– two level cascade for better convergence– efficient and effective features– fast correlation based feature selection
3. Still preserve shape constraint?– automatic and adaptive shape constraint
Yes
Yes
Yes
![Page 9: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/9.jpg)
Approach overviewt = 0 t = 1 t = 2 … t = 10
Regressor updates previous shape incrementally
: ground truth shape residual
, over all training examples
: image
initialized from face detector
𝑆𝑡−1 ¿𝑆𝑡
affine transform
transformback
…
![Page 10: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/10.jpg)
![Page 11: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/11.jpg)
Regressor learning
𝑅𝑡𝑆0
𝑅1 …... …... 𝑅𝑇𝑆𝑡−1 𝑆𝑡 𝑆𝑇𝑆1 𝑆𝑇− 1
1. What’s the structure of 2. What are the features?3. How to select features?
![Page 12: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/12.jpg)
Regressor learning
𝑅𝑡𝑆0
𝑅1 …... …... 𝑅𝑇𝑆𝑡−1 𝑆𝑡 𝑆𝑇𝑆1 𝑆𝑇− 1
1. What’s the structure of 2. What are the features?3. How to select features?
![Page 13: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/13.jpg)
too weak slow convergence and poor generalization
Two level cascade
𝑅𝑡𝑆0
𝑅1 …... …... 𝑅𝑇
a simple regressor, e.g., a decision tree
𝑆𝑡−1 𝑆𝑡 𝑆𝑇
𝑆𝑡−1𝑟1 𝑟𝑘 𝑟 𝐾…… ..…. 𝑆𝑡
two level cascade: stronger rapid convergence
𝑆1 𝑆𝑇− 1
×
![Page 14: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/14.jpg)
105003.3
Trade-off between two levels
#stages in top level 5000#stages in bottom level 1
error () 5.2
with the fixed number (5,000) of regressor
510006.2
100504.5
![Page 15: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/15.jpg)
Regressor learning
𝑅𝑡𝑆0
𝑅1 …... …... 𝑅𝑇𝑆𝑡−1 𝑆𝑡 𝑆𝑇𝑆1 𝑆𝑇− 1
1. What’s the structure of 2. What are the features?3. How to select features?
![Page 16: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/16.jpg)
Pixel difference feature
Powerful on large training data
Extremely fast to compute– no need to warp image– just transform pixel coord. 𝐼 𝑙𝑒𝑓𝑡 𝑒𝑦𝑒≈ 𝐼 h𝑟𝑖𝑔 𝑡 𝑒𝑦𝑒
𝐼 h𝑚𝑜𝑢𝑡 ≫ 𝐼𝑛𝑜𝑠𝑒𝑡𝑖𝑝
[Ozuysal et. al. 2010], key point recognition[Dollar et. al. 2010], object pose estimation[Shotton et. al. 2011], body part recognition…
![Page 17: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/17.jpg)
How to index pixels?
• Global coordinate in (normalized) image• Sensitive to personal variations in face shape×
![Page 18: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/18.jpg)
Shape indexed pixels
• Relative to current shape • More robust to personal geometry variations√
![Page 19: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/19.jpg)
Tree based regressor • Node split function:
– select to maximize the variance reduction after split
𝐼 𝑥1
𝐼 𝑦1
𝐼 𝑥2
𝐼 𝑦2
𝐼 𝑥1− 𝐼 𝑦1>𝑡 1?
𝐼 𝑥2− 𝐼 𝑦2¿ 𝑡 2?
∆𝑆𝑙𝑒𝑎𝑓 =argmin∆ 𝑆
∑𝑖∈𝑙𝑒𝑎𝑓
¿ �̂�𝑖−(𝑆𝑖+∆𝑆)∨¿=¿∑
𝑖∈ 𝑙𝑒𝑎𝑓( �̂�𝑖−𝑆𝑖)
𝑙𝑒𝑎𝑓 𝑠𝑖𝑧𝑒 ¿¿ : ground truth: from last step
![Page 20: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/20.jpg)
Non-parametric shape constraint
𝑆𝑡=𝑆0+∑𝑤𝑖 �̂�𝑖• All shapes are in the linear space of all
training shapes if initial shape is• Unlike PCA, it is learned from data
– automatically– coarse-to-fine
∆𝑆𝑙𝑒𝑎𝑓 =argmin∆ 𝑆
∑𝑖∈𝑙𝑒𝑎𝑓
¿ �̂�𝑖−(𝑆𝑖+∆𝑆)∨¿=¿∑
𝑖∈ 𝑙𝑒𝑎𝑓(�̂�𝑖−𝑆𝑖)
𝑙𝑒𝑎𝑓 𝑠𝑖𝑧𝑒 ¿¿
![Page 21: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/21.jpg)
Learned coarse-to-fine constraint
stage2 4 6 8 10
10
20
30
#PCs
Stage 1PC
Apply PCA (keep variance) to all in each first level stageStage 10
![Page 22: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/22.jpg)
Regressor learning
𝑅𝑡𝑆0
𝑅1 …... …... 𝑅𝑇𝑆𝑡−1 𝑆𝑡 𝑆𝑇𝑆1 𝑆𝑇− 1
1. What’s the structure of 2. What are the features?3. How to select features?
![Page 23: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/23.jpg)
Challenges in feature selection
• Large feature pool: pixels → features– N = 400 → 160,000 features
• Random selection: pool accuracy
• Exhaustive selection: too slow
![Page 24: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/24.jpg)
Correlation based feature selection
• Discriminative feature is also highly correlated to the regression target– correlation computation is fast: time
• For each tree node (with samples in it)1. Project regression target to a random direction2. Select the feature with highest correlation to the
projection3. Select best threshold to minimize variation after split
![Page 25: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/25.jpg)
More Details
• Fast correlation computation– instead of , : number of pixels
• Training data augmentation– introduce sufficient variation in initial shapes
• Multiple initialization– merge multiple results: more robust
![Page 26: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/26.jpg)
Performance
#points 5 29 87Training (2000 images) 5 mins 10 mins 21 mins
Testing (per image) 0.32 ms 0.91 ms 2.9 ms
• Testing is extremely fast– pixel access and comparison– vector addition (SIMD)
≈300+ FPS
![Page 27: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/27.jpg)
Results on challenging web images
• Comparison to [Belhumeur et. al. 2011]– P. Belhumeur, D. Jacobs, D. Kriegman, and N. Kumar. Localizing parts
of faces using a concensus of exemplars. In CVPR, 2011.
– 29 points, LFPW dataset– 2000 training images from web– the same 300 testing images
• Comparison to [Liang et. al. 2008]– L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-
based discriminative search. In ECCV, 2008.
– 87 points, LFW dataset– the same training (4002) and test (1716) images
![Page 28: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/28.jpg)
Compare with [Belhumeur et. al. 2011]
• Our method is 2,000+ times faster
1 3 24 756 89 1011 1213 16
15141719
18202221
23 25 2427262829
0 5 10 15 20 25 30
0 %
10%
20%
relative error reduction by our approach
point radius: mean error
better by better by
worse
![Page 29: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/29.jpg)
Results of 29 points
![Page 30: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/30.jpg)
Compare with [Liang et. al. 2008]• 87 points, many are texture-less• Shape constraint is more important
Mean error < 5 pixels < 7.5 pixels < 10 pixels
Method in [2] 74.7% 93.5% 97.8%
Our Method 86.1% 95.2% 98.2%
percentage of test images with
![Page 31: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/31.jpg)
Results of 87 points
![Page 32: Face Alignment by Explicit Shape Regression](https://reader036.vdocument.in/reader036/viewer/2022081511/5681674c550346895ddbfd46/html5/thumbnails/32.jpg)
Summary
Challenges:• Heuristic and fixed shape
model (e.g., PCA)
• Large variation in face appearance/geometry
• Large training data and feature space
Our techniques:• Non-parametric shape
constraint
• Cascaded regression and shape indexed features
• Correlation based feature selection