synergistic face detection and pose...
TRANSCRIPT
![Page 1: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/1.jpg)
Synergistic Face Detection Synergistic Face Detection
and Pose Estimationand Pose Estimation
M. Osadchy M.Miller Y. LeCunM. Osadchy M.Miller Y. LeCun
Technion NEC Labs NYUTechnion NEC Labs NYU
![Page 2: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/2.jpg)
Our SystemOur System
Detects faces independently of their poses. Detects faces independently of their poses.
Estimates head poses.Estimates head poses.
No
tracking!
![Page 3: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/3.jpg)
Our SystemOur System
Robust to: yaw (from left to right profile), Robust to: yaw (from left to right profile), roll (-45, 45), and pitch (-60, 60).roll (-45, 45), and pitch (-60, 60).
Single Detector is applied to all poses.Single Detector is applied to all poses.
Pose estimation: Within 15° error about 90% of Pose estimation: Within 15° error about 90% of poses are estimated correctly.poses are estimated correctly.
Near real-time: 5 frames per second on standard Near real-time: 5 frames per second on standard hardware.hardware.
![Page 4: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/4.jpg)
SynergySynergy
Multi-View
Face
Detection
Pose
estimation
closely related Inner class variation (skin Inner class variation (skin
color, hair style, etc.)color, hair style, etc.)
Lighting VariationsLighting Variations
Scale VariationsScale Variations
Facial ExpressionsFacial Expressions
……
Common Problems
Train together Better generalization
![Page 5: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/5.jpg)
Integrating Face Detection and Pose Integrating Face Detection and Pose
Estimation: Previous MethodsEstimation: Previous Methods
image
Rough pose
estimation
Pose specific face
detector
Unmanageable in real problems
![Page 6: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/6.jpg)
Integrating Face Detection and Pose Integrating Face Detection and Pose
Estimation: Our ApproachEstimation: Our ApproachLow dimensional space
Mapping: G
)(ZF
Face Manifold
parameterized by
pose
Apply
Image X
)(XG)()( ZFXG −
![Page 7: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/7.jpg)
Parameterization of the Face Parameterization of the Face
Manifold – Single ParameterManifold – Single Parameter
]2 ,2[ pipiZ −==θYaw:
03pi− 3pi pose2pi− 2pi2pi−
( ) )cos( iiF αθθ −= ]3 ,0 ,3[ pipi−=α3,2,1=i
Face manifold
1F
2F3F
∑
∑
=
==3
1
3
1
sin
cosarctan
iii
iii
G
G
α
αθθ
![Page 8: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/8.jpg)
Parameterization of the Face Parameterization of the Face
Manifold – Two ParametersManifold – Two Parameters
]2 ,2[ pipi−=θ
);cos()cos(),( jiijF βϕαθϕθ −−= ]3 ,0 ,3[, pipi−=βα3,2,1, =ji
a portion of the surface of a sphere
Yaw and roll :
] 4 ,4[ pipi−=ϕ ≅
( ) )cos()cos( jiij
ij XGcc βα∑=
( ) )cos()cos( jiij
ij XGsc βα∑=
( ) )sin()cos( jiij
ij XGcs βα∑=
( ) )sin()sin( jiij
ij XGss βα∑=
( ) ( )sscccsscssccsccs +−+−+= ,atan2,atan2(5.0θ
( ) ( )sscccsscssccsccs +−−−+= ,atan2,atan2(5.0ϕ
where
),( ϕθ=Z
![Page 9: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/9.jpg)
Minimum Energy Machine Minimum Energy Machine
( )XZ,Y,EWEnergy function:
]45,45[]90,90[}{ −×−=Z
imagepose label
parameters
=0
1Y face
non face
measures compatibility between X,Z,Y. ( )XZ,Y,EW
If X is a face with pose Z then we want: ( ) ( ) ;'Z ,X,Z'0,EXZ,1,E WW ∀< <
( ) ( ) ZZ ≠∀< <' ,X,Z'1,EXZ,1,E WW
![Page 10: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/10.jpg)
Operating the MachineOperating the Machine
Clamp X to the observed value (the image)
Find Z and Y such that:
Complete energy:
( ){ }
( )XZ,Y,EminargZ,Y WZZ{Y},Y ∈∈
=
( ) ( ) TY1)()(YXZ,Y,EW ⋅−+−⋅= ZFXGW
Y=1X is a face
![Page 11: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/11.jpg)
Operating the MachineOperating the Machine
Clamp X to the observed value (the image)
Find Z and Y such that:
Complete energy:
( ){ }
( )XZ,Y,EminargZ,Y WZZ{Y},Y ∈∈
=
( ) ( ) TY1)()(YXZ,Y,EW ⋅−+−⋅= ZFXGW
Y=0X is not a face
![Page 12: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/12.jpg)
ArchitectureArchitecture
X (image)
analytical
mapping onto
face manifold
convolutional
network
Z (pose)
)()(GW ZFX −
switch
( energy)
Y (label)
W
(param)
)(GW X )(ZF
T
),,(EW XZY
Operating the machine:
F(Z)(X)GZ W{Z}Z
−=∈minarg
=0
1Y
TF(Z)(X)GW <−otherwise
![Page 13: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/13.jpg)
Convolutional NetworkConvolutional Network
““end-to-end” trainable systems from low-level features to end-to-end” trainable systems from low-level features to
high-level representations.high-level representations.
Easily learn the type of shift-invariant features, relevant Easily learn the type of shift-invariant features, relevant
to object recognition.to object recognition.
Can be replicated over large images much more Can be replicated over large images much more
efficiently than traditional classifiers. efficiently than traditional classifiers.
Considerable advantage for
real-time systems!
![Page 14: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/14.jpg)
Convolutions Subsampling
ConvolutionsSubsampling
Convolutions
Full
connection
C1: feature maps
8@28x28
S1: f. maps
8@14x14
C3: f. maps
20@10x10S4: f. maps
20@5x5 C5: 120Output: 9
Input
32x32
Similar to LeNet5, with more maps:
![Page 15: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/15.jpg)
Training with Discriminative Loss Training with Discriminative Loss
FunctionFunction
( ) ( )∑∑∈∈
+=01
,1
,,1
)( 0
0
1
1 Si
i
Si
ii XWLS
XZWLS
WL
training faces training non-faces
loss for face sample
with known pose loss for non-face
sample
We showed that this loss function causes the machine to exhibit proper behavior:
( ) ( ) margin,...YE,...YE undesireddesired +<
Minimize:
21 ),,1(),,1,( XZEXZWL W= [ ]),,1(exp),0,(0 XZEKXWL −=
(X)GWF(Z) (X)GW
)ZF(
![Page 16: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/16.jpg)
Running the MachineRunning the Machine
Works on grey-level images.Works on grey-level images.
Applied at range of scales stepping by a factor Applied at range of scales stepping by a factor
of .of .
The network is replicated over the image at each The network is replicated over the image at each
scale, stepping by 4 pixels in x and y.scale, stepping by 4 pixels in x and y.
Overlapping detections are replaced by the Overlapping detections are replaced by the
strongest.strongest.
2
![Page 17: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/17.jpg)
ResultsResults
Our system is robust to yaw , in-plane rotation , Our system is robust to yaw , in-plane rotation ,
and pitchand pitch
90±54± 60±
![Page 18: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/18.jpg)
TrainingTraining
52,850, 32x32 grey-level images of faces (NEC Labs hand 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with uniform distribution of poses.annotated set) with uniform distribution of poses.
Initial negative set: 52,850 random non-face natural images.Initial negative set: 52,850 random non-face natural images. Second phase: half of the initial negative set was replaced by Second phase: half of the initial negative set was replaced by
false positives of the initial version of the detector.false positives of the initial version of the detector. Each training image was used 5 times with random variation Each training image was used 5 times with random variation
in scale, in-plane rotation, brightness and contrast.in scale, in-plane rotation, brightness and contrast. 9 passes on the data: 26 hours on 2Ghz Pentium 4.9 passes on the data: 26 hours on 2Ghz Pentium 4. The system converged to an EER of 5% on training set and The system converged to an EER of 5% on training set and
6% on test set of 90,000 images.6% on test set of 90,000 images.
![Page 19: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/19.jpg)
Test on Standard Data SetsTest on Standard Data Sets No standard set tests all poses, that our system is No standard set tests all poses, that our system is
designed to detect.designed to detect. 3 standard sets focusing on particular pose variation: 3 standard sets focusing on particular pose variation:
tilted, profile, and frontal.tilted, profile, and frontal.
x93%86%Schneiderman & Kanade
x96%89%Rowley et al
x83%70%xJones & Viola (profile)
xx95%90%Jones & Viola (tilted)
88%83%83%67%97%90%Our Detector
1.280.53.360.4726.94.42
MIT+CMUPROFILETILTEDData Set->False positives per image->
Rea
l tim
e
![Page 20: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/20.jpg)
DetectionPose Estimation of
the detected faces
Note: typical pose estimation systems input centered faces; when we hand
localize this faces we get: 89% of yaw and 100% of in-plane rotations
within 15 degrees.
Standard Sets
![Page 21: Synergistic Face Detection and Pose Estimationyann/research/cface/osadchy-face-detect-sicily.pdfTraining 52,850, 32x32 grey-level images of faces (NEC Labs hand annotated set) with](https://reader030.vdocument.in/reader030/viewer/2022040907/5e7d53b85d77ee5806412197/html5/thumbnails/21.jpg)
Synergy TestSynergy Test
Detection Pose Estimation