TRANSCRIPT
Distance Metric Learning for Set-based Visual Recognition
Ruiping Wang
Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS)
Oct. 8, 2016
ECCV 2016 Tutorial on "Distance Metric Learning for Computer Vision", Part 3
Outline
Background
Literature review
Evaluations
Summary
Typical applications and performance metrics
Identification: photo matching (1:N); watch list screening (1:N+1). Performance metric: FR(@FAR)
Verification: access control (1:1); e-passport (1:1). Performance metric: ROC (FRR + FAR)
Face Recognition with Single Image
Who is this celebrity? Are they the same guy?
Challenges
Pose
Illumination
Challenges
Pose
Illumination
Personal ID
Challenges
Pose / Illumination / Personal ID
Probe vs. gallery: same person or different person?
Challenges
Intra-class variation vs. inter-class variation: what is the semantic meaning of the distance measure D(x, y)?
Pose variation and illumination variation make sample-based metric learning even harder.
Face Recognition with Videos
Video surveillance
http://www.youtube.com/watch?v=M80DXI932OE
http://www.youtube.com/watch?v=RfJsGeq0xRA#t=22
Seeking missing children; searching criminal suspects
Face Recognition with Videos
Video shot retrieval
Smart TV-series character shots retrieval system, "The Big Bang Theory", e.g.:
S01E01: 10′48″; S01E06: 05′22″; S01E02: 04′20″; S01E02: 00′21″; S01E03: 08′06″; S01E05: 09′23″; S01E01: 22′20″; S01E04: 03′42″
Treating Video as Image Set
A new & different problem
Unconstrained acquisition conditions
Complex appearance variations
Two phases: set modeling + set matching
New paradigm:
set-based metric learning
Outline
Background
Literature review
Evaluations
Summary
Overview of previous works (from the view of set modeling)
Statistics ($G_1$ vs. $G_2$, matched via K-L divergence): [Shakhnarovich, ECCV'02], [Arandjelović, CVPR'05], [Wang, CVPR'12], [Harandi, ECCV'14/ICCV'15], [Wang, CVPR'15]
Linear subspace ($S_1$ vs. $S_2$, matched via principal angles): [Yamaguchi, FG'98], [Kim, PAMI'07], [Hamm, ICML'08], [Harandi, CVPR'11], [Huang, CVPR'15]
Affine/Convex hull ($H_1$ vs. $H_2$, matched via NN matching): [Cevikalp, CVPR'10], [Hu, CVPR'11], [Yang, FG'13], [Zhu, ICCV'13], [Wang, ACCV'16]
Nonlinear manifold ($M_1$ vs. $M_2$, matched via principal angles(+)): [Kim, BMVC'05], [Wang, CVPR'08], [Wang, CVPR'09], [Chen, CVPR'13], [Lu, CVPR'15]
Overview of previous works
Set modeling: linear subspace; nonlinear manifold; affine/convex hull (affine subspace); parametric PDFs / statistics
Set matching — basic distance: principal angles-based measures; nearest neighbor (NN) matching; K-L divergence; SPD Riemannian metrics; …
Set matching — metric learning: learning in Euclidean space; learning on Riemannian manifold
Set model I: linear subspace
Properties
PCA on the set of image samples to get the subspace
Loose characterization of the set distribution region: no distribution boundary, no direction difference
Principal angles-based measures discard the varying importance of different variance directions
Methods: MSM [FG'98], DCC [PAMI'07], GDA [ICML'08], GGDA [CVPR'11], PML [CVPR'15], …
Set model I: linear subspace
MSM (Mutual Subspace Method) [FG'98]
Pioneering work on image set classification; the first to exploit principal angles as a subspace distance
Metric learning: N/A
[1] O. Yamaguchi, K. Fukui, and K. Maeda. Face Recognition Using Temporal Image Sequence. IEEE FG 1998.
Figure: subspace method vs. mutual subspace method
Set model I: linear subspace
DCC (Discriminant Canonical Correlations) [PAMI'07]
Metric learning: in Euclidean space
[1] T. Kim, J. Kittler, and R. Cipolla. Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations. IEEE T-PAMI, 2007.
Set 1: $X_1$; Set 2: $X_2$. Each set is modeled as a linear subspace by an orthonormal basis matrix: $X_i X_i^T \approx P_i \Lambda_i P_i^T$
DCC
Canonical correlations / principal angles: the canonical vectors capture common variation modes
$P_1^T P_2 = Q_{12}\,\Lambda\,Q_{21}^T$, with $\Lambda = \mathrm{diag}(\cos\theta_1, \ldots, \cos\theta_d)$
Canonical vectors: $U = P_1 Q_{12} = [u_1, \ldots, u_d]$, $V = P_2 Q_{21} = [v_1, \ldots, v_d]$
Canonical correlations: $\cos\theta_i$; principal angles: $\theta_i$
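As a concrete illustration, the canonical correlations above are the singular values of $P_1^T P_2$. A minimal numpy sketch (the subspace dimension `dim` and the mean-centering are illustrative choices, not fixed by MSM/DCC):

```python
import numpy as np

def principal_angles(X1, X2, dim=2):
    """Principal angles between the linear subspaces spanned by two
    image sets X1, X2 (columns are vectorized images).

    Returns theta_1 <= ... <= theta_dim in radians; the canonical
    correlations cos(theta_k) are the singular values of P1^T P2.
    """
    # Orthonormal bases of the leading `dim` PCA directions of each set.
    P1 = np.linalg.svd(X1 - X1.mean(1, keepdims=True), full_matrices=False)[0][:, :dim]
    P2 = np.linalg.svd(X2 - X2.mean(1, keepdims=True), full_matrices=False)[0][:, :dim]
    s = np.linalg.svd(P1.T @ P2, compute_uv=False)
    return np.arccos(np.clip(s, 0.0, 1.0))
```

Identical sets give all-zero angles; sets spanning orthogonal directions give angles of $\pi/2$.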
DCC
Discriminative learning
Linear transformation: $T: X_i \rightarrow Y_i = T^T X_i$
Representation: $Y_i Y_i^T = (T^T X_i)(T^T X_i)^T \approx (T^T P_i)\,\Lambda_i\,(T^T P_i)^T$
Set similarity: $F_{ij} = \max_{Q_{ij}, Q_{ji}} \mathrm{tr}(M_{ij})$, with $M_{ij} = Q_{ij}^T P_i'^T\, T T^T P_j'\, Q_{ji}$
Discriminant function: $T = \arg\max_T\ \mathrm{tr}(T^T S_b T)\,/\,\mathrm{tr}(T^T S_w T)$
Set model I: linear subspace
GDA [ICMLโ08] / GGDA [CVPRโ11]
Treat subspaces as points on Grassmann manifold
Metric learning: on Riemannian manifold
[1] J. Hamm and D. D. Lee. Grassmann Discriminant Analysis: a Unifying View on Subspace-Based
Learning. ICML 2008.
[2] M. Harandi, C. Sanderson, S. Shirazi, B. Lovell. Graph Embedding Discriminant Analysis on
Grassmannian Manifolds for Improved Image Set Matching. IEEE CVPR 2011.
Figure: each linear subspace becomes a point on the Grassmann manifold via an implicit kernel mapping
GDA/GGDA
Projection metric
$d_P(Y_1, Y_2) = \Bigl(\sum_i \sin^2\theta_i\Bigr)^{1/2} = 2^{-1/2}\,\bigl\|Y_1 Y_1^T - Y_2 Y_2^T\bigr\|_F$
Geodesic distance (Wong, 1967; Edelman et al., 1999): $d_G^2(Y_1, Y_2) = \sum_i \theta_i^2$, where $\theta_i$ are the principal angles
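A small sketch of the two equivalent forms of the projection metric (orthonormal bases are assumed, here obtained by QR; function names are illustrative):

```python
import numpy as np

def projection_metric(Y1, Y2):
    """Projection metric d_P = 2^(-1/2) ||Y1 Y1^T - Y2 Y2^T||_F between
    subspaces given by orthonormal basis matrices Y1, Y2 (D x m)."""
    return np.linalg.norm(Y1 @ Y1.T - Y2 @ Y2.T, 'fro') / np.sqrt(2)

def projection_metric_angles(Y1, Y2):
    """Equivalent form (sum_i sin^2 theta_i)^(1/2) via principal angles."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    theta = np.arccos(np.clip(s, 0.0, 1.0))
    return np.sqrt(np.sum(np.sin(theta) ** 2))
```

For equal-dimensional subspaces the two forms agree, since $\|Y_1Y_1^T - Y_2Y_2^T\|_F^2 = 2m - 2\sum_i\cos^2\theta_i = 2\sum_i\sin^2\theta_i$.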
GDA/GGDA
Projection kernel
Projection embedding (isometric): $\Psi_P: \mathcal{G}(m, D) \rightarrow \mathbb{R}^{D \times D}$, $\mathrm{span}(Y) \mapsto Y Y^T$
Inner product in $\mathbb{R}^{D \times D}$: $\mathrm{tr}\bigl((Y_1 Y_1^T)(Y_2 Y_2^T)\bigr) = \bigl\|Y_1^T Y_2\bigr\|_F^2$
Grassmann kernel (positive definite kernel): $k_P(Y_1, Y_2) = \bigl\|Y_1^T Y_2\bigr\|_F^2$
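Positive definiteness of the projection kernel can be checked numerically on a Gram matrix; a hedged numpy sketch (`grassmann_kernel_matrix` is an illustrative name):

```python
import numpy as np

def grassmann_kernel_matrix(bases):
    """Gram matrix of the projection kernel k_P(Y1, Y2) = ||Y1^T Y2||_F^2
    over a list of orthonormal basis matrices (all D x m)."""
    n = len(bases)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.linalg.norm(bases[i].T @ bases[j], 'fro') ** 2
    return K
```

Because $k_P$ is the Euclidean inner product of the embeddings $YY^T$, the Gram matrix is symmetric PSD, with $K_{ii} = \|I_m\|_F^2 = m$ on the diagonal.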
GDA/GGDA
Discriminative learning: classical kernel methods using the Grassmann kernel, e.g., kernel LDA / kernel graph embedding
$\alpha^* = \arg\max_\alpha\ \dfrac{\alpha^T K W K \alpha}{\alpha^T K K \alpha}$, where $K$ is the Grassmann kernel matrix
Set model I: linear subspace
PML (Projection Metric Learning) [CVPRโ15]
Metric learning: on Riemannian manifold
[1] Z. Huang, R. Wang, S. Shan, X. Chen. Projection Metric Learning on Grassmann Manifold with
Application to Video based Face Recognition. IEEE CVPR 2015.
Learning with a kernel in RKHS: no explicit mapping, high complexity, structure distortion
Learning directly on the manifold: an explicit mapping from $\mathcal{G}(q, D)$ to a lower-dimensional $\mathcal{G}(q, d)$
PML
Explicit manifold-to-manifold mapping: $f(Y) = W^T Y \in \mathcal{G}(q, d)$, for $Y \in \mathcal{G}(q, D)$, $d \le D$
Projection metric on the target Grassmann manifold $\mathcal{G}(q, d)$:
$d_P^2\bigl(f(Y_i), f(Y_j)\bigr) = 2^{-1/2}\,\bigl\|(W^T Y_i')(W^T Y_i')^T - (W^T Y_j')(W^T Y_j')^T\bigr\|_F^2 = 2^{-1/2}\,\mathrm{tr}\bigl(A_{ij}\,P\,A_{ij}^T\,P\bigr)$
$A_{ij} = Y_i' Y_i'^T - Y_j' Y_j'^T$; $P = WW^T$ is a rank-$d$ symmetric positive semidefinite (PSD) matrix of size $D \times D$ (similar in form to a Mahalanobis matrix)
$Y_i$ needs to be normalized to $Y_i'$ so that the columns of $W^T Y_i'$ are orthonormal
PML
Discriminative learning
Discriminant function: minimize/maximize the projection distances of all within-class/between-class subspace pairs
$J = \min_P\ \Bigl[\sum_{l_i = l_j} \mathrm{tr}\bigl(A_{ij} P A_{ij}^T P\bigr) - \alpha \sum_{l_i \ne l_j} \mathrm{tr}\bigl(A_{ij} P A_{ij}^T P\bigr)\Bigr]$
Optimization algorithm: iterative solution for one of $Y'$ and $P$ by fixing the other
Normalization of $Y$ by QR decomposition; computation of $P$ by the Riemannian Conjugate Gradient (RCG) algorithm on the manifold of PSD matrices
Set model II: nonlinear manifold
Properties Capture nonlinear complex appearance variation
Need dense sampling and large amount of data
Less appealing computational efficiency
Methods: MMD [CVPR'08], MDA [CVPR'09], BoMPA [BMVC'05], SANS [CVPR'13], MMDML [CVPR'15], …
Complex distribution; large amount of data
Set model II: nonlinear manifold
MMD (Manifold-Manifold Distance) [CVPR'08]
Set modeling with a nonlinear appearance manifold; image set classification becomes distance computation between two manifolds, $D(M_1, M_2)$
Metric learning: N/A
[1] R. Wang, S. Shan, X. Chen, W. Gao. Manifold-Manifold Distance with Application to Face
Recognition based on Image Set. IEEE CVPR 2008. (Best Student Poster Award Runner-up)
[2] R. Wang, S. Shan, X. Chen, Q. Dai, W. Gao. Manifold-Manifold Distance and Its Application to
Face Recognition with Image Sets. IEEE Trans. Image Processing, 2012.
MMD
Multi-level MMD framework
Three pattern levels: point → subspace → manifold
Pairwise distances across levels: PPD (point-point), PSD (point-subspace), SSD (subspace-subspace); PMD (point-manifold), SMD (subspace-manifold), MMD (manifold-manifold)
MMD
Formulation & solution
Three modules: local model construction; local model distance; global integration of local distances
MMD as an integration of SSDs between pairs of local linear models $C_i \subset M_1$, $C_j \subset M_2$:
$d(M_1, M_2) = \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d(C_i, C_j), \qquad \text{s.t. } \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij} = 1,\ f_{ij} \ge 0$
MMD
Subspace-subspace distance (SSD): mean (cosine correlation) + variance (canonical correlations)
Mean component: $d_E(S_1, S_2) = (1 - \cos^2\theta_0)^{1/2} = \sin\theta_0$
Variance component: $d_P(S_1, S_2) = \Bigl(\tfrac{1}{r}\sum_{k=1}^{r}\sin^2\theta_k\Bigr)^{1/2} = \Bigl(1 - \tfrac{1}{r}\sum_{k=1}^{r}\cos^2\theta_k\Bigr)^{1/2}$
Combined: $d(S_1, S_2) = \Bigl[\tfrac{1}{2}\Bigl(\sin^2\theta_0 + \tfrac{1}{r}\sum_{k=1}^{r}\sin^2\theta_k\Bigr)\Bigr]^{1/2} = \Bigl[\tfrac{1}{2}\Bigl(2 - \cos^2\theta_0 - \tfrac{1}{r}\sum_{k=1}^{r}\cos^2\theta_k\Bigr)\Bigr]^{1/2}$
Here $\theta_0$ is the angle between the subspace exemplars (means) and $\theta_k$ are the principal angles; same-person pairs give small distances, different-person pairs large ones.
MDA (Manifold Discriminant Analysis ) [CVPRโ09]
Goal: maximize โmanifold marginโ under Graph
Embedding framework
Metric learning: in Euclidean space
Euclidean distance between pair of image samples
[1] R. Wang, X. Chen. Manifold Discriminant Analysis. IEEE CVPR 2009.
Figure: intrinsic graph (within-class compactness) vs. penalty graph (between-class separability) between manifolds $M_1$, $M_2$
Set model II: nonlinear manifold
MDA
Optimization framework: minimize within-class scatter (intrinsic graph), maximize between-class scatter (penalty graph)
$S_w = \sum_{m,n} w_{mn}\,(v^T x_m - v^T x_n)^2 = 2\, v^T X (D - W) X^T v$
$S_b = \sum_{m,n} w'_{mn}\,(v^T x_m - v^T x_n)^2 = 2\, v^T X (D' - W') X^T v$
Objective function: a global linear transformation $v$, maximizing $J(v) = \dfrac{S_b}{S_w} = \dfrac{v^T X (D' - W') X^T v}{v^T X (D - W) X^T v}$, solved by generalized eigen-decomposition
BoMPA (Boosted Manifold Principal Angles) [BMVCโ05]
Goal: optimal fusion of different principal angles
Exploit Adaboost to learn weights for each angle
[1] T. Kim, O. Arandjelović, R. Cipolla. Learning over Sets using Boosted Manifold Principal Angles (BoMPA). BMVC 2005.
Figure: for subjects A and B, plain principal angles (PA) appear similar under common illumination, while BoMPA's weighted angles are dissimilar.
Set model II: nonlinear manifold
SANS (Sparse Approximated Nearest Subspaces) [CVPRโ13]
Goal: adaptively construct the nearest subspace pair
Metric learning: N/A
[1] S. Chen, C. Sanderson, M.T. Harandi, B.C. Lovell. Improved Image Set Classification via Joint
Sparse Approximated Nearest Subspaces. IEEE CVPR 2013.
Set model II: nonlinear manifold
SSD (Subspace-Subspace Distance):
Joint sparse representation (JSR) is
applied to approximate the nearest
subspace over a Grassmann manifold.
MMDML (Multi-Manifold Deep Metric Learning) [CVPRโ15]
Goal: maximize โmanifold marginโ under Deep Learning
framework
Metric learning: in Euclidean space
Set model II: nonlinear manifold
[1] J. Lu, G. Wang, W. Deng, P. Moulin, and J. Zhou. Multi-Manifold Deep Metric Learning for Image
Set Classification. IEEE CVPR 2015.
Class-specific DNNs model the nonlinearity; the objective function maximizes the manifold margin.
Set model III: affine subspace
Properties Linear reconstruction using: mean + subspace basis
Synthesized virtual NN-pair matching
Less characterization of global data structure
Computationally expensive by NN-based matching
Methods: AHISD/CHISD [CVPR'10], SANP [CVPR'11], RNP [FG'13], PSDML/SSDML [ICCV'13], PDL [ACCV'16], …
Sensitive to noisy samples; high computation cost
Set model III: affine subspace
AHISD/CHISD [CVPR'10]: NN-based matching using sample Euclidean distance
Metric learning: N/A
Subspace spanned by all the available samples $\mathcal{D} = \{x_1, \ldots, x_n\}$ in the set:
Affine hull: $H(\mathcal{D}) = \bigl\{\mathcal{D}\alpha = \textstyle\sum_k x_k\alpha_k \mid \sum_k \alpha_k = 1\bigr\}$
Convex hull: $H(\mathcal{D}) = \bigl\{\mathcal{D}\alpha = \textstyle\sum_k x_k\alpha_k \mid \sum_k \alpha_k = 1,\ 0 \le \alpha_k \le 1\bigr\}$
[1] H. Cevikalp, B. Triggs. Face Recognition Based on Image Sets. IEEE CVPR 2010.
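The point-to-affine-hull distance has a closed form: project the residual onto the hull's direction subspace. A sketch under the affine-hull definition above (function name and rank threshold are illustrative; CHISD's convex constraint would instead need a QP solver):

```python
import numpy as np

def point_to_affine_hull(x, D):
    """Distance from point x to the affine hull of the columns of D,
    i.e. min_alpha ||x - D @ alpha||_2 subject to sum(alpha) = 1."""
    mu = D.mean(axis=1)
    # Orthonormal basis of the hull's direction subspace (centered columns).
    U, s, _ = np.linalg.svd(D - mu[:, None], full_matrices=False)
    U = U[:, s > 1e-10 * max(s[0], 1.0)]
    r = x - mu
    # Remove the component of r lying inside the hull's directions.
    return np.linalg.norm(r - U @ (U.T @ r))
```

For example, the affine hull of $e_1$ and $e_2$ in $\mathbb{R}^3$ is the line through them, at distance $\sqrt{1/2}$ from the origin.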
Set model III: affine subspace
SANP [CVPRโ11] / RNP [FGโ13]
Improve AHISD/CHISD by imposing different regularizations
on the linear representation coefficients
Metric learning: N/A
[2] Y. Hu, A.S. Mian, R. Owens. Sparse Approximated Nearest Points for Image Set Classification.
IEEE CVPR 2011.
[3] M. Yang, P. Zhu, L. Gool, and L. Zhang. Face Recognition based on Regularized Nearest Points
between Image Sets. IEEE FG 2013.
SANP: L1-norm regularization; RNP: L2-norm regularization
Set model III: affine subspace
PSDML/SSDML [ICCVโ13] Metric learning: in Euclidean space
[1] P. Zhu, L. Zhang, W. Zuo, and D. Zhang. From Point to Set: Extend the Learning of
Distance Metrics. IEEE ICCV 2013.
point-to-set distance
metric learning (PSDML)
set-to-set distance metric
learning (SSDML)
PSDML/SSDML
Point-to-set distance
Basic distance: $d^2(x, \mathcal{D}) = \min_\alpha\ \|x - H(\mathcal{D})\|_2^2 = \min_\alpha\ \|x - \mathcal{D}\alpha\|_2^2$
Solution: least-squares regression or ridge regression
Mahalanobis distance, with projection matrix $P$:
$d_M^2(x, \mathcal{D}) = \min_\alpha\ \|P(x - \mathcal{D}\alpha)\|_2^2 = (x - \mathcal{D}\alpha)^T P^T P\,(x - \mathcal{D}\alpha) = (x - \mathcal{D}\alpha)^T M\,(x - \mathcal{D}\alpha)$
PSDML/SSDML
Point-to-set distance metric learning (PSDML): an SVM-based method
$d_M^2(x, \mathcal{D}) = \min_\alpha\ \|P(x - \mathcal{D}\alpha)\|_2^2 = (x - \mathcal{D}\alpha)^T M\,(x - \mathcal{D}\alpha)$
$\min_{M,\, b,\, \xi}\ \|M\|_F^2 + \lambda\Bigl(\sum_{i,j}\xi_{ij} + \sum_i \xi_i\Bigr)$
s.t. $d_M(x_i, \mathcal{D}_j) + b \ge 1 - \xi_{ij}$, for $l(x_i) \ne l(\mathcal{D}_j)$ (negative pairs);
$d_M(x_i, \mathcal{D}_{l_i}) + b \le -1 + \xi_i$ (positive pairs);
$M \succeq 0$; $\forall i, j:\ \xi_{ij} \ge 0,\ \xi_i \ge 0$
PSDML/SSDML
Set-to-set distance
Basic distance: $d^2(\mathcal{D}_1, \mathcal{D}_2) = \min_{\alpha_1, \alpha_2}\ \|H(\mathcal{D}_1) - H(\mathcal{D}_2)\|_2^2 = \min_{\alpha_1, \alpha_2}\ \|\mathcal{D}_1\alpha_1 - \mathcal{D}_2\alpha_2\|_2^2$
Solution: AHISD/CHISD [Cevikalp, CVPR'10]
Mahalanobis distance: $d_M^2(\mathcal{D}_1, \mathcal{D}_2) = \min_{\alpha_1, \alpha_2}\ \|P(\mathcal{D}_1\alpha_1 - \mathcal{D}_2\alpha_2)\|_2^2 = (\mathcal{D}_1\alpha_1 - \mathcal{D}_2\alpha_2)^T M\,(\mathcal{D}_1\alpha_1 - \mathcal{D}_2\alpha_2)$
PSDML/SSDML
Set-to-set distance metric learning (SSDML): an SVM-based method
$\min_{M,\, b,\, \xi}\ \|M\|_F^2 + \lambda\Bigl(\sum_{i,j}\xi_{ij}^{-} + \sum_{i,j}\xi_{ij}^{+}\Bigr)$
s.t. $d_M(\mathcal{D}_i, \mathcal{D}_j) + b \ge 1 - \xi_{ij}^{-}$, for $l(\mathcal{D}_i) \ne l(\mathcal{D}_j)$ (negative pairs);
$d_M(\mathcal{D}_i, \mathcal{D}_j) + b \le -1 + \xi_{ij}^{+}$, for $l(\mathcal{D}_i) = l(\mathcal{D}_j)$ (positive pairs);
$M \succeq 0$; $\forall i, j:\ \xi_{ij}^{-} \ge 0,\ \xi_{ij}^{+} \ge 0$
Set model III: affine subspace
PDL (Prototype Discriminative Learning) [ACCV'16]
Goal: jointly learn prototypes and a linear transformation such that, in the target subspace, for any image in each set its nearest prototype from the same class is closer than that from different classes
Metric learning: in Euclidean space
[1] W. Wang, R. Wang, S. Shan, X. Chen. Prototype Discriminative Learning for Face Image Set Classification. ACCV 2016.
Figure: training process of one sample ($y = W^T x$) and the target subspace with learned prototype sets $H_1, H_2, H_3$
PDL
Prototype representation
Original image sets: $X_c = [x_{c,1}, \ldots, x_{c,N_c}] \in \mathbb{R}^{D \times N_c}$, $c = 1, \ldots, C$
Affine hull representation [Cevikalp, CVPR'10]: the smallest affine subspace containing affine combinations of the sample images in an image set,
$H_c = \bigl\{x = \sum_{k=1}^{N_c}\alpha_{c,k}\,x_{c,k}\ \big|\ \sum_{k=1}^{N_c}\alpha_{c,k} = 1\bigr\}$, $c = 1, \ldots, C$
Assumption: prototypes $P_c = [p_{c,1}, \ldots, p_{c,n_c}] \subset H_c$, with $p_{c,k} = \mu + U_c\, v_{c,k}$, $v_{c,k} \in \mathbb{R}^{l}$
PDL
Objective function
$W^*, P_1^*, \ldots, P_C^* = \arg\min_{W, P_1, \ldots, P_C}\ J(W, P_1, \ldots, P_C)$
$J(W, P_1, \ldots, P_C) = \sum_{c=1}^{C}\sum_{x \in X_c} e_x$, where $e_x = \sigma_\beta\bigl(d(y, p^{near}_{same}) - d(y, p^{near}_{diff})\bigr)$ and $y = W^T x$
$p^{near}_{same}$ / $p^{near}_{diff}$: the nearest-neighbor prototype from the same class / from different classes
$\sigma_\beta(z) = \dfrac{1}{1 + e^{-\beta z}}$: a smooth approximation of the step function
Set model IV: statistics (COV+)
Properties The natural raw statistics of a sample set
Flexible model of multiple-order statistical information
Methods: CDL [CVPR'12], LMKML [ICCV'13], DARG [CVPR'15], B. Gauss [ICCV'15], SPD-ML [ECCV'14], LEML [ICML'15], LERM [CVPR'14], HER [CVPR'15], … (building on [Shakhnarovich, ECCV'02] and [Arandjelović, CVPR'05])
Natural raw statistics; a more flexible model
CDL (Covariance Discriminative Learning) [CVPRโ12]
Set modeling by Covariance Matrix (COV) The 2nd order statistics characterizing set data variations
Robust to noisy set data, scalable to varying set size
Metric learning: on the SPD manifold
[1] R. Wang, H. Guo, L.S. Davis, Q. Dai. Covariance Discriminative Learning: A Natural
and Efficient Approach to Image Set Classification. IEEE CVPR 2012.
Figure: covariance matrices $C_i, C_j$ are mapped by $\log$ to the tangent space $T_I$ at the identity $I$; the model captures data variations with no assumption on the data distribution
Set model IV: statistics (COV+)
CDL
Set modeling by covariance matrix
Image set: $N$ samples with $d$-dimensional image features, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$
COV: a $d \times d$ symmetric positive definite (SPD) matrix*
$C = \dfrac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^T$
*: use regularization to tackle the singularity problem
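A minimal sketch of the covariance set model with a simple ridge regularizer (the constant `reg` is an illustrative choice; CDL only requires some regularization to keep $C$ non-singular):

```python
import numpy as np

def set_covariance(X, reg=1e-3):
    """Covariance set model: sample covariance of X (d x N, columns are
    feature vectors), regularized so the result is strictly SPD even
    when N <= d."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / (X.shape[1] - 1)
    # Ridge regularization: shifts all eigenvalues up by `reg`.
    return C + reg * np.eye(X.shape[0])
```

Without the ridge term, a set with fewer samples than feature dimensions would yield a singular matrix, breaking the SPD-manifold machinery below.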
CDL
Set matching on the COV manifold: Riemannian metrics on the SPD manifold
Affine-invariant distance (AID) [1] — high computational burden:
$d^2(C_1, C_2) = \bigl\|\log\bigl(C_1^{-1/2} C_2 C_1^{-1/2}\bigr)\bigr\|_F^2 = \sum_{i=1}^{d}\ln^2 \lambda_i(C_1, C_2)$
Log-Euclidean distance (LED) [2] — more efficient, more appealing:
$d(C_1, C_2) = \bigl\|\log(C_1) - \log(C_2)\bigr\|_F$
[1] W. Förstner and B. Moonen. A Metric for Covariance Matrices. Technical Report, 1999.
[2] V. Arsigny, P. Fillard, X. Pennec and N. Ayache. Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices. SIAM J. Matrix Anal. Appl., Vol. 29, No. 1, pp. 328-347, 2007.
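The LED can be evaluated through the eigendecomposition of each SPD matrix; a small numpy sketch:

```python
import numpy as np

def logm_spd(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(C1, C2):
    """Log-Euclidean distance d(C1, C2) = ||log(C1) - log(C2)||_F."""
    return np.linalg.norm(logm_spd(C1) - logm_spd(C2), 'fro')
```

For instance, $d(I_2, e^2 I_2) = \|0 - 2I_2\|_F = 2\sqrt{2}$.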
CDL
Set matching on the COV manifold (cont.)
Explicit Riemannian kernel feature mapping with LED, from the Riemannian manifold of non-singular COV matrices to the tangent space at the identity matrix $I$:
$\phi: C \mapsto \log(C)$, $C \in \mathrm{Sym}_d^{+}$
By Mercer's theorem this yields the kernel $k(C_1, C_2) = \mathrm{tr}\bigl[\log(C_1)\,\log(C_2)\bigr]$
CDL
Discriminative learning on the COV manifold
Partial Least Squares (PLS) regression — goal: maximize the covariance between observations and class labels
Space X: feature vectors, e.g., log-COV matrices; Space Y: class label vectors, e.g., 1/0 indicators; both linearly projected into a latent space T
$X = T P^T$, $Y = T B C^T = X B_{pls}$, where $T$ is the common latent representation
CDL vs. GDA
COV → SPD manifold: model $C = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^T$; metric $d(C_1, C_2) = \|\log(C_1) - \log(C_2)\|_F$; kernel mapping $\phi: C \mapsto \log(C)$
Subspace → Grassmannian: model $C \approx U_m \Lambda_m U_m^T$, $U_m = [u_1, u_2, \ldots, u_m]$; metric $d_{proj}(U_1, U_2) = 2^{-1/2}\|U_1 U_1^T - U_2 U_2^T\|_F$; kernel mapping $\phi_{proj}: U \mapsto U U^T$
The two views are linked by $\log(C) = U \log(\Lambda)\, U^T$.
LMKML (Localized Multi-Kernel Metric Learning) [ICCVโ13]
Exploring multiple order statistics Data-adaptive weights for different types of features
Ignoring the geometric structure of 2nd/3rd-order statistics
Metric learning: in Euclidean space
[1] J. Lu, G. Wang, and P. Moulin. Image Set Classification Using Holistic Multiple Order Statistics
Features and Localized Multi-Kernel Metric Learning. IEEE ICCV 2013.
Set model IV: statistics (COV+)
Complementary information (mean vs. covariance) from 1st-, 2nd-, and 3rd-order statistics; the objective function learns data-adaptive kernel weights.
DARG (Discriminant Analysis on Riemannian manifold of
Gaussian distributions) [CVPRโ15]
Set modeling by mixture of Gaussian distribution (GMM)
Naturally encode the 1st order and 2nd order statistics
Metric learning: on Riemannian manifold
[1] W. Wang, R. Wang, Z. Huang, S. Shan, X. Chen. Discriminant Analysis on Riemannian Manifold
of Gaussian Distributions for Face Recognition with Image Sets. IEEE CVPR 2015.
$\mathcal{M}$: the Riemannian manifold of Gaussian distributions
Direct GMM matching [Shakhnarovich, ECCV'02; Arandjelović, CVPR'05] is non-discriminative and time-consuming.
DARG
Framework
(a) sets modeled as Gaussian components on $\mathcal{M}$ → (b) kernel embedding into $\mathcal{H}$ → (c) discriminative learning into $\mathbb{R}^d$
$\mathcal{M}$: Riemannian manifold of Gaussian distributions; $\mathcal{H}$: high-dimensional reproducing kernel Hilbert space (RKHS); $\mathbb{R}^d$: target lower-dimensional discriminant Euclidean subspace
DARG
Kernels on the Gaussian distribution manifold
Kernel based on the Lie group distance (LGD):
$LGD(g_i, g_j) = \bigl\|\log P_i - \log P_j\bigr\|_F$, where a Gaussian $g \sim N(x \mid \mu, \Sigma)$ is identified (according to information geometry) with the SPD matrix
$P = |\Sigma|^{-\frac{1}{d+1}}\begin{pmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{pmatrix}$
Kernel function: $K_{LGD}(g_i, g_j) = \exp\Bigl(-\dfrac{LGD^2(g_i, g_j)}{2t^2}\Bigr)$
DARG
Kernels on the Gaussian distribution manifold (cont.)
Kernel based on MD and LED:
Mahalanobis distance (MD) between means: $MD(g_i, g_j) = \sqrt{(\mu_i - \mu_j)^T\bigl(\Sigma_i^{-1} + \Sigma_j^{-1}\bigr)(\mu_i - \mu_j)}$
LED between covariance matrices: $LED(\Sigma_i, \Sigma_j) = \bigl\|\log\Sigma_i - \log\Sigma_j\bigr\|_F$
Kernel function: $K_{MD+LED}(g_i, g_j) = \gamma_1\,K_{MD}(g_i, g_j) + \gamma_2\,K_{LED}(\Sigma_i, \Sigma_j)$, with
$K_{MD}(g_i, g_j) = \exp\bigl(-MD^2(g_i, g_j)/2t^2\bigr)$ and $K_{LED}(\Sigma_i, \Sigma_j) = \exp\bigl(-LED^2(\Sigma_i, \Sigma_j)/2t^2\bigr)$
$\gamma_1, \gamma_2$ are the combination coefficients.
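A hedged sketch of the combined MD+LED kernel between two Gaussians ($\gamma_1 = \gamma_2 = 0.5$ and $t = 1$ are illustrative defaults, not values fixed by DARG):

```python
import numpy as np

def logm_spd(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def k_md_led(mu_i, S_i, mu_j, S_j, t=1.0, g1=0.5, g2=0.5):
    """Combined kernel between g_i = N(mu_i, S_i) and g_j = N(mu_j, S_j):
    K = g1 * exp(-MD^2 / 2t^2) + g2 * exp(-LED^2 / 2t^2)."""
    d = mu_i - mu_j
    md2 = d @ (np.linalg.inv(S_i) + np.linalg.inv(S_j)) @ d        # squared MD
    led2 = np.linalg.norm(logm_spd(S_i) - logm_spd(S_j), 'fro') ** 2  # squared LED
    return g1 * np.exp(-md2 / (2 * t ** 2)) + g2 * np.exp(-led2 / (2 * t ** 2))
```

Identical Gaussians give the maximum value $\gamma_1 + \gamma_2$; separating the means or covariances decreases the corresponding term.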
DARG
Discriminative learning
Weighted KDA (kernel discriminant analysis), incorporating the weights $w_i^{(c)}$ of the Gaussian components:
$J(\alpha) = \dfrac{\alpha^T B\,\alpha}{\alpha^T W\,\alpha}$
$W = \sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c} w_i^{(c)}\bigl(k_i^{c} - m_c\bigr)\bigl(k_i^{c} - m_c\bigr)^T, \qquad B = \sum_{c=1}^{C} N_c\,(m_c - m)(m_c - m)^T$
$m_c = \frac{1}{N_c}\sum_{i=1}^{N_c} w_i^{(c)} k_i^{c}, \qquad m = \frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N_c} w_i^{(c)} k_i^{c}$
Beyond Gauss [ICCVโ15]
Set modeling by probability distribution functions (PDFs)
More general than Gaussian assumption
non-parametric, data-driven kernel density estimator (KDE)
Metric learning: on Riemannian manifold
[1] M. Harandi, M. Salzmann, and M. Baktashmotlagh. Beyond Gauss: Image-Set Matching on the
Riemannian Manifold of PDFs. IEEE ICCV 2015.
Set model IV: statistics (COV+)
PDFs form a Riemannian manifold, i.e., the statistical manifold; Csiszár f-divergences are exploited to measure the geodesic distance.
Beyond Gauss
Set modeling with PDFs: kernel density estimation (KDE)
$p(x) = \dfrac{1}{n\sqrt{\det(2\pi\Sigma)}}\sum_{i=1}^{n}\exp\Bigl(-\tfrac{1}{2}(x - x_i)^T\Sigma^{-1}(x - x_i)\Bigr)$
Given two image sets $\{x_i^p\}_{i=1}^{n_p}$ and $\{x_i^q\}_{i=1}^{n_q}$ with estimated PDFs $p(x)$ and $q(x)$, how to compare the two PDFs?
Empirical estimation of f-divergences, via the ratio $r(x) = \dfrac{p(x)}{p(x) + q(x)}$:
Hellinger distance:
$\hat\delta_H^2(p \| q) = \dfrac{1}{2n_p}\sum_{i=1}^{n_p}\Bigl(\sqrt{r(x_i^p)} - \sqrt{1 - r(x_i^p)}\Bigr)^2 + \dfrac{1}{2n_q}\sum_{i=1}^{n_q}\Bigl(\sqrt{r(x_i^q)} - \sqrt{1 - r(x_i^q)}\Bigr)^2$
Jeffrey divergence:
$\hat\delta_J(p \| q) = \dfrac{1}{n_p}\sum_{i=1}^{n_p}\bigl(2r(x_i^p) - 1\bigr)\ln\dfrac{r(x_i^p)}{1 - r(x_i^p)} + \dfrac{1}{n_q}\sum_{i=1}^{n_q}\bigl(2r(x_i^q) - 1\bigr)\ln\dfrac{r(x_i^q)}{1 - r(x_i^q)}$
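The empirical Hellinger estimator can be sketched with scipy's KDE, averaging $(\sqrt{r} - \sqrt{1-r})^2$ over the samples of both sets (bandwidth selection is left to `gaussian_kde`'s default rule, an assumption not dictated by the method):

```python
import numpy as np
from scipy.stats import gaussian_kde

def empirical_hellinger2(Xp, Xq):
    """Monte-Carlo estimate of the squared Hellinger distance between
    the KDEs of two sample sets Xp, Xq (shape (d, n)), using
    r(x) = p(x) / (p(x) + q(x)) evaluated at the samples themselves."""
    p, q = gaussian_kde(Xp), gaussian_kde(Xq)
    def avg(X):
        r = p(X) / (p(X) + q(X))
        return np.mean((np.sqrt(r) - np.sqrt(1.0 - r)) ** 2)
    # Averaging over both sets approximates the mixture (p + q) / 2.
    return 0.5 * avg(Xp) + 0.5 * avg(Xq)
```

Identical sets give $r \equiv 1/2$ and hence distance 0; well-separated sets approach the maximum value 1.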
Beyond Gauss
Kernels on the statistical manifold
Hellinger kernel: $K_H(p, q) = \exp\bigl(-\beta\,\delta_H^2(p, q)\bigr)$
Laplace kernel: $K_L(p, q) = \exp\bigl(-\beta\,\delta_H(p, q)\bigr)$
Jeffrey kernel: $K_J(p, q) = \exp\bigl(-\beta\,\delta_J(p \| q)\bigr)$
Dimensionality reduction: $W^* = \arg\min_W\ L(W)$, s.t. $W^T W = I_m$
$L(W) = \sum_{i,j} a(p_i, p_j)\,\delta\bigl(p_i^W, p_j^W\bigr)$, where $p_i^W$ is the PDF of set $i$ after projection by $W$ and $a(\cdot,\cdot)$ is the affinity
High affinity $a(p_i, p_j)$ → small distance after the mapping; low/negative affinity → large distance
Optimization by conjugate gradient on a Grassmann manifold.
SPD-ML (SPD Manifold Learning) [ECCVโ14] Pioneering work on explicit manifold-to-manifold
dimensionality reduction
Metric learning: on Riemannian manifold
Set model IV: statistics (COV+)
[1] M. Harandi, M. Salzmann, R. Hartley. From Manifold to Manifold: Geometry-Aware
Dimensionality Reduction for SPD Matrices. ECCV 2014.
Kernel-based alternatives: structure distortion, no explicit mapping
SPD-ML
SPD manifold dimensionality reduction
Mapping function: $f: \mathcal{S}_{++}^{n} \times \mathbb{R}^{n \times m} \rightarrow \mathcal{S}_{++}^{m}$, $f(X, W) = W^T X W \succ 0$, for $X \in \mathcal{S}_{++}^{n}$ and full-rank $W \in \mathbb{R}^{n \times m}$
Affine-invariant metrics (AIRM / Stein divergence) on the target SPD manifold $\mathcal{S}_{++}^{m}$ satisfy
$\delta^2\bigl(W^T X_i W,\ W^T X_j W\bigr) = \delta^2\bigl(\tilde{W}^T X_i \tilde{W},\ \tilde{W}^T X_j \tilde{W}\bigr)$, for $W = \tilde{W} A$, $A \in GL(m)$, $\tilde{W} \in \mathbb{R}^{n \times m}$, $\tilde{W}^T \tilde{W} = I_m$
so $W$ can be restricted to orthonormal matrices without loss of generality.
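The affine invariance that justifies restricting $W$ to orthonormal (Stiefel) matrices can be checked numerically for the AIRM; a small sketch (helper names are illustrative):

```python
import numpy as np

def airm(X1, X2):
    """Affine-invariant Riemannian metric between SPD matrices:
    delta(X1, X2) = || log(X1^{-1/2} X2 X1^{-1/2}) ||_F."""
    w, V = np.linalg.eigh(X1)
    inv_sqrt = (V / np.sqrt(w)) @ V.T          # X1^{-1/2}
    lam = np.linalg.eigvalsh(inv_sqrt @ X2 @ inv_sqrt)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

Congruence by any invertible $A$ leaves the eigenvalues of $X_1^{-1}X_2$, and hence the distance, unchanged.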
SPD-ML
Discriminative learning
Discriminant function: graph embedding formalism with an affinity matrix that encodes intra-class and inter-class SPD distances
$\min_W L(W) = \min_W \sum_{i,j} A_{ij}\,\delta^2\bigl(W^T X_i W,\ W^T X_j W\bigr)$, s.t. $W^T W = I_m$ (orthogonality constraint)
Optimization: a problem on the Stiefel manifold, solved by the nonlinear Conjugate Gradient (CG) method
LEML (Log-Euclidean Metric Learning) [ICMLโ15] Learning tangent map by preserving matrix symmetric
structure
Metric learning: on Riemannian manifold
Set model IV: statistics (COV+)
[1] Z. Huang, R. Wang, S. Shan, X. Li, X. Chen. Log-Euclidean Metric Learning on Symmetric
Positive Definite Manifold with Application to Image Set Classification. ICML 2015.
CDL [CVPRโ12]
LEML
SPD tangent map learning
Mapping function: $f: \log(S) \mapsto W^T \log(S)\, W$, with $W$ column full rank (an analogy to 2DPCA)
Log-Euclidean distance in the target tangent space, with $T = \log(S)$ and $Q = (WW^T)^2$:
$d_{LED}^2\bigl(f(T_i), f(T_j)\bigr) = \bigl\|W^T T_i W - W^T T_j W\bigr\|_F^2 = \mathrm{tr}\bigl(Q\,(T_i - T_j)(T_i - T_j)\bigr)$
LEML
Discriminative learning
Objective function:
$\arg\min_{Q,\,\xi}\ D_{ld}\bigl(Q, Q_0\bigr) + \eta\, D_{ld}\bigl(\mathrm{diag}(\xi), \mathrm{diag}(\xi_0)\bigr)$
s.t. $\mathrm{tr}\bigl(Q A_{ij} A_{ij}^T\bigr) \le \xi_{c(i,j)}$ for similar pairs $(i, j) \in S$; $\mathrm{tr}\bigl(Q A_{ij} A_{ij}^T\bigr) \ge \xi_{c(i,j)}$ for dissimilar pairs $(i, j) \in D$
$A_{ij} = \log(C_i) - \log(C_j)$; $D_{ld}$: LogDet divergence
Optimization: cyclic Bregman projection algorithm [Bregman, 1967]
LERM (Learning Euclidean-to-Riemannian Metric) [CVPRโ14]
Application scenario: still-to-video face recognition
Metric learning: cross Euclidean space and Riemannian manifold
[1] Z. Huang, R. Wang, S. Shan, X. Chen. Learning Euclidean-to-Riemannian Metric for Point-to-Set
Classification. IEEE CVPR 2014.
Figure: watch-list screening as point-to-set classification — a still image from the watch list (a point in $\mathbb{R}^D$) is matched against surveillance video (a set)
Set model IV: statistics (COV+)
LERM
Point-to-set classification: Euclidean points vs. Riemannian points
A still image is a point in Euclidean space (E); the set model lives on a corresponding Riemannian manifold $\mathcal{M}$:
1. Linear subspace → Grassmann (G) [Yamaguchi, FG'98; Chien, PAMI'02; Kim, PAMI'07; Hamm, ICML'08; Harandi, CVPR'11]
2. Affine hull → affine Grassmann (A) [Vincent, NIPS'01; Cevikalp, CVPR'10; Zhu, ICCV'13; Hamm, NIPS'08]
3. Covariance matrix → SPD (S) [Wang, CVPR'12; Vemulapalli, CVPR'13; Lu, ICCV'13; Pennec, IJCV'06; Arsigny, SIAM'07]
The matching is thus heterogeneous.
LERM
โณ โณ โ๐ท โ๐ท
Basic idea
Reduce Euclidean-to-Riemannian metric to
classical Euclidean metric Seek maps ๐น,ฮฆ to a common Euclidean subspace
๐ ๐๐, ๐๐ = (๐ญ ๐๐ โ ๐ฝ ๐๐ )๐ป(๐ญ ๐๐ โ ๐ฝ ๐๐ )
โ๐
๐น ฮฆ
โ๐
๐ ๐
โ๐ท1 โ๐ท2 Conventional Metric Learning
LERM
Basic idea
Bridge the Euclidean-to-Riemannian gap by Hilbert space embedding: it adheres to Euclidean geometry while globally encoding the geometry of the manifold (tangent spaces do so only locally).
LERM
Formulation: three mapping modes
Mapping Mode 1 (single Hilbert space): the Riemannian points are embedded into an RKHS $\mathcal{H}_y$, then both spaces are mapped into the common subspace $\mathbb{R}^d$
Mapping Mode 2 (double Hilbert spaces): the Euclidean points are additionally embedded into an RKHS $\mathcal{H}_x$, to further reduce the gap
Mapping Mode 3 (+ cross-space kernel): cross-space kernels further explore correlations between the two spaces
Same objective function $E(F, \Phi)$ in all modes; only the final maps $F, \Phi$ differ.
LERM
Example: Mapping Mode 1 (single Hilbert space)
Distance metric:
d(x_i, y_j) = (F(x_i) − Φ(y_j))^T (F(x_i) − Φ(y_j))
Objective function E(F, Φ):
min_{F,Φ} { D(F, Φ) + λ1 G(F, Φ) + λ2 T(F, Φ) }
where D discriminates the distance, G preserves the geometry, and T normalizes the transformation.
Final maps:
F = f_x = W_x^T X
Φ = f_y ∘ φ_y = W_y^T K_y
with the Riemannian kernel ⟨φ_y(y_i), φ_y(y_j)⟩ = K_y(i, j) = exp(−d²(y_i, y_j) / 2σ²),
where d(·,·) is a Riemannian metric on ℳ [ICML'08, NIPS'08, SIAM'07].
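As a sketch of how a Riemannian point enters only through its kernel values: below, the Riemannian metric d(·,·) is taken to be the log-Euclidean distance between SPD (covariance) matrices, and `Wy` is a hypothetical stand-in for the learned projection:

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(C1, C2):
    """Riemannian (log-Euclidean) distance between two SPD matrices."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), "fro")

def phi_map(C, train_spd, Wy, sigma=1.0):
    """Phi(y) = Wy^T k_y, where k_y(i) = exp(-d^2(y, y_i) / (2 sigma^2))."""
    k = np.array([np.exp(-log_euclidean_dist(C, Ci) ** 2 / (2 * sigma ** 2))
                  for Ci in train_spd])
    return Wy.T @ k

rng = np.random.default_rng(2)
def rand_spd(d=4):
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)   # guaranteed SPD toy matrix

train = [rand_spd() for _ in range(6)]
Wy = rng.normal(size=(6, 3))         # hypothetical learned projection
z = phi_map(rand_spd(), train, Wy)   # the Riemannian point, mapped into R^3
```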
HER (Hashing across Euclidean and Riemannian) [CVPR'15]
Application scenario: image-video face retrieval
Metric learning: Hamming distance learning across heterogeneous spaces
[1] Y. Li, R. Wang, Z. Huang, S. Shan, X. Chen. Face Video Retrieval with Image Query via Hashing
across Euclidean Space and Riemannian Manifold. IEEE CVPR 2015.
Set model IV: statistics (COV+)
HER
Heterogeneous hash learning
[Figure: previous multi-modality hash learning methods map two Euclidean spaces (A and B) into a common Hamming space; the proposed heterogeneous hash learning method instead maps a Euclidean space and a Riemannian manifold into a common Hamming space.]
HER
Two-stage architecture. Stage 1: kernel mapping
Euclidean kernel (image, vector x_i):
K^E_ij = exp(−‖x_i − x_j‖² / 2σ_E²)
Riemannian kernel (video, covariance matrix C_i) [*]:
K^R_ij = exp(−‖log(C_i) − log(C_j)‖_F² / 2σ_R²)
The two kernels embed the images and videos into two reproducing kernel Hilbert spaces (one Euclidean, one Riemannian), from which samples of each subject are mapped into the target Hamming space.
[*] Y. Li, R. Wang, Z. Cui, S. Shan, X. Chen. Compact Video Code and Its Application to Robust Face Retrieval in TV-Series. BMVC 2014. (the CVC method, for video-video face retrieval)
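The two stage-1 kernels can be sketched directly from the formulas (the σ values are assumptions; the matrix logarithm is computed by eigendecomposition, which is valid because covariance matrices are SPD):

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def euclidean_kernel(X, sigma_e=1.0):
    """K^E_ij = exp(-||x_i - x_j||^2 / (2 sigma_e^2)) on image vectors."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma_e ** 2))

def riemannian_kernel(covs, sigma_r=1.0):
    """K^R_ij = exp(-||log C_i - log C_j||_F^2 / (2 sigma_r^2)) on video covariances."""
    logs = [spd_log(C) for C in covs]
    n = len(logs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-np.linalg.norm(logs[i] - logs[j], "fro") ** 2
                             / (2 * sigma_r ** 2))
    return K

rng = np.random.default_rng(4)
KE = euclidean_kernel(rng.normal(size=(5, 8)))   # 5 toy image vectors
covs = []
for _ in range(3):                               # 3 toy videos
    F = rng.normal(size=(40, 6))                 # 40 frames of 6-dim features
    covs.append(np.cov(F, rowvar=False) + 1e-3 * np.eye(6))
KR = riemannian_kernel(covs)
```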
HER
Two-stage architecture. Stage 2: binary encoding
Hash functions h_E (on the Euclidean RKHS) and h_R (on the Riemannian RKHS) are learned jointly; each hashing hyperplane is chosen for both discriminability and stability, so that subjects are separated in the target Hamming space.
HER
Binary encoding [Rastegari, ECCVโ12]
Discriminability
๐ธ๐ = ๐(๐ฉ๐๐, ๐ฉ๐
๐)
๐,๐โ๐๐โ{1:๐ถ}
โ ๐๐ ๐ ๐ฉ๐๐, ๐ฉ๐
๐
๐2โ 1:๐ถ๐1โ ๐2,๐โ๐2
๐1โ 1:๐ถ๐โ๐1
๐ธ๐ = ๐(๐ฉ๐๐, ๐ฉ๐
๐)
๐,๐โ๐๐โ{1:๐ถ}
โ ๐๐ ๐(๐ฉ๐๐, ๐ฉ๐
๐)
๐2โ{1:๐ถ}๐1โ ๐2,๐โ๐2
๐1โ{1:๐ถ}๐โ๐1
๐ธ๐๐ = ๐(๐ฉ๐๐, ๐ฉ๐
๐)
๐,๐โ๐๐โ{1:๐ถ}
โ ๐๐๐ ๐(๐ฉ๐๐, ๐ฉ๐
๐)
๐2โ{1:๐ถ}๐1โ ๐2,๐โ๐2
๐1โ{1:๐ถ}๐โ๐1
Objective Function
min๐๐,๐๐,๐๐,๐๐,๐ฉ๐,๐ฉ๐
๐1๐ธ๐ + ๐2๐ธ๐ + ๐3๐ธ๐๐ + ๐พ1 ๐๐๐ 2
๐โ{1:๐พ}
+ ๐ถ1 ๐๐๐๐
๐โ{1:๐พ}๐โ{1:๐}
+ ๐พ2 ๐๐๐ 2
๐โ{1:๐พ}
+ ๐ถ2 ๐๐๐๐
๐โ{1:๐พ}๐โ{1:๐}
s.t.
๐ฉ๐๐๐ = ๐ ๐๐ ๐๐
๐๐๐ ๐๐ , โ๐ โ 1: ๐พ , โ๐ โ 1: ๐
๐ฉ๐๐๐ = ๐ ๐๐ ๐๐
๐๐๐ ๐๐ , โ๐ โ 1: ๐พ , โ๐ โ 1: ๐
๐ฉ๐๐๐ ๐๐
๐๐๐ ๐๐ โฅ 1 โ ๐๐
๐๐ , ๐๐๐๐ > 0, โ๐ โ 1: ๐พ , โ๐ โ 1: ๐
๐ฉ๐๐๐ ๐๐
๐๐๐ ๐๐ โฅ 1 โ ๐๐
๐๐ , ๐๐๐๐ > 0, โ๐ โ 1: ๐พ , โ๐ โ 1: ๐
Compatibility (Cross-training)
Stability (SVM) Discriminability (LDA)
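Once the hyperplanes are learned, encoding and retrieval are simple; a sketch with randomly drawn hyperplanes in place of the learned ones (`W` and `t` here are hypothetical stand-ins):

```python
import numpy as np

def hash_codes(Phi, W, t):
    """One bit per hyperplane: B_ik = sign(w_k^T phi_i + t_k).
    Phi: n x m kernel features, W: m x K hyperplanes, t: length-K biases."""
    return np.sign(Phi @ W + t).astype(int)

def hamming_distance(b1, b2):
    """Number of differing bits between two +/-1 codes."""
    return int(np.sum(b1 != b2))

rng = np.random.default_rng(5)
Phi = rng.normal(size=(4, 8))                         # toy kernel features, 4 samples
W, t = rng.normal(size=(8, 16)), rng.normal(size=16)  # hypothetical learned params
B = hash_codes(Phi, W, t)                             # 4 samples x 16-bit codes
```

Retrieval then reduces to ranking gallery codes by `hamming_distance` to the query code, which is why the common Hamming space makes cross-modal search fast.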
DL extension for video hash learning
DVC (Deep Video Code) [ACCV'16]
Application scenario: video-video face retrieval
Metric learning: Hamming distance learning
[1] S. Qiao, R. Wang, S. Shan, X. Chen. Deep Video Code for Efficient Face Video Retrieval. ACCV 2016.
DVC
An end-to-end video hashing framework:
A multi-branch CNN for frame-level feature extraction
Temporal pooling for fusing complementary features
Learning to hash with an upper-bounded triplet loss function [*]
[*] H. Liu, R. Wang, S. Shan, X. Chen. Deep Supervised Hashing for Fast Image Retrieval. IEEE CVPR 2016. (the DSH method: deep-architecture-based hash learning for image retrieval)
[Figure: training batch → per-frame CNNs with shared weights (frame-level CNN feature extraction) → temporal feature pooling (video-level representation) → fully connected layer → binary encoding; trained with the upper-bounded triplet loss and its gradient, and binarized with sign for test videos.]
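The exact upper-bounded form of the loss is given in the ACCV'16 paper; as a simplified stand-in, a standard margin-based triplet loss on relaxed (real-valued) codes looks like this (the margin value is an assumption):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=2.0):
    """Margin triplet loss on relaxed (real-valued) codes:
    pull same-subject videos together, push different subjects apart."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, margin + d_ap - d_an)

a = np.array([1.0, -1.0, 1.0])
p = np.array([1.0, -1.0, 1.0])       # same subject: identical toy code
n = np.array([-1.0, 1.0, -1.0])      # different subject: opposite toy code
loss = triplet_loss(a, p, n)         # d_ap = 0, d_an = 12, so the loss is 0
```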
Outline
Background
Literature review
Evaluations
Summary
Evaluations
Two YouTube datasets
YouTube Celebrities (YTC) [Kim, CVPR'08]: 47 subjects, 1,910 videos from YouTube
YouTube Faces DB (YTF) [Wolf, CVPR'11]: 1,595 different people, 3,425 videos
Evaluations
COX Face [Huang, ACCV'12 / TIP'15]
1,000 subjects, each with 1 high-quality image and 3 unconstrained video sequences
http://vipl.ict.ac.cn/resources/datasets/cox-face-dataset/cox-face
Evaluations
PaSC [Beveridge, BTAS'13]
Control videos: 1 mounted video camera, 1920×1080 resolution
Handheld videos: 5 handheld video cameras, 640×480 to 1280×720 resolution
Evaluations
Results (reported in our DARG paper*)
[*] W. Wang, R. Wang, Z. Huang, S. Shan, X. Chen. Discriminant Analysis on Riemannian Manifold of
Gaussian Distributions for Face Recognition with Image Sets. IEEE CVPR 2015.
Evaluations
Results (reported in our DARG paper*)
VR@FAR=0.01 on PaSC; AUC on YTF
Evaluations
Performance on PaSC Challenge (IEEE FGโ15)
HERML-DeLF: Hybrid Euclidean-and-Riemannian Metric Learning [*] with DCNN-learned image features
[*] Z. Huang, R. Wang, S. Shan, X. Chen. Hybrid Euclidean-and-Riemannian Metric Learning for Image Set Classification. ACCV 2014. (the key reference describing the method used for the challenge)
Evaluations
Performance on the EmotiW Challenge (ACM ICMI'14) [*]
Combination of multiple statistics for video modeling
Learning on the Riemannian manifold
[*] M. Liu, R. Wang, S. Li, S. Shan, Z. Huang, X. Chen. Combining Multiple Kernel Methods on
Riemannian Manifold for Emotion Recognition in the Wild. ACM ICMI 2014.
Outline
Background
Literature review
Evaluations
Summary
Route map of our methods
MMD (CVPR'08/TIP'12), MDA (CVPR'09), CDL (CVPR'12), DARG (CVPR'15), LERM (CVPR'14), CVC (BMVC'14), LEML/PML (ICML'15/CVPR'15), HER (CVPR'15)
[Figure: the methods arranged along the axes of the route map: set models evolving from the appearance manifold to the covariance matrix and other natural raw statistics (no assumption on the data distribution), driven toward complex distributions, large amounts of data, and binary/heterogeneous settings.]
Summary
What we learn from current studies
Set modeling: linear/affine subspace → manifold → statistics
Set matching: non-discriminative → discriminative
Metric learning: Euclidean space → Riemannian manifold
Future directions
More flexible set modeling for different scenarios
Multi-model combination
More efficient learning methods
Set-based vs. sample-based?
Additional references (not listed above)
[Arandjelović, CVPR'05] O. Arandjelović, G. Shakhnarovich, J. Fisher, R. Cipolla, and T. Darrell. Face Recognition with Image Sets Using Manifold Density Divergence. IEEE CVPR 2005.
[Chien, PAMI'02] J. Chien and C. Wu. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE T-PAMI 2002.
[Rastegari, ECCV'12] M. Rastegari, A. Farhadi, and D. Forsyth. Attribute discovery via predictable discriminative binary codes. ECCV 2012.
[Shakhnarovich, ECCV'02] G. Shakhnarovich, J. W. Fisher, and T. Darrell. Face Recognition from Long-term Observations. ECCV 2002.
[Vemulapalli, CVPR'13] R. Vemulapalli, J. K. Pillai, and R. Chellappa. Kernel learning for extrinsic classification of manifold features. IEEE CVPR 2013.
[Vincent, NIPS'01] P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS 2001.
Thanks, Q & A
Lab of Visual Information Processing and Learning (VIPL) @ICT@CAS
Codes of our methods available at: http://vipl.ict.ac.cn/resources/codes
Zhiwu Huang Yan Li Wen Wang Mengyi Liu Shishi Qiao
Xilin Chen Shiguang Shan Ruiping Wang Haomiao Liu