Face Recognition
Ying Wu
yingwu@ece.northwestern.edu
Electrical and Computer Engineering
Northwestern University, Evanston, IL
http://www.ece.northwestern.edu/~yingwu
Recognizing Faces?
Lighting
View
Outline
– Bayesian Classification
– Principal Component Analysis (PCA)
– Fisher Linear Discriminant Analysis (LDA)
– Independent Component Analysis (ICA)
Bayesian Classification
– Classifier & Discriminant Function
– Discriminant Function for Gaussian
– Bayesian Learning and Estimation
Classifier & Discriminant Function
Discriminant functions $g_i(x)$, $i = 1, \dots, C$. Classifier: decide $x \in \omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.

Example choices of discriminant function:

$g_i(x) = p(\omega_i \mid x)$
$g_i(x) = p(x \mid \omega_i)\, p(\omega_i)$
$g_i(x) = \ln p(x \mid \omega_i) + \ln p(\omega_i)$

Decision boundary: the set where $g_i(x) = g_j(x)$.

The choice of discriminant function is not unique.
Multivariate Gaussian
$p(x) \sim N(\mu, \Sigma)$

– The principal axes (the directions) are given by the eigenvectors of $\Sigma$;
– The lengths of the axes (the uncertainty) are given by the eigenvalues of $\Sigma$.
Mahalanobis Distance
Mahalanobis distance is a normalized distance:

$\|x - \mu\|_M^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$

For example, two points may satisfy $\|x_1 - \mu\|^2 \neq \|x_2 - \mu\|^2$ in Euclidean distance and yet $\|x_1 - \mu\|_M = \|x_2 - \mu\|_M$ in Mahalanobis distance.
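As a concrete check of the normalization, here is a minimal NumPy sketch (the mean, covariance, and test points are illustrative, not from the slides): two points at the same Euclidean distance from $\mu$ get different Mahalanobis distances when the variances along the axes differ.

```python
import numpy as np

# Hypothetical 2-D Gaussian: mean mu, covariance Sigma (illustrative values).
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis_sq(x, mu, Sigma_inv):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ Sigma_inv @ d)

# Two points at equal Euclidean distance from mu:
x1 = np.array([2.0, 0.0])   # along the high-variance axis
x2 = np.array([0.0, 2.0])   # along the low-variance axis

d1 = mahalanobis_sq(x1, mu, Sigma_inv)  # 2^2 / 4 = 1.0
d2 = mahalanobis_sq(x2, mu, Sigma_inv)  # 2^2 / 1 = 4.0
```

The point along the high-variance axis is "closer" in the normalized sense, matching the ellipse picture on the slide.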
Whitening:
– Find a linear transformation (rotation and scaling) such that the covariance becomes an identity matrix (i.e., the uncertainty for each component is the same).

$y = A^T x$: if $p(x) \sim N(\mu, \Sigma)$, then $p(y) \sim N(A^T \mu,\ A^T \Sigma A)$.

Solution: $A = U \Lambda^{-1/2}$, where $\Sigma = U \Lambda U^T$.
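The whitening recipe can be sketched in a few lines of NumPy (the covariance values are illustrative): build $A = U\Lambda^{-1/2}$ from the eigendecomposition of the sample covariance and verify that the transformed data has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated 2-D Gaussian data.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=1000)

# A = U Lambda^{-1/2}, where sample covariance = U Lambda U^T.
lam, U = np.linalg.eigh(np.cov(X, rowvar=False))
A = U @ np.diag(lam ** -0.5)

Y = X @ A                       # y = A^T x, applied to each row
C = np.cov(Y, rowvar=False)     # identity, up to numerical precision
```

Because we whiten with the empirical covariance itself, $A^T \hat\Sigma A = I$ holds exactly up to round-off.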
Disc. Func. for Gaussian
Minimum-error-rate classifier:

$g_i(x) = \ln p(x \mid \omega_i) + \ln p(\omega_i) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln p(\omega_i)$
Case I: $\Sigma_i = \sigma^2 I$

$g_i(x) = -\dfrac{\|x - \mu_i\|^2}{2\sigma^2} + \ln p(\omega_i)$   (terms constant across classes dropped)

This is a linear discriminant function:

$g_i(x) = W_i^T x + W_{i0}$, where $W_i = \dfrac{1}{\sigma^2}\mu_i$, $\ W_{i0} = -\dfrac{1}{2\sigma^2}\mu_i^T \mu_i + \ln p(\omega_i)$

Boundary: $g_i(x) = g_j(x)$, i.e., $W^T(x - x_0) = 0$, where

$W = \mu_i - \mu_j$
$x_0 = \dfrac{1}{2}(\mu_i + \mu_j) - \dfrac{\sigma^2}{\|\mu_i - \mu_j\|^2}\ln\dfrac{p(\omega_i)}{p(\omega_j)}\,(\mu_i - \mu_j)$
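A tiny numerical sketch of the $\Sigma_i = \sigma^2 I$ case (all values illustrative): with equal priors the rule reduces to assigning $x$ to the nearest mean, and the midpoint of the two means sits exactly on the boundary.

```python
import numpy as np

mu_i = np.array([0.0, 0.0])
mu_j = np.array([4.0, 0.0])
sigma2 = 1.0

def g(x, mu, log_prior=np.log(0.5)):
    # g_i(x) = -||x - mu_i||^2 / (2 sigma^2) + ln p(omega_i)
    return -np.sum((x - mu) ** 2) / (2 * sigma2) + log_prior

x = np.array([1.0, 3.0])              # closer to mu_i
decide_i = g(x, mu_i) > g(x, mu_j)    # assigned to omega_i

x0 = 0.5 * (mu_i + mu_j)              # with equal priors, x0 is the midpoint
boundary_gap = g(x0, mu_i) - g(x0, mu_j)   # zero on the decision boundary
```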
Example
Assume $p(\omega_i) = p(\omega_j)$. Let's derive the decision boundary:

$(\mu_i - \mu_j)^T \left( x - \dfrac{\mu_i + \mu_j}{2} \right) = 0$
Case II: $\Sigma_i = \Sigma$

$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln p(\omega_i)$

Again a linear discriminant function:

$g_i(x) = W_i^T x + W_{i0}$, where $W_i = \Sigma^{-1}\mu_i$, $\ W_{i0} = -\frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \ln p(\omega_i)$

The decision boundary is still linear: $W^T(x - x_0) = 0$, where

$W = \Sigma^{-1}(\mu_i - \mu_j)$
$x_0 = \dfrac{1}{2}(\mu_i + \mu_j) - \dfrac{\ln\left(p(\omega_i)/p(\omega_j)\right)}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j)$
Case III: $\Sigma_i$ arbitrary

$g_i(x) = x^T A_i x + W_i^T x + W_{i0}$, where

$A_i = -\frac{1}{2}\Sigma_i^{-1}$
$W_i = \Sigma_i^{-1}\mu_i$
$W_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln p(\omega_i)$

The decision boundary is no longer linear, but hyperquadrics!
![Page 14: Face Recognition Ying Wu yingwu@ece.northwestern.edu Electrical and Computer Engineering Northwestern University, Evanston, IL yingwu](https://reader030.vdocument.in/reader030/viewer/2022032701/56649c985503460f949542d0/html5/thumbnails/14.jpg)
Bayesian Learning
Learning means "training", i.e., estimating some unknowns from "training data". WHY?
– It is very difficult to specify these unknowns by hand.
– Hopefully, these unknowns can be recovered from the examples collected.
Maximum Likelihood Estimation
Collected examples $D = \{x_1, x_2, \dots, x_n\}$

Estimate the unknown parameters $\theta$ in the sense that the data likelihood is maximized.

Likelihood: $p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$

Log-likelihood: $L(\theta) = \ln p(D \mid \theta) = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$

ML estimation: $\theta^* = \arg\max_\theta p(D \mid \theta) = \arg\max_\theta L(\theta)$
Case I: unknown $\mu$

$\ln p(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k - \mu)^T \Sigma^{-1} (x_k - \mu)$

$\nabla_\mu \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$

Setting $\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat\mu) = 0$ gives

$\hat\mu = \dfrac{1}{n}\sum_{k=1}^{n} x_k$
Case II: unknown $\mu$ and $\sigma^2$

$\ln p(x_k \mid \mu, \sigma^2) = -\frac{1}{2}\ln 2\pi\sigma^2 - \dfrac{(x_k - \mu)^2}{2\sigma^2}$, and let $\theta_1 = \mu$, $\theta_2 = \sigma^2$:

$\dfrac{\partial}{\partial \theta_1}\ln p(x_k \mid \theta) = \dfrac{1}{\theta_2}(x_k - \theta_1), \qquad \dfrac{\partial}{\partial \theta_2}\ln p(x_k \mid \theta) = -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2}$

Setting $\sum_{k=1}^{n}\dfrac{1}{\hat\theta_2}(x_k - \hat\theta_1) = 0$ and $-\sum_{k=1}^{n}\dfrac{1}{2\hat\theta_2} + \sum_{k=1}^{n}\dfrac{(x_k - \hat\theta_1)^2}{2\hat\theta_2^2} = 0$ gives

$\hat\mu = \dfrac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat\sigma^2 = \dfrac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)^2$

Generalizing to the multivariate case:

$\hat\mu = \dfrac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat\Sigma = \dfrac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^T$
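A quick NumPy check of the ML estimates (true parameters and sample size are illustrative): $\hat\mu$ is the sample mean and $\hat\sigma^2$ the biased sample variance, and both approach the truth as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative 1-D sample with true mu = 2, sigma^2 = 9.
x = rng.normal(loc=2.0, scale=3.0, size=50000)

mu_hat = x.mean()                         # (1/n) sum_k x_k
sigma2_hat = np.mean((x - mu_hat) ** 2)   # (1/n) sum_k (x_k - mu_hat)^2, the biased MLE
```

Note that the MLE divides by $n$, not $n-1$; the slight downward bias vanishes for large $n$.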
Bayesian Estimation
Collected examples D={x1,x2,…,xn}, drawn independently from a fixed but unknown distribution p(x)
Bayesian learning is to use D to determine p(x|D), i.e., to learn a p.d.f.
p(x) is unknown, but has a parametric form with parameters $\theta \sim p(\theta)$.

Difference from ML: in Bayesian learning, $\theta$ is not a value but a random variable, and we need to recover the distribution of $\theta$, rather than a single value.
Bayesian Estimation
$p(x \mid D) = \int p(x, \theta \mid D)\, d\theta = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta$

This is obvious from the total probability rule, i.e., $p(x \mid D)$ is a weighted average of $p(x \mid \theta)$ over all $\theta$.

If $p(\theta \mid D)$ peaks very sharply about some value $\theta^*$, then $p(x \mid D) \approx p(x \mid \theta^*)$.
The Univariate Case
Assume $\mu$ is the only unknown, with $p(x \mid \mu) \sim N(\mu, \sigma^2)$. $\mu$ is a r.v. with prior $p(\mu) \sim N(\mu_0, \sigma_0^2)$, i.e., $\mu_0$ is the best guess of $\mu$, and $\sigma_0$ is the uncertainty of it.

$p(\mu \mid D) = \dfrac{p(D \mid \mu)\, p(\mu)}{\int p(D \mid \mu)\, p(\mu)\, d\mu} = \alpha \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu)$

where $p(x_k \mid \mu) \sim N(\mu, \sigma^2)$ and $p(\mu) \sim N(\mu_0, \sigma_0^2)$. Collecting terms in the exponent,

$p(\mu \mid D) = \alpha' \exp\left[ -\frac{1}{2}\left( \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right)\mu^2 - 2\left( \frac{1}{\sigma^2}\sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right)\mu \right) \right]$

$p(\mu \mid D)$ is also a Gaussian for any number of training examples.
The Univariate Case
If we let $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, we have

$\mu_n = \dfrac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat\mu_n + \dfrac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \sigma_n^2 = \dfrac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad \hat\mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k$

$\mu_n$ is the best guess for $\mu$ after observing $n$ examples; $\sigma_n$ measures the uncertainty of this guess.

[Figure: posterior curves $p(\mu \mid x_1, \dots, x_n)$ for $n = 1, 5, 10, 30$.] $p(\mu \mid D)$ becomes more and more sharply peaked when observing more and more examples, i.e., the uncertainty decreases.
![Page 22: Face Recognition Ying Wu yingwu@ece.northwestern.edu Electrical and Computer Engineering Northwestern University, Evanston, IL yingwu](https://reader030.vdocument.in/reader030/viewer/2022032701/56649c985503460f949542d0/html5/thumbnails/22.jpg)
The Multivariate Case
$p(x \mid \mu) \sim N(\mu, \Sigma), \qquad p(\mu) \sim N(\mu_0, \Sigma_0)$

Let $p(\mu \mid D) \sim N(\mu_n, \Sigma_n)$; we have

$\mu_n = \Sigma_0 \left( \Sigma_0 + \tfrac{1}{n}\Sigma \right)^{-1} \hat\mu_n + \tfrac{1}{n}\Sigma \left( \Sigma_0 + \tfrac{1}{n}\Sigma \right)^{-1} \mu_0$

$\Sigma_n = \Sigma_0 \left( \Sigma_0 + \tfrac{1}{n}\Sigma \right)^{-1} \tfrac{1}{n}\Sigma, \qquad \hat\mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k$

and

$p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu \sim N(\mu_n, \Sigma + \Sigma_n)$
PCA and Eigenface
– Principal Component Analysis (PCA)
– Eigenface for Face Recognition
PCA: motivation
Pattern vectors are generally confined within some low-dimensional subspace.

Recall the basic idea of the Fourier transform:
– A signal is (de)composed of a linear combination of a set of basis signals with different frequencies.
PCA: idea
Represent each sample by a point on a line through $m$ along a unit direction $e$: $x \approx m + a e$. Minimize

$J(a_1, \dots, a_n, e) = \sum_{k=1}^{n} \|(m + a_k e) - x_k\|^2 = \sum_k a_k^2 \|e\|^2 - 2\sum_k a_k\, e^T(x_k - m) + \sum_k \|x_k - m\|^2$

$\dfrac{\partial J}{\partial a_k} = 2a_k - 2 e^T(x_k - m) = 0 \;\Rightarrow\; a_k = e^T(x_k - m)$
PCA
Substituting $a_k = e^T(x_k - m)$:

$J(e) = -\sum_k \left[ e^T(x_k - m) \right]^2 + \sum_k \|x_k - m\|^2 = -e^T S e + \sum_k \|x_k - m\|^2, \qquad S = \sum_k (x_k - m)(x_k - m)^T$

$\arg\min_e J(e) = \arg\max_e e^T S e \ \text{ s.t. } \|e\| = 1$

$e^* = \arg\max_e \left[ e^T S e - \lambda (e^T e - 1) \right] \;\Rightarrow\; S e = \lambda e, \text{ i.e., } e^T S e = \lambda$

To maximize $e^T S e$, we need to select $\lambda_{\max}$.
Algorithm
Learning the principal components from {x1, x2, …, xn}
(1) $m = \frac{1}{n}\sum_{k=1}^{n} x_k$
(2) $A = [x_1 - m, \dots, x_n - m]$, so that $S = \sum_{k=1}^{n}(x_k - m)(x_k - m)^T = A A^T$
(3) eigenvalue decomposition $S = U \Lambda U^T$
(4) sort the eigenvalues $\lambda_i$ and the corresponding eigenvectors $u_i$
(5) $P = [u_1, u_2, \dots, u_m]$
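Steps (1)–(5) translate directly into NumPy. The synthetic data below is illustrative: 3-D points varying mostly along one direction, so the top principal component should recover that direction.

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative data: 200 points spread along [1, 2, 0.5] plus small noise.
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, 0.5]) + 0.01 * rng.normal(size=(200, 3))

m = X.mean(axis=0)                  # (1) mean
A = (X - m).T                       # (2) A = [x_1 - m, ..., x_n - m]
S = A @ A.T                         #     scatter S = A A^T
lam, U = np.linalg.eigh(S)          # (3) eigendecomposition S = U Lambda U^T
order = np.argsort(lam)[::-1]       # (4) sort by decreasing eigenvalue
P = U[:, order[:1]].T               # (5) keep the top principal component

# The top component should align with the dominant direction [1, 2, 0.5].
direction = P[0] / np.linalg.norm(P[0])
target = np.array([1.0, 2.0, 0.5]) / np.linalg.norm([1.0, 2.0, 0.5])
alignment = abs(direction @ target)
```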
PCA for Face Recognition
Training data $D = \{x_1, \dots, x_M\}$
– Dimension: stacking the pixels together makes a vector of dimension $N$
– Preprocessing: cropping, normalization

These faces should lie in a "face" subspace. Questions:
– What is the dimension of this subspace?
– How to identify this subspace?
– How to use it for recognition?
Eigenface
The EigenFace approach: M. Turk and A. Pentland, 1992
An Issue
In general, $N \gg M$. However, $S$, the covariance matrix, is $N \times N$! Difficulties:
– $S$ is ill-conditioned: $\operatorname{rank}(S) \ll N$
– The computation of the eigenvalue decomposition of $S$ is expensive when $N$ is large

Solution?
Solution I: do the eigenvalue decomposition on $A^T A$, which is an $M \times M$ matrix:

$A^T A v = \lambda v \;\Rightarrow\; A A^T A v = \lambda A v$

To see this clearly: $(A A^T)(A v) = \lambda (A v)$, i.e., if $v$ is an eigenvector of $A^T A$, then $A v$ is an eigenvector of $A A^T$ corresponding to the same eigenvalue!

Note: of course, you need to normalize $A v$ to make it a unit vector.
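A small numerical sketch of the trick (sizes illustrative): eigendecompose the $M \times M$ matrix $A^T A$, map an eigenvector $v$ to $u = Av$, normalize, and check that $u$ is an eigenvector of the $N \times N$ matrix $A A^T$ with the same eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 100, 5                  # N >> M, as in the face setting (sizes illustrative)
A = rng.normal(size=(N, M))

# Eigenvectors of the small M x M matrix A^T A ...
lam, V = np.linalg.eigh(A.T @ A)

# ... map to eigenvectors of the big N x N matrix A A^T via u = A v (normalized).
v = V[:, -1]                   # eigenvector for the largest eigenvalue
u = A @ v
u = u / np.linalg.norm(u)

# Check: (A A^T) u = lambda u for the same eigenvalue.
residual = np.linalg.norm((A @ A.T) @ u - lam[-1] * u)
```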
Solution II:
You can simply use SVD (singular value decomposition):

$A = [x_1 - m, \dots, x_M - m], \qquad A = U \Sigma V^T$
– $A$: $N \times M$
– $U$: $N \times M$, $U^T U = I$
– $\Sigma$: $M \times M$ diagonal
– $V$: $M \times M$, $V^T V = V V^T = I$
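And the SVD route, sketched with an illustrative random $A$: the columns of $U$ from $A = U\Sigma V^T$ are eigenvectors of $AA^T$ with eigenvalues $\sigma_i^2$, so no $N \times N$ decomposition is ever formed.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(100, 5))   # stands in for [x_1 - m, ..., x_M - m]; sizes illustrative

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U Sigma V^T
# U is N x M with U^T U = I; its columns are eigenvectors of S = A A^T,
# with eigenvalues s**2 (singular values squared).

residual = np.linalg.norm((A @ A.T) @ U[:, 0] - (s[0] ** 2) * U[:, 0])
orthonormal = np.allclose(U.T @ U, np.eye(5))
```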
Fisher Linear Discrimination
– LDA
– PCA+LDA for Face Recognition
When does PCA fail?
PCA
Linear Discriminant Analysis
Finding an optimal linear mapping $W$ that
– catches the major differences between classes and discounts irrelevant factors;
– in the mapped space, data are clustered.
Within/between class scatters
For two classes ($i = 1, 2$):

$m_i = \dfrac{1}{n_i}\sum_{x \in D_i} x, \qquad S_i = \sum_{x \in D_i}(x - m_i)(x - m_i)^T$

Within-class scatter: $S_W = S_1 + S_2$
Between-class scatter: $S_B = (m_1 - m_2)(m_1 - m_2)^T$

Under the linear transform $y = W^T x$:

$\tilde{m}_i = \dfrac{1}{n_i}\sum_{y \in Y_i} y = \dfrac{1}{n_i}\sum_{x \in D_i} W^T x = W^T m_i$

$\tilde{S}_i = \sum_{y \in Y_i} (y - \tilde{m}_i)(y - \tilde{m}_i)^T = W^T S_i W$
Fisher LDA
$J(W) = \dfrac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{S}_1 + \tilde{S}_2}$

Since $(\tilde{m}_1 - \tilde{m}_2)(\tilde{m}_1 - \tilde{m}_2)^T = W^T (m_1 - m_2)(m_1 - m_2)^T W = W^T S_B W$,

$J(W) = \dfrac{W^T S_B W}{W^T S_W W}$

$W^* = \arg\max_W J(W)$

$\max J(W) \;\Rightarrow\; S_B w = \lambda S_W w$, which is a generalized eigenvalue problem.
Solution I
If $S_W$ is not singular, you can simply do an eigenvalue decomposition on $S_W^{-1} S_B$:

$S_W^{-1} S_B\, w = \lambda w$
Solution II
Noticing that
– $S_B W$ is in the direction of $m_1 - m_2$ (WHY?), and
– we are only concerned about the direction of the projection, rather than the scale,

we have

$w = S_W^{-1}(m_1 - m_2)$
![Page 40: Face Recognition Ying Wu yingwu@ece.northwestern.edu Electrical and Computer Engineering Northwestern University, Evanston, IL yingwu](https://reader030.vdocument.in/reader030/viewer/2022032701/56649c985503460f949542d0/html5/thumbnails/40.jpg)
Multiple Discriminant Analysis
With $C$ classes:

$m_i = \dfrac{1}{n_i}\sum_{x \in D_i} x, \qquad S_i = \sum_{x \in D_i}(x - m_i)(x - m_i)^T, \qquad S_W = \sum_{i=1}^{C} S_i$

$m = \dfrac{1}{n}\sum_{i=1}^{C} n_i m_i, \qquad S_B = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T$

Under $y = W^T x$:

$\tilde{m}_i = \dfrac{1}{n_i}\sum_{y \in Y_i} y, \qquad \tilde{m} = \dfrac{1}{n}\sum_{i=1}^{C} n_i \tilde{m}_i$

$\tilde{S}_W = W^T S_W W, \qquad \tilde{S}_B = \sum_{i=1}^{C} n_i (\tilde{m}_i - \tilde{m})(\tilde{m}_i - \tilde{m})^T = W^T S_B W$

$W^* = \arg\max_W \dfrac{|\tilde{S}_B|}{|\tilde{S}_W|} = \arg\max_W \dfrac{|W^T S_B W|}{|W^T S_W W|}$

Solution: $S_B w_i = \lambda_i S_W w_i$
![Page 41: Face Recognition Ying Wu yingwu@ece.northwestern.edu Electrical and Computer Engineering Northwestern University, Evanston, IL yingwu](https://reader030.vdocument.in/reader030/viewer/2022032701/56649c985503460f949542d0/html5/thumbnails/41.jpg)
Comparing PCA and LDA
PCA
LDA
MDA for Face Recognition
Lighting
• PCA does not work well! (why?)
• solution: PCA+MDA
Independent Component Analysis
– The cross-talk problem
– ICA
Cocktail party
[Figure: two source signals $s_1(t)$ and $s_2(t)$ are mixed by an unknown matrix $A$ into three observed signals $x_1(t)$, $x_2(t)$, $x_3(t)$.]

Can you recover $s_1(t)$ and $s_2(t)$ from $x_1(t)$, $x_2(t)$ and $x_3(t)$?
Formulation
$x_j = a_{j1} s_1 + \dots + a_{jn} s_n \ \ \forall j, \qquad \text{or} \qquad X = \sum_i a_i s_i = A S$

Both $A$ and $S$ are unknowns!

Can you recover both $A$ and $S$ from $X$?
The Idea
$Y = W^T X = W^T A S = Z^T S$

$y$ is a linear combination of $\{s_i\}$. Recall the central limit theorem: a sum of even two independent r.v.s is more Gaussian than the original r.v.s. So $Z^T S$ is generally more Gaussian than any of the $\{s_i\}$, and becomes least Gaussian exactly when it equals one of the $\{s_i\}$. Amazing!
![Page 47: Face Recognition Ying Wu yingwu@ece.northwestern.edu Electrical and Computer Engineering Northwestern University, Evanston, IL yingwu](https://reader030.vdocument.in/reader030/viewer/2022032701/56649c985503460f949542d0/html5/thumbnails/47.jpg)
Face Recognition: Challenges
View