A short tutorial on Gaussian Mixture Models
CRV 2010
By: Mohand Saïd Allili, Université du Québec en Outaouais
Plan
- Introduction
- What is a Gaussian mixture model?
- The Expectation-Maximization algorithm
- Some issues
- Applications of GMM in computer vision
Introduction
A few basic equalities that are often used:
• Conditional probabilities: $p(A \cap B) = p(A \mid B)\, p(B)$
• Bayes rule: $p(B \mid A) = \dfrac{p(A \mid B)\, p(B)}{p(A)}$
• If $\bigcup_{i=1}^{K} B_i = \Omega$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$, then: $p(A) = \sum_{i=1}^{K} p(A \cap B_i)$
Introduction
Carl Friedrich Gauss invented the normal distribution in 1809 as a way to rationalize the method of least squares.
What is a Gaussian?
For d dimensions, the Gaussian distribution of a vector $x = (x_1, x_2, \ldots, x_d)^T$ is defined by:
$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
where $\mu$ is the mean and $\Sigma$ is the covariance matrix of the Gaussian.
Example: $\mu = (0, 0)^T$, $\Sigma = \begin{pmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{pmatrix}$
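As a quick numerical illustration (not part of the original slides), here is a minimal NumPy sketch that evaluates this density for the example mean and covariance above; the helper name gaussian_pdf is my own.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the d-dimensional Gaussian density N(x | mu, sigma)."""
    d = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    exponent = -0.5 * diff @ np.linalg.inv(sigma) @ diff
    return norm_const * np.exp(exponent)

# Example from the slide: mu = (0, 0)^T, Sigma = [[0.25, 0.30], [0.30, 1.00]]
mu = np.array([0.0, 0.0])
sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])
print(gaussian_pdf(np.array([0.1, -0.2]), mu, sigma))
```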
What is a Gaussian mixture model?
Examples: (figures for d = 1 and d = 2)
The probability given by a mixture of K Gaussians is:
$p(x) = \sum_{j=1}^{K} w_j \cdot \mathcal{N}(x \mid \mu_j, \Sigma_j)$
where $w_j$ is the prior probability (weight) of the j-th Gaussian, with $\sum_{j=1}^{K} w_j = 1$ and $0 \le w_j \le 1$.
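A minimal sketch of evaluating this mixture density, assuming SciPy is available; the two-component parameters below are made-up illustrative values, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """p(x) = sum_j w_j * N(x | mu_j, Sigma_j)."""
    return sum(w * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for w, mu, cov in zip(weights, means, covs))

# Hypothetical 2-component mixture in 2D (illustrative values only).
weights = [0.4, 0.6]                          # must sum to 1
means   = [np.zeros(2), np.array([3.0, 3.0])]
covs    = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_pdf(np.array([1.0, 1.0]), weights, means, covs))
```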
What is a Gaussian mixture model?
• Problem: Given a set of data $X = \{x_1, x_2, \ldots, x_N\}$ drawn from an unknown distribution (probably a GMM), estimate the parameters $\theta$ of the GMM model that fits the data.
• Solution: Maximize the likelihood $p(X \mid \theta)$ of the data with regard to the model parameters:
$\theta^* = \arg\max_{\theta} p(X \mid \theta) = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$
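In practice this product is handled in log form. A small sketch of the objective $\log p(X \mid \theta) = \sum_i \log p(x_i \mid \theta)$, reusing the hypothetical gmm_pdf helper and parameters from the previous sketch:

```python
import numpy as np

def log_likelihood(X, weights, means, covs):
    """log p(X | theta) = sum_i log p(x_i | theta) for a GMM."""
    return sum(np.log(gmm_pdf(x, weights, means, covs)) for x in X)

X = np.random.randn(100, 2)   # placeholder data set of N = 100 points in 2D
print(log_likelihood(X, weights, means, covs))
```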
The Expectation-Maximization algorithm
One of the most popular approaches to maximize the likelihood is to use the Expectation-Maximization (EM) algorithm.
• Basic ideas of the EM algorithm:
- Introduce a hidden variable such that its knowledge would simplify the maximization of the likelihood.
- At each iteration:
• E-Step: Estimate the distribution of the hidden variable given the data and the current value of the parameters.
• M-Step: Maximize the joint distribution of the data and the hidden variable.
The EM for the GMM (graphical view 1)
Hidden variable: for each point, which Gaussian generated it?
The EM for the GMM (graphical view 2)
E-Step: for each point, estimate the probability that each Gaussian generated it.
The EM for the GMM (graphical view 3)
M-Step: modify the parameters according to the hidden variable to maximize the likelihood of the data (and the hidden variable).
General formulation of the EM
(In what follows, the hidden variable is named Z.)
Consider the following "auxiliary" function:
$Q(\theta, \theta^t) = E_Z\!\left[ \log p(X, Z \mid \theta) \right]$
It can be shown that maximizing $Q(\theta, \theta^t)$:
$\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t)$
always maximizes the likelihood of the data.
$Q(\theta, \theta^t) = E_Z\!\left[ \log p(X, Z \mid \theta) \right]$
$= \sum_{z} p(z \mid X, \theta^t)\, \log p(X, z \mid \theta)$
$= \sum_{z} p(z \mid X, \theta^t)\, \log\!\left[ p(z \mid X, \theta)\, p(X \mid \theta) \right]$
$= \left[ \sum_{z} p(z \mid X, \theta^t)\, \log p(z \mid X, \theta) \right] + \log p(X \mid \theta)$
Proof of EM convergence
From the last line above:
$Q(\theta, \theta^t) = \left[ \sum_{z} p(z \mid X, \theta^t)\, \log p(z \mid X, \theta) \right] + \log p(X \mid \theta) \qquad (1)$
Proof of EM convergence
If $\theta = \theta^t$, we have:
$Q(\theta^t, \theta^t) = \left[ \sum_{z} p(z \mid X, \theta^t)\, \log p(z \mid X, \theta^t) \right] + \log p(X \mid \theta^t) \qquad (2)$
From (1) and (2), we have:
$\log p(X \mid \theta) - \log p(X \mid \theta^t) = Q(\theta, \theta^t) - Q(\theta^t, \theta^t) + \underbrace{\left[ \sum_{z} p(z \mid X, \theta^t)\, \log \frac{p(z \mid X, \theta^t)}{p(z \mid X, \theta)} \right]}_{\geq\, 0}$
Conclusion:
If $Q(\theta, \theta^t)$ increases, then the likelihood $p(X \mid \theta)$ increases.
The maximum of $Q(\theta, \theta^t)$ is the same as the maximum likelihood.
EM for the GMM
- Note that if Z is observed, then the parameters can be estimated Gaussian by Gaussian. Unfortunately, Z is not always known.
- Let us write the mixture of Gaussians for a data point $x_i$, $i = 1, \ldots, N$:
$p(x_i) = \sum_{j=1}^{K} w_j \cdot \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$
- We introduce the indicator variable:
$z_{ij} = \begin{cases} 1 & \text{if Gaussian } j \text{ emitted } x_i, \\ 0 & \text{otherwise.} \end{cases}$
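The indicator reflects the generative reading of a GMM: pick a component j with probability w_j, then draw the point from that Gaussian. A small sampling sketch of this view (NumPy, illustrative parameters only):

```python
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points; z[i] records which Gaussian emitted x[i]."""
    rng = np.random.default_rng(seed)
    K = len(weights)
    z = rng.choice(K, size=n, p=weights)                          # hidden indicators
    x = np.array([rng.multivariate_normal(means[j], covs[j]) for j in z])
    return x, z

weights = [0.3, 0.7]
means   = [np.zeros(2), np.array([4.0, 0.0])]
covs    = [np.eye(2), np.eye(2)]
X, Z = sample_gmm(500, weights, means, covs)
```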
EM for the GMM
We can now write the joint likelihood of all X and Z as follows:
$L(X, Z, \theta) = p(X, Z \mid \theta)$
$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[ p(x_i, j \mid \theta) \right]^{z_{ij}}$
$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[ p(x_i \mid j, \theta) \right]^{z_{ij}} \left[ p(j \mid \theta) \right]^{z_{ij}}$
Using the "log" function gives:
$\log p(X, Z \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log\!\left[ p(x_i \mid j, \theta) \right] + z_{ij} \log\!\left[ p(j \mid \theta) \right]$
EM for the GMM
The auxiliary function is given by:
$Q(\theta, \theta^t) = E_Z\!\left[ \log p(X, Z \mid \theta) \mid \theta^t \right]$
$= E_Z\!\left[ \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta) \,\Big|\, \theta^t \right]$
$= \sum_{i=1}^{N} \sum_{j=1}^{K} E_Z\!\left[ z_{ij} \mid \theta^t \right] \log p(x_i \mid j, \theta) + E_Z\!\left[ z_{ij} \mid \theta^t \right] \log p(j \mid \theta)$
We then have the E-Step given by:
$E_Z\!\left[ z_{ij} \mid \theta^t \right] = 1 \times p(z_{ij} = 1 \mid \theta^t) + 0 \times p(z_{ij} = 0 \mid \theta^t) = p(j \mid x_i, \theta^t) = \frac{p(x_i \mid j, \theta^t)\, p(j \mid \theta^t)}{p(x_i \mid \theta^t)}$
(Posterior distribution)
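A sketch of this E-Step as code, computing the responsibility matrix R[i, j] = p(j | x_i, θ^t) for all points at once (assuming SciPy; the array shapes and names are my own convention, not the author's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Return an (N, K) matrix R with R[i, j] = p(j | x_i, theta^t)."""
    N, K = X.shape[0], len(weights)
    R = np.zeros((N, K))
    for j in range(K):
        # numerator: p(x_i | j, theta^t) * p(j | theta^t)
        R[:, j] = weights[j] * multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
    R /= R.sum(axis=1, keepdims=True)   # divide by p(x_i | theta^t)
    return R
```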
EM for the GMM
And the M-Step is given by:
$\frac{\partial Q(\theta, \theta^t)}{\partial \theta} = 0$
where $\theta = (w_j, \mu_j, \Sigma_j,\ j = 1, \ldots, K)$ and $\sum_{j=1}^{K} w_j = 1$.
After straightforward manipulations, we obtain:
$\mu_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$
$\Sigma_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$
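A sketch of these two updates, using the responsibility matrix R from the E-Step sketch above:

```python
import numpy as np

def m_step_mean_cov(X, R):
    """Update mu_j and Sigma_j from responsibilities R (shape N x K)."""
    N, d = X.shape
    K = R.shape[1]
    Nk = R.sum(axis=0)                              # sum_i p(j | x_i, theta^t)
    means = (R.T @ X) / Nk[:, None]                 # responsibility-weighted means
    covs = np.zeros((K, d, d))
    for j in range(K):
        diff = X - means[j]
        covs[j] = (R[:, j, None] * diff).T @ diff / Nk[j]
    return means, covs
```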
EM for the GMM
Incorporating the constraint $\sum_{j=1}^{K} w_j = 1$ using a Lagrange multiplier gives:
$J(\theta, \theta^t) = Q(\theta, \theta^t) - \lambda \left( \sum_{j=1}^{K} w_j - 1 \right)$
Then:
$\frac{\partial J(\theta, \theta^t)}{\partial w_j} = \frac{\partial Q(\theta, \theta^t)}{\partial w_j} - \lambda = \sum_{i=1}^{N} \frac{p(j \mid x_i, \theta^t)}{w_j} - \lambda = 0$
which gives:
$w_j = \frac{1}{\lambda} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) \qquad (3)$
EM for the GMM
Also, we have:
$\frac{\partial J(\theta, \theta^t)}{\partial \lambda} = \sum_{j=1}^{K} w_j - 1 = 0 \qquad (4)$
From (3) and (4), we obtain:
$\frac{1}{\lambda} \sum_{j=1}^{K} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) = 1$
It follows that: $\lambda = N$
and finally:
$w_j = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \theta^t)$
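The weight update is then a one-liner on the same responsibility matrix; together with the two previous sketches it completes one EM iteration (a sketch under my naming conventions, not the author's code):

```python
def m_step_weights(R):
    """w_j = (1/N) * sum_i p(j | x_i, theta^t)."""
    return R.mean(axis=0)

# One full EM iteration, chaining the hypothetical helpers sketched earlier:
# R = e_step(X, weights, means, covs)
# means, covs = m_step_mean_cov(X, R)
# weights = m_step_weights(R)
```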
Some issues
1- Initialization:
- EM is an iterative algorithm which is very sensitive to initial conditions: start from trash → end up with trash.
- Usually, we use K-Means to get a good initialization.
2- Number of Gaussians:
- Use information-theoretic criteria to obtain the optimal K. Ex:
$MDL = -\log\!\left( L(X, Z, \theta) \right) + \frac{1}{2} \left[ (K - 1) + K \left( D + \frac{1}{2} D(D + 1) \right) \right] \log(N)$
- Other criteria: AIC, BIC, MML, etc.
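As a hedged illustration of criterion-based selection of K, the sketch below uses scikit-learn's GaussianMixture (which also uses a k-means-style initialization by default) together with its BIC score rather than MDL; the two penalties are closely related.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_max=10):
    """Fit GMMs with K = 1..k_max and return the K with the lowest BIC."""
    scores = []
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
        scores.append(gmm.bic(X))
    return int(np.argmin(scores)) + 1
```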
Some issues
3- Simplification of the covariance matrices:
Case 1: Spherical covariance matrix: $\Sigma_j = \mathrm{diag}(\sigma_j^2, \sigma_j^2, \ldots, \sigma_j^2) = \sigma_j^2 I$
- Less precise. - Very efficient to compute.
Case 2: Diagonal covariance matrix: $\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \sigma_{j2}^2, \ldots, \sigma_{jd}^2)$
- More precise. - Efficient to compute.
Some issues
3- Simplification of the covariance matrices:
Case 3: Full covariance matrix:
$\Sigma_j = \begin{bmatrix} \sigma_{j1}^2 & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_d) \\ \mathrm{cov}(x_2, x_1) & \sigma_{j2}^2 & \cdots & \mathrm{cov}(x_2, x_d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_d, x_1) & \mathrm{cov}(x_d, x_2) & \cdots & \sigma_{jd}^2 \end{bmatrix}$
- Very precise. - Less efficient to compute.
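These cases map directly onto the covariance_type option of scikit-learn's GaussianMixture; a brief sketch of the trade-off on placeholder data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(200, 2)   # placeholder data
# 'spherical' (Case 1), 'diag' (Case 2) and 'full' (Case 3) trade precision
# against the number of covariance parameters to estimate per component.
for cov_type in ("spherical", "diag", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type).fit(X)
    print(cov_type, gmm.covariances_.shape)
```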
Applications of GMM in computer vision
1- Image segmentation: each pixel is represented by its color vector $X = (R, G, B)^T$.
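A minimal sketch of this idea with scikit-learn, assuming img is an H×W×3 RGB image already loaded as a NumPy array (hypothetical variable): fit a GMM on the pixel colors and label each pixel with its most likely component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(img, n_segments=3):
    """Cluster pixel colors X = (R, G, B)^T with a GMM and return a label map."""
    h, w, _ = img.shape
    X = img.reshape(-1, 3).astype(float)
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=0).fit(X)
    labels = gmm.predict(X)              # most likely Gaussian per pixel
    return labels.reshape(h, w)
```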
Applications of GMM in computer vision
2- Object tracking: Knowing the moving object's distribution in the first frame, we can localize the object in the next frames by tracking its distribution.
Some references
1. Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.
2. Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley Series in Probability and Statistics, 2000.
3. Samy Bengio. Statistical Machine Learning from Data: Gaussian Mixture Models. http://bengio.abracadoudou.com/lectures/gmm.pdf
4. T. Hastie et al. The Elements of Statistical Learning. Springer, 2009.
Questions?