Multivariate Robust clustering via mixture models
INRIA-LJK équipe MISTIS (Grenoble)
Florence Forbes, Darren Wraith, Senan Doyle, Eric Frichot
Model-based multivariate clustering ModelModel--based multivariate clustering based multivariate clustering
Model-based clustering: Mixture of gaussians
• Univariate and multivariate
• Decomposition of the covariance matrix for flexibility in shape, volume and orientation (Banfield and Raftery 93, Celeux and Govaert 95)
ffff ((((yyyyiiii)))) ====KKKK∑∑∑∑
kkkk====1111
ππππkkkkffffkkkk((((yyyyiiii;;;; θθθθkkkk))))
WWWWiiiitttthhhh ffffkkkk((((yyyyiiii;;;; θθθθkkkk)))) ∼∼∼∼ NNNN ((((µµµµkkkk,,,,ΣΣΣΣkkkk))))
ΣΣΣΣkkkk ==== λλλλkkkk DDDDkkkk AAAAkkkkDDDDTTTTkkkk
wwwwiiiitttthhhh λλλλkkkk :::: vvvvoooolllluuuummmmeeeeDDDDkkkk :::: oooorrrriiiieeeennnnttttaaaattttiiiioooonnnnAAAAkkkk :::: sssshhhhaaaappppeeee
Convenient computational tractability:
EM algorithm
+ additional minimization algorithm for some decompositions (see Celeux, Govaert 95)
Robust clustering Robust clustering Robust clustering In some applications:
•The tails of the normal distributions are shorter than appropriate or
•Parameter estimations are affected by atypical observations (outliers)
Fit a mixture of t-distribution (Student distribution)
• Univariate and multivariate
• Additional degree of freedom (dof) parameter
The dof can be seen as a robustness tuning parameter
Convenient computational tractability:
EM algorithm with additional missing variables (class+ weights)
+ additional numerical procedure for the ML estimate of the dof(see MacLachlan and Peel 2000)
In contrast to the Gaussian case, no closed-form solution for ML
ffffkkkk((((yyyyiiii;;;; θθθθkkkk)))) ∼∼∼∼ tttt((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkk,,,, ννννkkkk))))
νννν →→→→∞∞∞∞ ⇒⇒⇒⇒ ffffkkkk →→→→ NNNN ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn
But a useful representation of the t-distribution as an infinite mixture of scaled Gaussians
Scaled mixture of GaussiansScaled mixture of GaussiansScaled mixture of Gaussians
The EM algorithm for t-mixtures is based on the following construction of the t-distribution
tttt((((yyyy µµµµ,,,,ΣΣΣΣ,,,, νννν)))) ====∫∫∫∫∞∞∞∞0000NNNN ((((yyyy;;;;µµµµ,,,,ΣΣΣΣ////wwww)))) GGGG((((wwww;;;; νννν////2222,,,, νννν////2222)))) ddddwwww
Another construction (equivalent):
XXXX ∼∼∼∼ NNNN ((((0000,,,,ΣΣΣΣ)))) aaaannnndddd VVVV ∼∼∼∼ XXXX 2222((((νννν)))) ==== GGGG((((νννν////2222,,,, 1111////2222))))
YYYY ==== XXXX ××××√√√√(((( ννννVVVV )))) ++++ µµµµ ∼∼∼∼ tttt((((µµµµ,,,,ΣΣΣΣ,,,, νννν))))
VVVVνννν∼∼∼∼ GGGG((((νννν////2222,,,, νννν////2222))))
tttt((((yyyy;;;;µµµµ,,,,ΣΣΣΣ,,,, νννν)))) ==== ΓΓΓΓ((((((((MMMM++++νννν))))////2222))))ΓΓΓΓ((((νννν////2222)))) ((((ννννππππ))))MMMM////2222 ||||ΣΣΣΣ||||
−−−−1111////2222 [[[[1111 ++++ δδδδ2222////νννν]]]]−−−−((((νννν++++MMMM))))////2222
wwwwiiiitttthhhh δδδδ2222 ==== ((((yyyy −−−− µµµµ))))TTTTΣΣΣΣ−−−−1111((((yyyy −−−− µµµµ)))) tttthhhheeee ssssqqqquuuuaaaarrrreeeedddd MMMMaaaahhhhaaaallllaaaannnnoooobbbbiiiissss ddddiiiissssttttaaaannnncccceeee
MMMM :::: ddddiiiimmmmeeeennnnssssiiiioooonnnnaaaalllliiiittttyyyy ooooffff yyyyΓΓΓΓ:::: GGGGaaaammmmmmmmaaaa ffffuuuunnnnccccttttiiiioooonnnn
EM algorithm for t-mixtures (MoT)EM algorithm for tEM algorithm for t--mixtures (mixtures (MoTMoT))
� Observations:
5
� Missing data:
yyyy ==== {{{{yyyy1111 .... .... ....yyyyNNNN}}}} wwwwhhhheeeerrrreeee yyyyiiii ==== {{{{yyyyiiii1111 .... .... .... yyyyiiiiMMMM}}}}
� Additional missing data:
� Unknown parameters:
Expectation Maximization (EM) for maximum likelihood ( )
IIIItttteeeerrrraaaattttiiiioooonnnn rrrr EEEE----sssstttteeeepppp:::: ccccoooommmmppppuuuutttteeee pppp((((zzzz,,,,wwww||||yyyy;;;;ψψψψ((((rrrr))))))))
MMMM----sssstttteeeepppp:::: ψψψψ((((rrrr++++1111)))) ==== aaaarrrrggggmmmmaaaaxxxxψψψψ∈∈∈∈ΨΨΨΨ
EEEE[[[[lllloooogggg pppp((((ZZZZ,,,,WWWW,,,,yyyy;;;;ψψψψ))))||||yyyy;;;;ψψψψ((((rrrr))))]]]]
zzzz ==== {{{{zzzz1111 .... .... .... zzzzNNNN}}}} wwwwiiiitttthhhh zzzziiii ∈∈∈∈ {{{{eeee1111 .... .... .... eeeeKKKK}}}} ((((KKKK ccccllllaaaasssssssseeeessss))))
yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkkwwwwiiii
))))
wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN }}}}
wwwwiiii||||zzzziiii ==== eeeekkkk ∼∼∼∼ ΓΓΓΓ(((( ννννkkkk2222 ,,,,ννννkkkk2222))))
ψ
ZZZZiiii ∼∼∼∼MMMM((((ππππ1111 .... .... .... ππππKKKK)))) iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt
ψψψψ ==== {{{{µµµµkkkk,,,,ΣΣΣΣkkkk,,,, ννννkkkk,,,, ππππkkkk}}}}
Iterate:
(E)
(M)
EM algorithm for t-mixtures (MoT)EM algorithm for tEM algorithm for t--mixtures (mixtures (MoTMoT))
CCCCoooommmmppppuuuutttteeee qqqq((((rrrr))))ZZZZiiii
((((eeeekkkk)))) ppppoooosssstttteeeerrrriiiioooorrrr ccccllllaaaassssssss mmmmeeeemmmmbbbbeeeerrrrsssshhhhiiiipppp pppprrrroooobbbbaaaabbbbiiiilllliiiittttiiiieeeessss,,,, ffffoooorrrr aaaallllllll iiii,,,, kkkk
CCCCoooommmmppppuuuutttteeee tttthhhheeee ggggaaaauuuussssssssiiiiaaaannnn mmmmeeeeaaaannnnssss aaaannnndddd vvvvaaaarrrriiiiaaaannnncccceeeessss uuuussssiiiinnnngggg
¯̄̄̄wwww((((rrrr))))iiiikkkk ====
νννν((((rrrr))))kkkk ++++MMMM
νννν((((rrrr))))kkkk ++++ δδδδ((((yyyyiiii,,,, µµµµ
((((rrrr))))kkkk ,,,,ΣΣΣΣ
((((rrrr))))kkkk ))))
CCCCoooommmmppppuuuutttteeee ¯̄̄̄wwww((((rrrr))))iiiikkkk aaaassss
µµµµ((((rrrr++++1111))))kkkk ====
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiikkkk yyyyiiii
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiikkkk
aaaannnndddd ΣΣΣΣ((((rrrr++++1111))))kkkk ====
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiikkkk ((((yyyyiiii−−−−µµµµ((((rrrr++++1111))))kkkk ))))((((yyyyiiii−−−−µµµµ((((rrrr++++1111))))kkkk ))))TTTT
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiikkkk
CCCCoooommmmppppuuuutttteeee tttthhhheeee ddddooooffff νννν((((rrrr++++1111))))kkkk aaaassss aaaa ssssoooolllluuuuttttiiiioooonnnn ooooffff aaaannnn eeeeqqqquuuuaaaattttiiiioooonnnn
IllustrationsIllustrationsIllustrations
MacLachlan, Ng, Bean 2006
Robust Bayesian clusteringRobust Bayesian clusteringRobust Bayesian clustering
Student mixtures + priors on parameters : Bayesian Student mixtures
Inference by Variational Bayes EM
Advantage: automatic and robust model selection
References:
Svensen and Bishop: no prior on the dof and variational approximation qz qw qtheta
Archambeau and Verleysen: uniform prior on dof and “better” approximation qzw qtheta
Takekawa et Fukai: improved on Archambeau and Verleysen with exponential prior on dof
A lot of robust approaches to clustering via mixture models has been based on mixture of Student distribution
Goal: explore the scale mixture framework further
Multivariate heavy tail distributionsMultivariate heavy tail distributionsMultivariate heavy tail distributions
Many ways to generalize the univariate Student distribution (in the Student spirit):
• The standard way has one particular disadvantage as a model for data: all its marginalsare Student but have the same dof and hence the same amount of tailweight.
• Product of independent t-distributions: varying dof but no correlation
• Jones 2002: a dependent bivariate t distribution with marginals of different dof. Extension to multivariate? Joint density not tractable?
• Eltoft et al. 2006: new multivariate scale mixture of Gaussians, more general than Student
X ∼ N (0,Σ) and V ∼ X 2(ν)
Y = X ×√( νV ) + µ ∼ t(µ,Σ, ν)
X ∼ N (0, IdM) and Λ pos.def M ×M with |Λ| = 1,Z a scalar positive variable with pdf to be chosen (eg Γ or X )Y = µ+ Λ1/2 X√
Z
t(y;µ,Σ, ν) = Γ((M+ν)/2)Γ(ν/2) (νπ)M/2 |Σ|
−1/2 [1 + δ2/ν]−(ν+M)/2
with δ2 = (y − µ)TΣ−1(y − µ)
New multivariate heavy tail distributionsNew multivariate heavy tail distributionsNew multivariate heavy tail distributions
Several equivalent constructions:
1. Gaussian scale mixtures
Student like:
Pearson Type VII like:
WWWW====ddddiiiiaaaagggg((((wwww1111,,,,............wwwwMMMM)))) Σ = DADT
2. Generative construction (useful for simulation)
ffff((((yyyy;;;;µµµµ,,,,ΣΣΣΣ,,,,θθθθ1111............θθθθMMMM))))====∞∞∞∞∫∫∫∫0000
∞∞∞∞∫∫∫∫0000NNNN((((yyyy;;;;µµµµ,,,,DDDDWWWW−−−−1111AAAADDDDTTTT)))) gggg1111((((wwww1111;;;;θθθθ1111))))............ggggMMMM((((wwwwMMMM;;;;θθθθMMMM)))) ddddwwww1111............ddddwwwwMMMM
gggg1111((((wwww1111;;;;θθθθ1111))))====GGGG((((wwww1111;;;;νννν1111////2222,,,,νννν1111////2222))))............ggggMMMM((((wwwwMMMM;;;;θθθθMMMM))))====GGGG((((wwwwMMMM;;;;ννννMMMM////2222,,,,ννννMMMM////2222))))
XXXX∼∼∼∼NNNN((((0000,,,, IIIIddddMMMM))))aaaannnndddd ffffoooorrrrmmmm====1111............MMMM,,,, ZZZZmmmm∼∼∼∼ ggggmmmm((((zzzz;;;;θθθθmmmm))))aaaallllllll iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ((((ppppoooossssiiiittttiiiivvvveeee vvvvaaaarrrriiiiaaaabbbblllleeeessss))))
˜̃̃̃XXXX====(((( XXXX1111√√√√ZZZZ1111,,,, ............ XXXXMMMM√√√√
ZZZZMMMM))))TTTT
TTTThhhheeeennnn YYYY ====µµµµ++++ΣΣΣΣ−−−−1111////2222 ˜̃̃̃XXXX
XXXX ∼∼∼∼NNNN((((0000,,,,AAAA))))aaaannnndddd ffffoooorrrr mmmm==== 1111............MMMM,,,, ZZZZmmmm ∼∼∼∼ ggggmmmm((((zzzz;;;; θθθθmmmm))))aaaallllllll iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ((((ppppoooossssiiiittttiiiivvvveeee vvvvaaaarrrriiiiaaaabbbblllleeeessss))))
˜̃̃̃XXXX ==== (((( XXXX1111√√√√ZZZZ1111,,,, ............ XXXXMMMM√√√√
ZZZZMMMM))))TTTT
TTTThhhheeeennnn YYYY ==== µµµµ++++DDDD ˜̃̃̃XXXX
gggg1111((((wwww1111;;;; θθθθ1111)))) ==== GGGG((((wwww1111;;;;αααα1111,,,, γγγγ1111)))) .... .... .... ggggMMMM ((((wwwwMMMM ;;;; θθθθMMMM )))) ==== GGGG((((wwwwMMMM ;;;;ααααMMMM ,,,, γγγγMMMM ))))
Univariate Pearson type VII distributionUnivariateUnivariate Pearson type VII distributionPearson type VII distribution
Log density and density for different parameters (varying kurtosis ie. sharpness of peak)
Gaussian distribution in black
Multiple DoF Student distributionsMultiple Multiple DoFDoF Student distributionsStudent distributions
x
y
z
x
y
z
x
y
z
x
y
z
Multivariate Student Multiple dof Multivariate Student
SSSSttttuuuuddddeeeennnnttttνννν ==== 0000....1111 ((((ttttoooopppp lllleeeefffftttt)))) aaaannnnddddνννν ==== 5555 ((((bbbboooottttttttoooommmm lllleeeefffftttt))))
SSSSttttuuuuddddeeeennnntttt lllliiiikkkkeeee::::νννν1111 ==== νννν2222 ==== 0000....1111 aaaannnndddd θθθθ ==== ππππ////3333 ((((ttttoooopppp rrrriiiigggghhhhtttt))))νννν1111 ==== 1111,,,, νννν2222 ==== 11110000,,,, θθθθ ==== ππππ////3333 ((((bbbboooottttttttoooommmm rrrriiiigggghhhhtttt))))
Multiple DoF Student distributionsMultiple Multiple DoFDoF Student distributionsStudent distributions
0.005
0.01
0.015
0.02
0.025
0.03
−4 −2 0 2 4 6 8
−4
−2
02
46
8
0.005
0.01
0.015
0.02 0.025
0.03
−4 −2 0 2 4 6 8
−4
−2
02
46
8
0.005
0.01
0.015
0.02
0.025
0.03
−4 −2 0 2 4 6 8
−4
−2
02
46
8
0.005
0.01
0.015
0.02
0.025
−4 −2 0 2 4 6 8
−4
−2
02
46
8
BBBBiiiivvvvaaaarrrriiiiaaaatttteeee SSSSttttuuuuddddeeeennnntttt::::νννν ==== 2222 ((((lllleeeefffftttt)))) aaaannnndddd νννν ==== 11110000 ((((rrrriiiigggghhhhtttt))))
NNNNeeeewwww BBBBiiiivvvvaaaarrrriiiiaaaatttteeee SSSSttttuuuuddddeeeennnntttt lllliiiikkkkeeeeffffoooorrrr νννν1111 ==== 2222,,,, νννν2222 ==== 11110000::::θθθθ ==== 0000 ((((lllleeeefffftttt)))) aaaannnndddd θθθθ ==== ππππ////8888 ((((rrrriiiigggghhhhtttt))))
Multivariate Pearson like distributionsMultivariate Pearson like distributionsMultivariate Pearson like distributions
0.2
0.4
0.6
0.8
1
1.2
1.4
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
−1.
5−
1.0
−0.
50.
00.
51.
01.
5
A1 = 0.15, A2 = 1, θ = 0, α1 = 0.2, α1 = 1, β1 = 5, β2 = 10
Application to clustering: mixturesApplication to clustering: mixturesApplication to clustering: mixtures
−15 −10 −5 0 5 10 15
−6
−2
02
46
True classification
x1
0
−15 −10 −5 0 5 10 15
−6
−2
02
46
Estimated (classif.) − nonweighted
x1
0
−15 −10 −5 0 5 10 15
−6
−2
02
46
True classification
x9
x1
0
−15 −10 −5 0 5 10 15
−6
−2
02
46
Estimated (classif.) − weightedx1
0
3 Clusters generated from the product of two univariate Student distributionwith ν = 2 and ν = 30
Standard MoTMoMultipleDoFT: the different tailweights are better modeled
Data item dependent weight priors and spatial class prior
K-component mixture of M-dim. t-distributions weighted model
Data augmentation:
Further generalizationFurther generalizationFurther generalization
{{{{zzzz1111 .... .... .... zzzzNNNN}}}} iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt zzzz MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn
yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkkwwwwiiii
))))
wwwwiiii||||zzzziiii ==== eeeekkkk ∼∼∼∼ ΓΓΓΓ(((( ννννkkkk2222 ,,,,ννννkkkk2222)))) wwwwiiiimmmm ∼∼∼∼ ΓΓΓΓ((((ααααiiiimmmm,,,, γγγγiiiimmmm)))) iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff kkkk
FFFFoooorrrr aaaa ssssttttaaaannnnddddaaaarrrrdddd mmmmiiiixxxxttttuuuurrrreeee,,,, wwwweeee wwwwoooouuuulllldddd nnnneeeeeeeeddddααααiiiimmmm,,,, γγγγiiiimmmm iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff iiii====⇒⇒⇒⇒ iiiinnnnaaaapppppppprrrroooopppprrrriiiiaaaatttteeee ffffoooorrrr lllleeeessssiiiioooonnnn ddddeeeetttteeeeccccttttiiiioooonnnn
wwww ==== {{{{wwww1111 .... .... .... wwwwNNNN}}}}wwwwiiiitttthhhh wwwwiiii >>>> 0000 ((((iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff mmmm))))
wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN}}}} wwwwiiiitttthhhh wwwwiiii ==== {{{{wwwwiiii1111 .... .... .... wwwwiiiiMMMM}}}}
EEEExxxx.... ννννkkkk ==== νννν ∀∀∀∀kkkk====⇒⇒⇒⇒ wwwwiiii iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff zzzziiii
MOTIVATION for such a generalization
yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,, DDDDkkkkWWWW−−−−1111iiii AAAAkkkkDDDD
TTTTkkkk ))))))))
WWWWiiii ==== DDDDiiiiaaaagggg((((wwwwiiii1111............wwwwiiiiMMMM ))))
Modelling lesions: inliers vs outliers ModellingModelling lesions: inliers lesions: inliers vsvs outliers outliers
Explicit modelling usually avoided:
1) Widely varying and inhomogeneous appearance (tumors, stroke)
2) Lesion size can be small (MS lesions)
prevent accurate model parameter estimation
bad lesion delineation
FLAIR DIFFUSIONT2
Modelling lesions: inliers vs outliers ModellingModelling lesions: inliers lesions: inliers vsvs outliers outliers
prevent accurate model parameter estimation
bad lesion delineation
In most approaches: lesion voxels identified as outliers wrt a normal brain model (a priori)
Our approach (incorporation strategy):
•Modify the segmentation model so that lesion voxels become inliers
•Make the estimation of the lesion class possible
•Use an additional weight field
Optimally combine sequences to take into account– a priori (expert) knowledge
– the targeted task
– the type of lesion
Reasons for using weightsReasons for using weightsReasons for using weights
1) To bias the model toward lesion identification: voxel specific weights
• eg. duplicate intensity values typical of the lesion
2) To weight the information content of each sequences: modality specific weights
• Multiple MR volumes are commonly modelled via multivariate Gaussian intensity distributions
• But all the sequences have equal importance
Weight choice? Bayesian framework:
• incorporate a priori on relevant information content of each sequence
• a weighting scheme modified adaptively
Robustness to non Gaussian componentsRobustness to non Gaussian componentsRobustness to non Gaussian components
1. To assess the ability to deal with varying cluster shapes
3 Gaussians and 500 data points from a G(2, 2) to the righty
Dens
ity
0 2 4 6 8 10 12
0.00
0.05
0.10
0.15
0.20
Illustration: non spatial, data point dependent weight, no expert, priors G(1,1)
Weighted G(1,1) vs non weighted approachWeighted G(1,1) Weighted G(1,1) vsvs non weighted approachnon weighted approachTrue classification
0 2 4 6 8 10 12
050
100
150
200
250
Estimated classification − nonweighted scheme
0 2 4 6 8 10 12
050
100
150
200
250
MoG: the 4th component is over fitted, 650 points vs 500 (true)
True classification
0 2 4 6 8 10 12
050
100
150
200
250
Estimated classification − weighted scheme
0 2 4 6 8 10 12
050
100
150
200
250
Better classification: 520 versus 500 (true)
Robustness to shape variabilityRobustness to shape variabilityRobustness to shape variability
0 2 4 6 8 10 12
0.0
0.5
1.0
1.5
y[, 1]
rese
m6$
wei
ghts
[R, ]
The weights adjust to the data allowing slight deviations from a Gaussian distribution
Robustness to outliers (grouped)Robustness to outliers (grouped)Robustness to outliers (grouped)3 bivariate Gaussians (600 points) contaminated with 20 points from a uniform distribution in a parallelepiped
−20 −15 −10 −5 0 5 10
−10
−5
05
1015
20
True classification
x9
−20 −15 −10 −5 0 5 10
−10
−5
05
1015
20
Estimated (classif.) − nonweighted
x9
x10
−20 −15 −10 −5 0 5 10
−10
−5
05
1015
20
Estimated (classif.) − student
x9
x10
−20 −15 −10 −5 0 5 10
−10
−5
05
1015
20
Estimated (classif.) − weighted
x9
x10
Effect of the weights: allow a long tail on one side and a truncated distribution on the other side (green component) => Flexible shape clusters…
Prior: G(1, 1)
πi =1/3, i=1, . . . ,3, µ1=(−9,0), µ2 =(1,5), µ3=(3.5,−3.5)
Σ1 =
(16 00 16
), Σ2 =
(8.5 −7.5−7.5 8.5
), Σ3=
(1 00 1
)
The data item dependent weight model is less sensitive to outliers
• Détermination d’une région d’intéret, (à pondérer):– Les voxels candidats à appartenir à la lesion– Les voxels sélectionnés sont les moins représentatifs du modèle
“sain”, ie. les outliers– Ou sélection par un expert (semi-supervisé) [Graph cut]– Ou utilisation de règles (faux positifs) [STREM]
• Segmentation initiale:– Classe lésion= région d’intéret– Les autres voxels sont segmentés en 3 classes
Variante de EM pour modèle de mélange à K=4 classes avec pondérations avec loi a priori sur les poids
Principe de la méthode d’incorporation
Lesion detection or semi-supervised contextLesion detection or semiLesion detection or semi--supervised contextsupervised context
Supervised context: non spatial illustrationSupervised context: non spatial illustrationSupervised context: non spatial illustration1. To assess the ability of the weighted approach to detect a small non Gaussian component
Data
Den
sity
0 2 4 6 8
0.00
0.05
0.10
0.15
0.20
0.25
3 Gaussian components (5000 data points each)
and
a small Beta(10,2) shifted by 6 units representing 100 data points (proportion=0.066)
The smallest component is “of interest” (eg. lesions)
PPPPrrrroooocccceeeedddduuuurrrreeee:::: cccchhhhoooooooosssseeee LLLL ddddaaaattttaaaa ppppooooiiiinnnnttttssss ffffrrrroooommmm tttthhhheeee ffffoooouuuurrrrtttthhhh ccccoooommmmppppoooonnnneeeennnntttt ((((ssssuuuuppppeeeerrrrvvvviiiisssseeeedddd))))aaaannnndddd uuuusssseeee aaaa GGGG((((αααα,,,, ββββ)))) pppprrrriiiioooorrrr ffffoooorrrr tttthhhheeee ccccoooorrrrrrrreeeessssppppoooonnnnddddiiiinnnngggg wwwweeeeiiiigggghhhhttttssss vvvvaaaarrrriiiiaaaabbbblllleeeessss
Supervised contextSupervised contextSupervised context
NNNNuuuummmmbbbbeeeerrrr ooooffff ppppooooiiiinnnnttttssss ccccllllaaaassssssssiiiififififieeeedddd ttttoooo 4444tttthhhh ccccoooommmmppppoooonnnneeeennnnttttPPPPrrrriiiioooorrrr ppppaaaarrrraaaammmmeeeetttteeeerrrrssss ffffoooorrrr wwww LLLL ==== 11110000 LLLL ==== 55550000 LLLL ==== 111100000000
αααα ==== 1111....0000;;;; ββββ ==== 1111....0000 0000 0000 0000αααα ==== 1111....5555;;;; ββββ ==== 1111....0000 22225555 99990000 111144447777αααα ==== 3333....0000;;;; ββββ ==== 1111....0000 55550000 222222223333
Note: with a Gaussian mixture model (K=4), no points in the 4th component
Supervised contextSupervised contextSupervised context
0 2 4 6 8
0.2
0.4
0.6
0.8
1.0
1.2
1.4
x1
wei
ghts
True classification
x1
Fre
quen
cy
0 2 4 6 8
010
030
050
070
0
Histogram of weights
weights
Fre
quen
cy
0.0 0.5 1.0 1.5
050
015
0025
00
Estimated (classif.) − nonweighted
x1
Fre
quen
cy
0 2 4 6 8
010
030
050
070
0
Clustering results for L = 50 and G(1.5, 1)
100 data points
90 data points
Spatial case: A weighted Hidden Markov model Spatial case: A weighted Hidden Markov model Spatial case: A weighted Hidden Markov model
Spatial dependencies between voxels:the joint distribution is a Markov random field (MRF)
Data driven term, based on intensitiesParameter prior term
� Observations:
Missing data term
28
pppp((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) ∝∝∝∝ eeeexxxxpppp((((HHHH((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ))))))))
N voxels (3D) x M modalities (T1, T2, Flair images)
� Labels:
yyyy ==== {{{{yyyy1111 .... .... ....yyyyNNNN}}}} wwwwhhhheeeerrrreeee yyyyiiii ==== {{{{yyyyiiii1111 .... .... .... yyyyiiiiMMMM}}}}
zzzz ==== {{{{zzzz1111 .... .... .... zzzzNNNN}}}} wwwwiiiitttthhhh zzzziiii ∈∈∈∈ {{{{eeee1111 .... .... .... eeeeKKKK}}}} ((((KKKK ttttiiiissssssssuuuueeeessss))))
� Weights:
HHHH((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) ==== HHHHZZZZ((((zzzz;;;; ββββ)))) ++++HHHHWWWW ((((wwww)))) ++++∑∑∑∑
iiii∈∈∈∈VVVV lllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))
ψψψψ ==== {{{{ββββ,,,, φφφφ}}}}
wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN }}}} wwwwiiiitttthhhh wwwwiiii ==== {{{{wwwwiiii1111 .... .... .... wwwwiiiiMMMM }}}}sssseeeeqqqquuuueeeennnncccceeee aaaannnndddd vvvvooooxxxxeeeellll ssssppppeeeecccciiiifififificccc
Data term:
If all weights are 1, a standard multivariate (diagonal) Gaussian case is recovered
Missing data term:
Parameter prior term: pppp((((wwww)))) ====∏∏∏∏MMMMmmmm====1111 pppp((((wwwwmmmm))))
1111))))∑∑∑∑NNNNiiii====1111 wwwwiiiimmmm ==== NNNN AAAA ddddiiiirrrriiiicccchhhhlllleeeetttt ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn ffffoooorrrr pppp((((wwwwmmmm))))
PPPPoooottttttttssss mmmmooooddddeeeellll wwwwiiiitttthhhh eeeexxxxtttteeeerrrrnnnnaaaallll fifififieeeelllldddd ξξξξ,,,, iiiinnnntttteeeerrrraaaaccccttttiiiioooonnnn ppppaaaarrrraaaammmmeeeetttteeeerrrr ηηηη
NNNN ((((iiii)))):::: vvvvooooxxxxeeeellllssss nnnneeeeiiiigggghhhhbbbboooorrrriiiinnnngggg iiiiββββ ==== {{{{ξξξξ,,,, ηηηη}}}} wwwwiiiitttthhhh ξξξξ ==== tttt((((ξξξξ1111,,,, .... .... .... ,,,, ξξξξKKKK)))) aaaannnndddd ηηηη >>>> 0000
wwwwmmmm ==== {{{{wwww1111mmmm .... .... .... wwwwNNNNmmmm}}}}
2222)))) tttthhhheeee wwwwiiiimmmm aaaarrrreeee iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt wwwwiiiimmmm ∼∼∼∼ ΓΓΓΓ((((ααααiiiimmmm,,,, γγγγiiiimmmm))))
HHHHZZZZ ((((zzzz;;;; ββββ)))) ====NNNN∑∑∑∑
iiii====1111
((((<<<< zzzziiii,,,, ξξξξ >>>> ++++∑∑∑∑
jjjj∈∈∈∈NNNN ((((iiii))))
ηηηη <<<< zzzziiii,,,, zzzzjjjj >>>>))))
ααααiiiimmmm ==== γγγγiiiimmmmwwwweeeexxxxppppiiiimmmm ++++ 1111 wwwweeeexxxxppppiiiimmmm iiiissss tttthhhheeee mmmmooooddddeeee ooooffff tttthhhheeee pppprrrriiiioooorrrr ffffoooorrrr wwwwiiiimmmm
gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;; φφφφ)))) ==== GGGG((((yyyyiiii;;;; µµµµzzzziiii ,,,, DDDDzzzziiiiWWWW−−−−1111iiii AAAAzzzziiiiDDDD
TTTTzzzz1111 ))))
Estimation by variational EMEstimation by Estimation by variationalvariational EMEM
30
An alternating maximization view of EM:[Neal&Hinton98]
FFFF ((((qqqq,,,, ψψψψ)))) ==== EEEEqqqq[[[[lllloooogggg pppp((((yyyy,,,,ZZZZ,,,,WWWW ;;;; ψψψψ))))]]]] ++++ IIII [[[[qqqq]]]]
Variational approximation:
Exact E-step leads to qqqq((((rrrr))))((((zzzz,,,,wwww)))) ==== pppp((((zzzz,,,,wwww||||yyyy;;;;ψψψψ((((rrrr)))))))) iiiinnnnttttrrrraaaaccccttttaaaabbbblllleeee
qqqq((((zzzz,,,,wwww)))) ==== qqqqZZZZ((((zzzz)))) qqqqWWWW ((((wwww))))EM variant (Variational EM):
The E-step is solved over a restricted class of pdfs (that factorize)
The E-step is further approximated by its decomposition in 2 sub-steps (Incremental EM [Neal&Hinton98])
Modified GAM procedures [Byrne&Gunawardana05]
IIII [[[[qqqq]]]] ==== −−−−EEEEqqqq[[[[lllloooogggg qqqq((((zzzz,,,,WWWW ))))]]]] ((((eeeennnnttttrrrrooooppppyyyy ooooffff qqqq))))
qqqq ∈∈∈∈ DDDD aaaa ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn oooonnnn ZZZZ ××××WWWWEEEE----sssstttteeeepppp:::: qqqq((((rrrr)))) ==== aaaarrrrggggmmmmaaaaxxxxqqqq∈∈∈∈DDDD
FFFF ((((qqqq,,,, ψψψψ((((rrrr))))))))
MMMM----sssstttteeeepppp:::: ψψψψ((((rrrr++++1111)))) ==== aaaarrrrggggmmmmaaaaxxxxψψψψ∈∈∈∈ΨΨΨΨ
FFFF ((((qqqq((((rrrr)))),,,, ψψψψ))))
Variational E-step
For the weighted Markov model
pppp((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn ====⇒⇒⇒⇒ aaaallllllll ccccoooonnnnddddiiiittttiiiioooonnnnaaaallllssss aaaarrrreeee MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn
pppp((((wwww||||yyyy,,,, zzzz;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn:::: HHHH((((wwww||||yyyy,,,, zzzz;;;;ψψψψ)))) ==== HHHHWWWW ((((wwww)))) ++++∑∑∑∑
iiii∈∈∈∈VVVVlllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))
pppp((((zzzz||||yyyy,,,,wwww;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn:::: HHHH((((zzzz||||yyyy,,,,wwww;;;;ψψψψ)))) ==== HHHHZZZZ((((zzzz;;;; ββββ)))) ++++∑∑∑∑
iiii∈∈∈∈VVVVlllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))
E-Z: q(r)Z = arg max
qZ∈DZ
F (q(r−1)W qZ ;ψ
(r))
E-W: q(r)W = arg max
qW∈DW
F (qW q(r)Z ;ψ(r))
E-Z: q(r)Z ∝ exp
(Eq(r−1)W
[log p(z|y,W;ψ(r)])
E-W: q(r)W ∝ exp
(Eq(r)Z
[log p(w|y,Z;ψ(r))])
In practice (for diagonal covariances)FFFFiiiixxxx ηηηη ((((iiiinnnntttteeeerrrraaaaccccttttiiiioooonnnn ppppaaaarrrraaaammmmeeeetttteeeerrrr)))) aaaannnndddd tttthhhheeee eeeexxxxppppeeeerrrrtttt wwwweeeeiiiigggghhhhttttssss wwwweeeexxxxppppiiiimmmm
((((mmmmooooddddeeeessss ooooffff tttthhhheeee wwwweeeeiiiigggghhhhtttt pppprrrriiiioooorrrrssss)))) aaaannnndddd γγγγiiiimmmm ((((vvvvaaaarrrriiiiaaaannnncccceeeessss ooooffff tttthhhheeee wwwweeeeiiiigggghhhhtttt pppprrrriiiioooorrrrssss))))
CCCCoooommmmppppuuuutttteeee qqqq((((rrrr))))ZZZZ ((((zzzz)))) uuuussssiiiinnnngggg mmmmeeeeaaaannnn----fifififieeeelllldddd aaaapppppppprrrrooooxxxxiiiimmmmaaaattttiiiioooonnnn oooorrrr vvvvaaaarrrriiiiaaaannnnttttssss
CCCCoooommmmppppuuuutttteeee ¯̄̄̄wwww((((rrrr))))iiiimmmm aaaassss
CCCCoooommmmppppuuuutttteeee ((((NNNNeeeewwwwttttoooonnnn)))) ξξξξ ((((eeeexxxxtttteeeerrrrnnnnaaaallll fifififieeeelllldddd ppppaaaarrrraaaammmmeeeetttteeeerrrr)))) uuuussssiiiinnnngggg mmmmeeeeaaaannnn fifififieeeelllldddd
CCCCoooommmmppppuuuutttteeee tttthhhheeee ggggaaaauuuussssssssiiiiaaaannnn mmmmeeeeaaaannnnssss aaaannnndddd vvvvaaaarrrriiiiaaaannnncccceeeessss uuuussssiiiinnnngggg
Iterate:
wwwwiiiitttthhhh ααααiiiimmmm ==== γγγγiiiimmmmwwwweeeexxxxppppiiiimmmm ++++ 1111 δδδδ((((yyyyiiiimmmm,,,, µµµµ
((((rrrr))))kkkkmmmm,,,, ssss
((((rrrr))))kkkkmmmm)))) ====
((((yyyyiiiimmmm−−−−µµµµ((((rrrr))))kkkkmmmm))))2222
ssss((((rrrr))))kkkkmmmm
((((MMMMaaaahhhhaaaallllaaaannnnoooobbbbiiiissss ddddiiiissssttttaaaannnncccceeee))))
µµµµ((((rrrr++++1111))))kkkkmmmm ====
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiimmmm yyyyiiiimmmm
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiimmmm
aaaannnndddd ssss((((rrrr++++1111))))kkkkmmmm ====
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiimmmm ((((yyyyiiiimmmm−−−−µµµµ((((rrrr++++1111))))kkkkmmmm ))))2222
NNNN∑∑∑∑
iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww
((((rrrr))))iiiimmmm
(E)
(M)
¯̄̄̄wwww((((rrrr))))iiiimmmm ====
ααααiiiimmmm ++++ 11112222
γγγγiiiimmmm ++++ 11112222
KKKK∑∑∑∑
kkkk====1111
δδδδ((((yyyyiiiimmmm,,,, µµµµ((((rrrr))))kkkkmmmm,,,, ssss
((((rrrr))))kkkkmmmm)))) qqqq
((((rrrr))))ZZZZiiii
((((eeeekkkk))))
Expert knowledge difficult to formalize into weight valuesProposed setting:
Choosing the expert weightsChoosing the expert weightsChoosing the expert weights
wwwweeeexxxxppppiiiimmmm ==== wwwwLLLL >>>> 1111 ∀∀∀∀iiii ∈∈∈∈ LLLL
wwwweeeexxxxppppiiiimmmm ==== 1111 ∀∀∀∀iiii �� ��∈∈∈∈ LLLL
Identify outliers by thresholding (Chi2 percentile) the estimated weights (typicality)
ηηηη ==== 0000 iiiissss ooookkkk
LLLL iiiissss oooobbbbttttaaaaiiiinnnneeeedddd bbbbyyyy aaaappppppppllllyyyyiiiinnnngggg tttthhhheeee mmmmooooddddeeeellllwwwwiiiitttthhhh KKKK ==== 3333,,,, wwwweeeexxxxppppiiiimmmm ==== 1111,,,, γγγγiiiimmmm ==== 1111 ∀∀∀∀iiii,,,, mmmm
Parameters to tune:
ExperimentsExperimentsExperiments
Simulated data (BrainWeb) with MS lesions: T1,T2,PD sequences, 1mm3
Chi2 percentile fixed to 99%wL = 10γim = γ = 10 ∀i,m (prior variances)
Real data sets2 Patients with MS: Flair,T1,T2 sequences, 1mm2x3mm
Ground truth76%
58% Ground truth
Real data sets
1 Patient with stroke: DW , Flair, T2 sequences, 1mm2x5mm
Ground truth63%
• Extension to full covariance matrices: temporal multi-sequence data, eg. patient follow-up
• Other prior for the weights: eg. MRF prior
• Other expert weighting schemes, possibly lesion specific
• Extension to handle intensity inhomogeneities
• Sensitivity analysis: initialization, parameter tuning etc.
• Evaluation in a semi-supervised context
• Add lesion specific information: atlas, rules etc.
Future workFuture workFuture work