Download - Multivariate Robust clustering via mixture models Robust clustering via mixture models INRIA-LJK équipe MISTIS (Grenoble) Florence Forbes , Darren Wraith, SenanDoyle, Eric Frichot

Multivariate Robust clustering via mixture models

INRIA-LJK équipe MISTIS (Grenoble)

Florence Forbes, Darren Wraith, Senan Doyle, Eric Frichot

Model-based multivariate clustering ModelModel--based multivariate clustering based multivariate clustering

Model-based clustering: Mixture of gaussians

• Univariate and multivariate

• Decomposition of the covariance matrix for flexibility in shape, volume and orientation (Banfield and Raftery 93, Celeux and Govaert 95)

ffff ((((yyyyiiii)))) ====KKKK∑∑∑∑

kkkk====1111

ππππkkkkffffkkkk((((yyyyiiii;;;; θθθθkkkk))))

WWWWiiiitttthhhh ffffkkkk((((yyyyiiii;;;; θθθθkkkk)))) ∼∼∼∼ NNNN ((((µµµµkkkk,,,,ΣΣΣΣkkkk))))

ΣΣΣΣkkkk ==== λλλλkkkk DDDDkkkk AAAAkkkkDDDDTTTTkkkk

wwwwiiiitttthhhh λλλλkkkk :::: vvvvoooolllluuuummmmeeeeDDDDkkkk :::: oooorrrriiiieeeennnnttttaaaattttiiiioooonnnnAAAAkkkk :::: sssshhhhaaaappppeeee

Convenient computational tractability:

EM algorithm

+ additional minimization algorithm for some decompositions (see Celeux, Govaert 95)

Robust clustering Robust clustering Robust clustering In some applications:

•The tails of the normal distributions are shorter than appropriate or

•Parameter estimations are affected by atypical observations (outliers)

Fit a mixture of t-distribution (Student distribution)

• Univariate and multivariate

• Additional degree of freedom (dof) parameter

The dof can be seen as a robustness tuning parameter

Convenient computational tractability:

EM algorithm with additional missing variables (class+ weights)

+ additional numerical procedure for the ML estimate of the dof(see MacLachlan and Peel 2000)

In contrast to the Gaussian case, no closed-form solution for ML

ffffkkkk((((yyyyiiii;;;; θθθθkkkk)))) ∼∼∼∼ tttt((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkk,,,, ννννkkkk))))

νννν →→→→∞∞∞∞ ⇒⇒⇒⇒ ffffkkkk →→→→ NNNN ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn

But a useful representation of the t-distribution as an infinite mixture of scaled Gaussians

Scaled mixture of GaussiansScaled mixture of GaussiansScaled mixture of Gaussians

The EM algorithm for t-mixtures is based on the following construction of the t-distribution

tttt((((yyyy µµµµ,,,,ΣΣΣΣ,,,, νννν)))) ====∫∫∫∫∞∞∞∞0000NNNN ((((yyyy;;;;µµµµ,,,,ΣΣΣΣ////wwww)))) GGGG((((wwww;;;; νννν////2222,,,, νννν////2222)))) ddddwwww

Another construction (equivalent):

XXXX ∼∼∼∼ NNNN ((((0000,,,,ΣΣΣΣ)))) aaaannnndddd VVVV ∼∼∼∼ XXXX 2222((((νννν)))) ==== GGGG((((νννν////2222,,,, 1111////2222))))

YYYY ==== XXXX ××××√√√√(((( ννννVVVV )))) ++++ µµµµ ∼∼∼∼ tttt((((µµµµ,,,,ΣΣΣΣ,,,, νννν))))

VVVVνννν∼∼∼∼ GGGG((((νννν////2222,,,, νννν////2222))))

tttt((((yyyy;;;;µµµµ,,,,ΣΣΣΣ,,,, νννν)))) ==== ΓΓΓΓ((((((((MMMM++++νννν))))////2222))))ΓΓΓΓ((((νννν////2222)))) ((((ννννππππ))))MMMM////2222 ||||ΣΣΣΣ||||

−−−−1111////2222 [[[[1111 ++++ δδδδ2222////νννν]]]]−−−−((((νννν++++MMMM))))////2222

wwwwiiiitttthhhh δδδδ2222 ==== ((((yyyy −−−− µµµµ))))TTTTΣΣΣΣ−−−−1111((((yyyy −−−− µµµµ)))) tttthhhheeee ssssqqqquuuuaaaarrrreeeedddd MMMMaaaahhhhaaaallllaaaannnnoooobbbbiiiissss ddddiiiissssttttaaaannnncccceeee

MMMM :::: ddddiiiimmmmeeeennnnssssiiiioooonnnnaaaalllliiiittttyyyy ooooffff yyyyΓΓΓΓ:::: GGGGaaaammmmmmmmaaaa ffffuuuunnnnccccttttiiiioooonnnn

EM algorithm for t-mixtures (MoT)EM algorithm for tEM algorithm for t--mixtures (mixtures (MoTMoT))

� Observations:

5

� Missing data:

yyyy ==== {{{{yyyy1111 .... .... ....yyyyNNNN}}}} wwwwhhhheeeerrrreeee yyyyiiii ==== {{{{yyyyiiii1111 .... .... .... yyyyiiiiMMMM}}}}

� Additional missing data:

� Unknown parameters:

Expectation Maximization (EM) for maximum likelihood ( )

IIIItttteeeerrrraaaattttiiiioooonnnn rrrr EEEE----sssstttteeeepppp:::: ccccoooommmmppppuuuutttteeee pppp((((zzzz,,,,wwww||||yyyy;;;;ψψψψ((((rrrr))))))))

MMMM----sssstttteeeepppp:::: ψψψψ((((rrrr++++1111)))) ==== aaaarrrrggggmmmmaaaaxxxxψψψψ∈∈∈∈ΨΨΨΨ

EEEE[[[[lllloooogggg pppp((((ZZZZ,,,,WWWW,,,,yyyy;;;;ψψψψ))))||||yyyy;;;;ψψψψ((((rrrr))))]]]]

zzzz ==== {{{{zzzz1111 .... .... .... zzzzNNNN}}}} wwwwiiiitttthhhh zzzziiii ∈∈∈∈ {{{{eeee1111 .... .... .... eeeeKKKK}}}} ((((KKKK ccccllllaaaasssssssseeeessss))))

yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkkwwwwiiii

))))

wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN }}}}

wwwwiiii||||zzzziiii ==== eeeekkkk ∼∼∼∼ ΓΓΓΓ(((( ννννkkkk2222 ,,,,ννννkkkk2222))))

ψ

ZZZZiiii ∼∼∼∼MMMM((((ππππ1111 .... .... .... ππππKKKK)))) iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt

ψψψψ ==== {{{{µµµµkkkk,,,,ΣΣΣΣkkkk,,,, ννννkkkk,,,, ππππkkkk}}}}

Iterate:

(E)

(M)

EM algorithm for t-mixtures (MoT)EM algorithm for tEM algorithm for t--mixtures (mixtures (MoTMoT))

CCCCoooommmmppppuuuutttteeee qqqq((((rrrr))))ZZZZiiii

((((eeeekkkk)))) ppppoooosssstttteeeerrrriiiioooorrrr ccccllllaaaassssssss mmmmeeeemmmmbbbbeeeerrrrsssshhhhiiiipppp pppprrrroooobbbbaaaabbbbiiiilllliiiittttiiiieeeessss,,,, ffffoooorrrr aaaallllllll iiii,,,, kkkk

CCCCoooommmmppppuuuutttteeee tttthhhheeee ggggaaaauuuussssssssiiiiaaaannnn mmmmeeeeaaaannnnssss aaaannnndddd vvvvaaaarrrriiiiaaaannnncccceeeessss uuuussssiiiinnnngggg

¯̄̄̄wwww((((rrrr))))iiiikkkk ====

νννν((((rrrr))))kkkk ++++MMMM

νννν((((rrrr))))kkkk ++++ δδδδ((((yyyyiiii,,,, µµµµ

((((rrrr))))kkkk ,,,,ΣΣΣΣ

((((rrrr))))kkkk ))))

CCCCoooommmmppppuuuutttteeee ¯̄̄̄wwww((((rrrr))))iiiikkkk aaaassss

µµµµ((((rrrr++++1111))))kkkk ====

NNNN∑∑∑∑

iiii====1111qqqq((((rrrr))))ZZZZiiii((((eeeekkkk)))) ¯̄̄̄wwww

((((rrrr))))iiiikkkk yyyyiiii

NNNN∑∑∑∑


((((rrrr))))iiiikkkk

aaaannnndddd ΣΣΣΣ((((rrrr++++1111))))kkkk ====

NNNN∑∑∑∑


((((rrrr))))iiiikkkk ((((yyyyiiii−−−−µµµµ((((rrrr++++1111))))kkkk ))))((((yyyyiiii−−−−µµµµ((((rrrr++++1111))))kkkk ))))TTTT

NNNN∑∑∑∑


((((rrrr))))iiiikkkk

CCCCoooommmmppppuuuutttteeee tttthhhheeee ddddooooffff νννν((((rrrr++++1111))))kkkk aaaassss aaaa ssssoooolllluuuuttttiiiioooonnnn ooooffff aaaannnn eeeeqqqquuuuaaaattttiiiioooonnnn

IllustrationsIllustrationsIllustrations

MacLachlan, Ng, Bean 2006

Robust Bayesian clusteringRobust Bayesian clusteringRobust Bayesian clustering

Student mixtures + priors on parameters : Bayesian Student mixtures

Inference by Variational Bayes EM

Advantage: automatic and robust model selection

References:

Svensen and Bishop: no prior on the dof and variational approximation qz qw qtheta

Archambeau and Verleysen: uniform prior on dof and “better” approximation qzw qtheta

Takekawa et Fukai: improved on Archambeau and Verleysen with exponential prior on dof

A lot of robust approaches to clustering via mixture models has been based on mixture of Student distribution

Goal: explore the scale mixture framework further

Multivariate heavy tail distributionsMultivariate heavy tail distributionsMultivariate heavy tail distributions

Many ways to generalize the univariate Student distribution (in the Student spirit):

• The standard way has one particular disadvantage as a model for data: all its marginalsare Student but have the same dof and hence the same amount of tailweight.

• Product of independent t-distributions: varying dof but no correlation

• Jones 2002: a dependent bivariate t distribution with marginals of different dof. Extension to multivariate? Joint density not tractable?

• Eltoft et al. 2006: new multivariate scale mixture of Gaussians, more general than Student

X ∼ N (0,Σ) and V ∼ X 2(ν)

Y = X ×√( νV ) + µ ∼ t(µ,Σ, ν)

X ∼ N (0, IdM) and Λ pos.def M ×M with |Λ| = 1,Z a scalar positive variable with pdf to be chosen (eg Γ or X )Y = µ+ Λ1/2 X√

Z

t(y;µ,Σ, ν) = Γ((M+ν)/2)Γ(ν/2) (νπ)M/2 |Σ|

−1/2 [1 + δ2/ν]−(ν+M)/2

with δ2 = (y − µ)TΣ−1(y − µ)

New multivariate heavy tail distributionsNew multivariate heavy tail distributionsNew multivariate heavy tail distributions

Several equivalent constructions:

1. Gaussian scale mixtures

Student like:

Pearson Type VII like:

WWWW====ddddiiiiaaaagggg((((wwww1111,,,,............wwwwMMMM)))) Σ = DADT

2. Generative construction (useful for simulation)

ffff((((yyyy;;;;µµµµ,,,,ΣΣΣΣ,,,,θθθθ1111............θθθθMMMM))))====∞∞∞∞∫∫∫∫0000

∞∞∞∞∫∫∫∫0000NNNN((((yyyy;;;;µµµµ,,,,DDDDWWWW−−−−1111AAAADDDDTTTT)))) gggg1111((((wwww1111;;;;θθθθ1111))))............ggggMMMM((((wwwwMMMM;;;;θθθθMMMM)))) ddddwwww1111............ddddwwwwMMMM

gggg1111((((wwww1111;;;;θθθθ1111))))====GGGG((((wwww1111;;;;νννν1111////2222,,,,νννν1111////2222))))............ggggMMMM((((wwwwMMMM;;;;θθθθMMMM))))====GGGG((((wwwwMMMM;;;;ννννMMMM////2222,,,,ννννMMMM////2222))))

XXXX∼∼∼∼NNNN((((0000,,,, IIIIddddMMMM))))aaaannnndddd ffffoooorrrrmmmm====1111............MMMM,,,, ZZZZmmmm∼∼∼∼ ggggmmmm((((zzzz;;;;θθθθmmmm))))aaaallllllll iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ((((ppppoooossssiiiittttiiiivvvveeee vvvvaaaarrrriiiiaaaabbbblllleeeessss))))

˜̃̃̃XXXX====(((( XXXX1111√√√√ZZZZ1111,,,, ............ XXXXMMMM√√√√

ZZZZMMMM))))TTTT

TTTThhhheeeennnn YYYY ====µµµµ++++ΣΣΣΣ−−−−1111////2222 ˜̃̃̃XXXX

XXXX ∼∼∼∼NNNN((((0000,,,,AAAA))))aaaannnndddd ffffoooorrrr mmmm==== 1111............MMMM,,,, ZZZZmmmm ∼∼∼∼ ggggmmmm((((zzzz;;;; θθθθmmmm))))aaaallllllll iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ((((ppppoooossssiiiittttiiiivvvveeee vvvvaaaarrrriiiiaaaabbbblllleeeessss))))

˜̃̃̃XXXX ==== (((( XXXX1111√√√√ZZZZ1111,,,, ............ XXXXMMMM√√√√

ZZZZMMMM))))TTTT

TTTThhhheeeennnn YYYY ==== µµµµ++++DDDD ˜̃̃̃XXXX

gggg1111((((wwww1111;;;; θθθθ1111)))) ==== GGGG((((wwww1111;;;;αααα1111,,,, γγγγ1111)))) .... .... .... ggggMMMM ((((wwwwMMMM ;;;; θθθθMMMM )))) ==== GGGG((((wwwwMMMM ;;;;ααααMMMM ,,,, γγγγMMMM ))))

Univariate Pearson type VII distributionUnivariateUnivariate Pearson type VII distributionPearson type VII distribution

Log density and density for different parameters (varying kurtosis ie. sharpness of peak)

Gaussian distribution in black

Multiple DoF Student distributionsMultiple Multiple DoFDoF Student distributionsStudent distributions

x

y

z

x

y

z

x

y

z

x

y

z

Multivariate Student Multiple dof Multivariate Student

SSSSttttuuuuddddeeeennnnttttνννν ==== 0000....1111 ((((ttttoooopppp lllleeeefffftttt)))) aaaannnnddddνννν ==== 5555 ((((bbbboooottttttttoooommmm lllleeeefffftttt))))

SSSSttttuuuuddddeeeennnntttt lllliiiikkkkeeee::::νννν1111 ==== νννν2222 ==== 0000....1111 aaaannnndddd θθθθ ==== ππππ////3333 ((((ttttoooopppp rrrriiiigggghhhhtttt))))νννν1111 ==== 1111,,,, νννν2222 ==== 11110000,,,, θθθθ ==== ππππ////3333 ((((bbbboooottttttttoooommmm rrrriiiigggghhhhtttt))))

Multiple DoF Student distributionsMultiple Multiple DoFDoF Student distributionsStudent distributions

0.005

0.01

0.015

0.02

0.025

0.03

−4 −2 0 2 4 6 8

−4

−2

02

46

8

0.005

0.01

0.015

0.02 0.025

0.03

−4 −2 0 2 4 6 8

−4

−2

02

46

8

0.005

0.01

0.015

0.02

0.025

0.03

−4 −2 0 2 4 6 8

−4

−2

02

46

8

0.005

0.01

0.015

0.02

0.025

−4 −2 0 2 4 6 8

−4

−2

02

46

8

BBBBiiiivvvvaaaarrrriiiiaaaatttteeee SSSSttttuuuuddddeeeennnntttt::::νννν ==== 2222 ((((lllleeeefffftttt)))) aaaannnndddd νννν ==== 11110000 ((((rrrriiiigggghhhhtttt))))

NNNNeeeewwww BBBBiiiivvvvaaaarrrriiiiaaaatttteeee SSSSttttuuuuddddeeeennnntttt lllliiiikkkkeeeeffffoooorrrr νννν1111 ==== 2222,,,, νννν2222 ==== 11110000::::θθθθ ==== 0000 ((((lllleeeefffftttt)))) aaaannnndddd θθθθ ==== ππππ////8888 ((((rrrriiiigggghhhhtttt))))

Multivariate Pearson like distributionsMultivariate Pearson like distributionsMultivariate Pearson like distributions

0.2

0.4

0.6

0.8

1

1.2

1.4

−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5

−1.

5−

1.0

−0.

50.

00.

51.

01.

5

A1 = 0.15, A2 = 1, θ = 0, α1 = 0.2, α1 = 1, β1 = 5, β2 = 10

Application to clustering: mixturesApplication to clustering: mixturesApplication to clustering: mixtures

−15 −10 −5 0 5 10 15

−6

−2

02

46

True classification

x1

0

−15 −10 −5 0 5 10 15

−6

−2

02

46

Estimated (classif.) − nonweighted

x1

0

−15 −10 −5 0 5 10 15

−6

−2

02

46

True classification

x9

x1

0

−15 −10 −5 0 5 10 15

−6

−2

02

46

Estimated (classif.) − weightedx1

0

3 Clusters generated from the product of two univariate Student distributionwith ν = 2 and ν = 30

Standard MoTMoMultipleDoFT: the different tailweights are better modeled

Data item dependent weight priors and spatial class prior

K-component mixture of M-dim. t-distributions weighted model

Data augmentation:

Further generalizationFurther generalizationFurther generalization

{{{{zzzz1111 .... .... .... zzzzNNNN}}}} iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt zzzz MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn

yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,,ΣΣΣΣkkkkwwwwiiii

))))

wwwwiiii||||zzzziiii ==== eeeekkkk ∼∼∼∼ ΓΓΓΓ(((( ννννkkkk2222 ,,,,ννννkkkk2222)))) wwwwiiiimmmm ∼∼∼∼ ΓΓΓΓ((((ααααiiiimmmm,,,, γγγγiiiimmmm)))) iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff kkkk

FFFFoooorrrr aaaa ssssttttaaaannnnddddaaaarrrrdddd mmmmiiiixxxxttttuuuurrrreeee,,,, wwwweeee wwwwoooouuuulllldddd nnnneeeeeeeeddddααααiiiimmmm,,,, γγγγiiiimmmm iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff iiii====⇒⇒⇒⇒ iiiinnnnaaaapppppppprrrroooopppprrrriiiiaaaatttteeee ffffoooorrrr lllleeeessssiiiioooonnnn ddddeeeetttteeeeccccttttiiiioooonnnn

wwww ==== {{{{wwww1111 .... .... .... wwwwNNNN}}}}wwwwiiiitttthhhh wwwwiiii >>>> 0000 ((((iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff mmmm))))

wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN}}}} wwwwiiiitttthhhh wwwwiiii ==== {{{{wwwwiiii1111 .... .... .... wwwwiiiiMMMM}}}}

EEEExxxx.... ννννkkkk ==== νννν ∀∀∀∀kkkk====⇒⇒⇒⇒ wwwwiiii iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt ooooffff zzzziiii

MOTIVATION for such a generalization

yyyyiiii||||wwwwiiii,,,, zzzziiii ==== eeeekkkk ∼∼∼∼ GGGG((((yyyyiiii;;;;µµµµkkkk,,,, DDDDkkkkWWWW−−−−1111iiii AAAAkkkkDDDD

TTTTkkkk ))))))))

WWWWiiii ==== DDDDiiiiaaaagggg((((wwwwiiii1111............wwwwiiiiMMMM ))))

Modelling lesions: inliers vs outliers ModellingModelling lesions: inliers lesions: inliers vsvs outliers outliers

Explicit modelling usually avoided:

1) Widely varying and inhomogeneous appearance (tumors, stroke)

2) Lesion size can be small (MS lesions)

prevent accurate model parameter estimation

bad lesion delineation

FLAIR DIFFUSIONT2

Modelling lesions: inliers vs outliers ModellingModelling lesions: inliers lesions: inliers vsvs outliers outliers

prevent accurate model parameter estimation

bad lesion delineation

In most approaches: lesion voxels identified as outliers wrt a normal brain model (a priori)

Our approach (incorporation strategy):

•Modify the segmentation model so that lesion voxels become inliers

•Make the estimation of the lesion class possible

•Use an additional weight field

Optimally combine sequences to take into account– a priori (expert) knowledge

– the targeted task

– the type of lesion

Reasons for using weightsReasons for using weightsReasons for using weights

1) To bias the model toward lesion identification: voxel specific weights

• eg. duplicate intensity values typical of the lesion

2) To weight the information content of each sequences: modality specific weights

• Multiple MR volumes are commonly modelled via multivariate Gaussian intensity distributions

• But all the sequences have equal importance

Weight choice? Bayesian framework:

• incorporate a priori on relevant information content of each sequence

• a weighting scheme modified adaptively

Robustness to non Gaussian componentsRobustness to non Gaussian componentsRobustness to non Gaussian components

1. To assess the ability to deal with varying cluster shapes

3 Gaussians and 500 data points from a G(2, 2) to the righty

Dens

ity

0 2 4 6 8 10 12

0.00

0.05

0.10

0.15

0.20

Illustration: non spatial, data point dependent weight, no expert, priors G(1,1)

Weighted G(1,1) vs non weighted approachWeighted G(1,1) Weighted G(1,1) vsvs non weighted approachnon weighted approachTrue classification

0 2 4 6 8 10 12

050

100

150

200

250

Estimated classification − nonweighted scheme

0 2 4 6 8 10 12

050

100

150

200

250

MoG: the 4th component is over fitted, 650 points vs 500 (true)

True classification

0 2 4 6 8 10 12

050

100

150

200

250

Estimated classification − weighted scheme

0 2 4 6 8 10 12

050

100

150

200

250

Better classification: 520 versus 500 (true)

Robustness to shape variabilityRobustness to shape variabilityRobustness to shape variability

0 2 4 6 8 10 12

0.0

0.5

1.0

1.5

y[, 1]

rese

m6$

wei

ghts

[R, ]

The weights adjust to the data allowing slight deviations from a Gaussian distribution

Robustness to outliers (grouped)Robustness to outliers (grouped)Robustness to outliers (grouped)3 bivariate Gaussians (600 points) contaminated with 20 points from a uniform distribution in a parallelepiped

−20 −15 −10 −5 0 5 10

−10

−5

05

1015

20

True classification

x9

−20 −15 −10 −5 0 5 10

−10

−5

05

1015

20


x9

x10

−20 −15 −10 −5 0 5 10

−10

−5

05

1015

20

Estimated (classif.) − student

x9

x10

−20 −15 −10 −5 0 5 10

−10

−5

05

1015

20

Estimated (classif.) − weighted

x9

x10

Effect of the weights: allow a long tail on one side and a truncated distribution on the other side (green component) => Flexible shape clusters…

Prior: G(1, 1)

πi =1/3, i=1, . . . ,3, µ1=(−9,0), µ2 =(1,5), µ3=(3.5,−3.5)

Σ1 =

(16 00 16

), Σ2 =

(8.5 −7.5−7.5 8.5

), Σ3=

(1 00 1

)

The data item dependent weight model is less sensitive to outliers

• Détermination d’une région d’intéret, (à pondérer):– Les voxels candidats à appartenir à la lesion– Les voxels sélectionnés sont les moins représentatifs du modèle

“sain”, ie. les outliers– Ou sélection par un expert (semi-supervisé) [Graph cut]– Ou utilisation de règles (faux positifs) [STREM]

• Segmentation initiale:– Classe lésion= région d’intéret– Les autres voxels sont segmentés en 3 classes

Variante de EM pour modèle de mélange à K=4 classes avec pondérations avec loi a priori sur les poids

Principe de la méthode d’incorporation

Lesion detection or semi-supervised contextLesion detection or semiLesion detection or semi--supervised contextsupervised context

Supervised context: non spatial illustrationSupervised context: non spatial illustrationSupervised context: non spatial illustration1. To assess the ability of the weighted approach to detect a small non Gaussian component

Data

Den

sity

0 2 4 6 8

0.00

0.05

0.10

0.15

0.20

0.25

3 Gaussian components (5000 data points each)

and

a small Beta(10,2) shifted by 6 units representing 100 data points (proportion=0.066)

The smallest component is “of interest” (eg. lesions)

PPPPrrrroooocccceeeedddduuuurrrreeee:::: cccchhhhoooooooosssseeee LLLL ddddaaaattttaaaa ppppooooiiiinnnnttttssss ffffrrrroooommmm tttthhhheeee ffffoooouuuurrrrtttthhhh ccccoooommmmppppoooonnnneeeennnntttt ((((ssssuuuuppppeeeerrrrvvvviiiisssseeeedddd))))aaaannnndddd uuuusssseeee aaaa GGGG((((αααα,,,, ββββ)))) pppprrrriiiioooorrrr ffffoooorrrr tttthhhheeee ccccoooorrrrrrrreeeessssppppoooonnnnddddiiiinnnngggg wwwweeeeiiiigggghhhhttttssss vvvvaaaarrrriiiiaaaabbbblllleeeessss

Supervised contextSupervised contextSupervised context

NNNNuuuummmmbbbbeeeerrrr ooooffff ppppooooiiiinnnnttttssss ccccllllaaaassssssssiiiififififieeeedddd ttttoooo 4444tttthhhh ccccoooommmmppppoooonnnneeeennnnttttPPPPrrrriiiioooorrrr ppppaaaarrrraaaammmmeeeetttteeeerrrrssss ffffoooorrrr wwww LLLL ==== 11110000 LLLL ==== 55550000 LLLL ==== 111100000000

αααα ==== 1111....0000;;;; ββββ ==== 1111....0000 0000 0000 0000αααα ==== 1111....5555;;;; ββββ ==== 1111....0000 22225555 99990000 111144447777αααα ==== 3333....0000;;;; ββββ ==== 1111....0000 55550000 222222223333

Note: with a Gaussian mixture model (K=4), no points in the 4th component

Supervised contextSupervised contextSupervised context

0 2 4 6 8

0.2

0.4

0.6

0.8

1.0

1.2

1.4

x1

wei

ghts

True classification

x1

Fre

quen

cy

0 2 4 6 8

010

030

050

070

0

Histogram of weights

weights

Fre

quen

cy

0.0 0.5 1.0 1.5

050

015

0025

00


x1

Fre

quen

cy

0 2 4 6 8

010

030

050

070

0

Clustering results for L = 50 and G(1.5, 1)

100 data points

90 data points

Spatial case: A weighted Hidden Markov model Spatial case: A weighted Hidden Markov model Spatial case: A weighted Hidden Markov model

Spatial dependencies between voxels:the joint distribution is a Markov random field (MRF)

Data driven term, based on intensitiesParameter prior term

� Observations:

Missing data term

28

pppp((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) ∝∝∝∝ eeeexxxxpppp((((HHHH((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ))))))))

N voxels (3D) x M modalities (T1, T2, Flair images)

� Labels:

yyyy ==== {{{{yyyy1111 .... .... ....yyyyNNNN}}}} wwwwhhhheeeerrrreeee yyyyiiii ==== {{{{yyyyiiii1111 .... .... .... yyyyiiiiMMMM}}}}

zzzz ==== {{{{zzzz1111 .... .... .... zzzzNNNN}}}} wwwwiiiitttthhhh zzzziiii ∈∈∈∈ {{{{eeee1111 .... .... .... eeeeKKKK}}}} ((((KKKK ttttiiiissssssssuuuueeeessss))))

� Weights:

HHHH((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) ==== HHHHZZZZ((((zzzz;;;; ββββ)))) ++++HHHHWWWW ((((wwww)))) ++++∑∑∑∑

iiii∈∈∈∈VVVV lllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))

ψψψψ ==== {{{{ββββ,,,, φφφφ}}}}

wwww ==== {{{{wwww1111 .... .... ....wwwwNNNN }}}} wwwwiiiitttthhhh wwwwiiii ==== {{{{wwwwiiii1111 .... .... .... wwwwiiiiMMMM }}}}sssseeeeqqqquuuueeeennnncccceeee aaaannnndddd vvvvooooxxxxeeeellll ssssppppeeeecccciiiifififificccc

Data term:

If all weights are 1, a standard multivariate (diagonal) Gaussian case is recovered

Missing data term:

Parameter prior term: pppp((((wwww)))) ====∏∏∏∏MMMMmmmm====1111 pppp((((wwwwmmmm))))

1111))))∑∑∑∑NNNNiiii====1111 wwwwiiiimmmm ==== NNNN AAAA ddddiiiirrrriiiicccchhhhlllleeeetttt ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn ffffoooorrrr pppp((((wwwwmmmm))))

PPPPoooottttttttssss mmmmooooddddeeeellll wwwwiiiitttthhhh eeeexxxxtttteeeerrrrnnnnaaaallll fifififieeeelllldddd ξξξξ,,,, iiiinnnntttteeeerrrraaaaccccttttiiiioooonnnn ppppaaaarrrraaaammmmeeeetttteeeerrrr ηηηη

NNNN ((((iiii)))):::: vvvvooooxxxxeeeellllssss nnnneeeeiiiigggghhhhbbbboooorrrriiiinnnngggg iiiiββββ ==== {{{{ξξξξ,,,, ηηηη}}}} wwwwiiiitttthhhh ξξξξ ==== tttt((((ξξξξ1111,,,, .... .... .... ,,,, ξξξξKKKK)))) aaaannnndddd ηηηη >>>> 0000

wwwwmmmm ==== {{{{wwww1111mmmm .... .... .... wwwwNNNNmmmm}}}}

2222)))) tttthhhheeee wwwwiiiimmmm aaaarrrreeee iiiinnnnddddeeeeppppeeeennnnddddeeeennnntttt wwwwiiiimmmm ∼∼∼∼ ΓΓΓΓ((((ααααiiiimmmm,,,, γγγγiiiimmmm))))

HHHHZZZZ ((((zzzz;;;; ββββ)))) ====NNNN∑∑∑∑

iiii====1111

((((<<<< zzzziiii,,,, ξξξξ >>>> ++++∑∑∑∑

jjjj∈∈∈∈NNNN ((((iiii))))

ηηηη <<<< zzzziiii,,,, zzzzjjjj >>>>))))

ααααiiiimmmm ==== γγγγiiiimmmmwwwweeeexxxxppppiiiimmmm ++++ 1111 wwwweeeexxxxppppiiiimmmm iiiissss tttthhhheeee mmmmooooddddeeee ooooffff tttthhhheeee pppprrrriiiioooorrrr ffffoooorrrr wwwwiiiimmmm

gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;; φφφφ)))) ==== GGGG((((yyyyiiii;;;; µµµµzzzziiii ,,,, DDDDzzzziiiiWWWW−−−−1111iiii AAAAzzzziiiiDDDD

TTTTzzzz1111 ))))

Estimation by variational EMEstimation by Estimation by variationalvariational EMEM

30

An alternating maximization view of EM:[Neal&Hinton98]

FFFF ((((qqqq,,,, ψψψψ)))) ==== EEEEqqqq[[[[lllloooogggg pppp((((yyyy,,,,ZZZZ,,,,WWWW ;;;; ψψψψ))))]]]] ++++ IIII [[[[qqqq]]]]

Variational approximation:

Exact E-step leads to qqqq((((rrrr))))((((zzzz,,,,wwww)))) ==== pppp((((zzzz,,,,wwww||||yyyy;;;;ψψψψ((((rrrr)))))))) iiiinnnnttttrrrraaaaccccttttaaaabbbblllleeee

qqqq((((zzzz,,,,wwww)))) ==== qqqqZZZZ((((zzzz)))) qqqqWWWW ((((wwww))))EM variant (Variational EM):

The E-step is solved over a restricted class of pdfs (that factorize)

The E-step is further approximated by its decomposition in 2 sub-steps (Incremental EM [Neal&Hinton98])

Modified GAM procedures [Byrne&Gunawardana05]

IIII [[[[qqqq]]]] ==== −−−−EEEEqqqq[[[[lllloooogggg qqqq((((zzzz,,,,WWWW ))))]]]] ((((eeeennnnttttrrrrooooppppyyyy ooooffff qqqq))))

qqqq ∈∈∈∈ DDDD aaaa ddddiiiissssttttrrrriiiibbbbuuuuttttiiiioooonnnn oooonnnn ZZZZ ××××WWWWEEEE----sssstttteeeepppp:::: qqqq((((rrrr)))) ==== aaaarrrrggggmmmmaaaaxxxxqqqq∈∈∈∈DDDD

FFFF ((((qqqq,,,, ψψψψ((((rrrr))))))))

MMMM----sssstttteeeepppp:::: ψψψψ((((rrrr++++1111)))) ==== aaaarrrrggggmmmmaaaaxxxxψψψψ∈∈∈∈ΨΨΨΨ

FFFF ((((qqqq((((rrrr)))),,,, ψψψψ))))

Variational E-step

For the weighted Markov model

pppp((((yyyy,,,, zzzz,,,,wwww;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn ====⇒⇒⇒⇒ aaaallllllll ccccoooonnnnddddiiiittttiiiioooonnnnaaaallllssss aaaarrrreeee MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn

pppp((((wwww||||yyyy,,,, zzzz;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn:::: HHHH((((wwww||||yyyy,,,, zzzz;;;;ψψψψ)))) ==== HHHHWWWW ((((wwww)))) ++++∑∑∑∑

iiii∈∈∈∈VVVVlllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))

pppp((((zzzz||||yyyy,,,,wwww;;;;ψψψψ)))) iiiissss MMMMaaaarrrrkkkkoooovvvviiiiaaaannnn:::: HHHH((((zzzz||||yyyy,,,,wwww;;;;ψψψψ)))) ==== HHHHZZZZ((((zzzz;;;; ββββ)))) ++++∑∑∑∑

iiii∈∈∈∈VVVVlllloooogggg gggg((((yyyyiiii||||zzzziiii,,,,wwwwiiii;;;;φφφφ))))

E-Z: q(r)Z = arg max

qZ∈DZ

F (q(r−1)W qZ ;ψ

(r))

E-W: q(r)W = arg max

qW∈DW

F (qW q(r)Z ;ψ(r))

E-Z: q(r)Z ∝ exp

(Eq(r−1)W

[log p(z|y,W;ψ(r)])

E-W: q(r)W ∝ exp

(Eq(r)Z

[log p(w|y,Z;ψ(r))])

In practice (for diagonal covariances)FFFFiiiixxxx ηηηη ((((iiiinnnntttteeeerrrraaaaccccttttiiiioooonnnn ppppaaaarrrraaaammmmeeeetttteeeerrrr)))) aaaannnndddd tttthhhheeee eeeexxxxppppeeeerrrrtttt wwwweeeeiiiigggghhhhttttssss wwwweeeexxxxppppiiiimmmm

((((mmmmooooddddeeeessss ooooffff tttthhhheeee wwwweeeeiiiigggghhhhtttt pppprrrriiiioooorrrrssss)))) aaaannnndddd γγγγiiiimmmm ((((vvvvaaaarrrriiiiaaaannnncccceeeessss ooooffff tttthhhheeee wwwweeeeiiiigggghhhhtttt pppprrrriiiioooorrrrssss))))

CCCCoooommmmppppuuuutttteeee qqqq((((rrrr))))ZZZZ ((((zzzz)))) uuuussssiiiinnnngggg mmmmeeeeaaaannnn----fifififieeeelllldddd aaaapppppppprrrrooooxxxxiiiimmmmaaaattttiiiioooonnnn oooorrrr vvvvaaaarrrriiiiaaaannnnttttssss

CCCCoooommmmppppuuuutttteeee ¯̄̄̄wwww((((rrrr))))iiiimmmm aaaassss

CCCCoooommmmppppuuuutttteeee ((((NNNNeeeewwwwttttoooonnnn)))) ξξξξ ((((eeeexxxxtttteeeerrrrnnnnaaaallll fifififieeeelllldddd ppppaaaarrrraaaammmmeeeetttteeeerrrr)))) uuuussssiiiinnnngggg mmmmeeeeaaaannnn fifififieeeelllldddd

CCCCoooommmmppppuuuutttteeee tttthhhheeee ggggaaaauuuussssssssiiiiaaaannnn mmmmeeeeaaaannnnssss aaaannnndddd vvvvaaaarrrriiiiaaaannnncccceeeessss uuuussssiiiinnnngggg

Iterate:

wwwwiiiitttthhhh ααααiiiimmmm ==== γγγγiiiimmmmwwwweeeexxxxppppiiiimmmm ++++ 1111 δδδδ((((yyyyiiiimmmm,,,, µµµµ

((((rrrr))))kkkkmmmm,,,, ssss

((((rrrr))))kkkkmmmm)))) ====

((((yyyyiiiimmmm−−−−µµµµ((((rrrr))))kkkkmmmm))))2222

ssss((((rrrr))))kkkkmmmm

((((MMMMaaaahhhhaaaallllaaaannnnoooobbbbiiiissss ddddiiiissssttttaaaannnncccceeee))))

µµµµ((((rrrr++++1111))))kkkkmmmm ====

NNNN∑∑∑∑


((((rrrr))))iiiimmmm yyyyiiiimmmm

NNNN∑∑∑∑


((((rrrr))))iiiimmmm

aaaannnndddd ssss((((rrrr++++1111))))kkkkmmmm ====

NNNN∑∑∑∑


((((rrrr))))iiiimmmm ((((yyyyiiiimmmm−−−−µµµµ((((rrrr++++1111))))kkkkmmmm ))))2222

NNNN∑∑∑∑


((((rrrr))))iiiimmmm

(E)

(M)

¯̄̄̄wwww((((rrrr))))iiiimmmm ====

ααααiiiimmmm ++++ 11112222

γγγγiiiimmmm ++++ 11112222

KKKK∑∑∑∑

kkkk====1111

δδδδ((((yyyyiiiimmmm,,,, µµµµ((((rrrr))))kkkkmmmm,,,, ssss

((((rrrr))))kkkkmmmm)))) qqqq

((((rrrr))))ZZZZiiii

((((eeeekkkk))))

Expert knowledge difficult to formalize into weight valuesProposed setting:

Choosing the expert weightsChoosing the expert weightsChoosing the expert weights

wwwweeeexxxxppppiiiimmmm ==== wwwwLLLL >>>> 1111 ∀∀∀∀iiii ∈∈∈∈ LLLL

wwwweeeexxxxppppiiiimmmm ==== 1111 ∀∀∀∀iiii �� ∈∈∈∈ LLLL

Identify outliers by thresholding (Chi2 percentile) the estimated weights (typicality)

ηηηη ==== 0000 iiiissss ooookkkk

LLLL iiiissss oooobbbbttttaaaaiiiinnnneeeedddd bbbbyyyy aaaappppppppllllyyyyiiiinnnngggg tttthhhheeee mmmmooooddddeeeellllwwwwiiiitttthhhh KKKK ==== 3333,,,, wwwweeeexxxxppppiiiimmmm ==== 1111,,,, γγγγiiiimmmm ==== 1111 ∀∀∀∀iiii,,,, mmmm

Parameters to tune:

ExperimentsExperimentsExperiments

Simulated data (BrainWeb) with MS lesions: T1,T2,PD sequences, 1mm3

Chi2 percentile fixed to 99%wL = 10γim = γ = 10 ∀i,m (prior variances)

Real data sets2 Patients with MS: Flair,T1,T2 sequences, 1mm2x3mm

Ground truth76%

58% Ground truth

Real data sets

1 Patient with stroke: DW , Flair, T2 sequences, 1mm2x5mm

Ground truth63%

• Extension to full covariance matrices: temporal multi-sequence data, eg. patient follow-up

• Other prior for the weights: eg. MRF prior

• Other expert weighting schemes, possibly lesion specific

• Extension to handle intensity inhomogeneities

• Sensitivity analysis: initialization, parameter tuning etc.

• Evaluation in a semi-supervised context

• Add lesion specific information: atlas, rules etc.

Future workFuture workFuture work

Download - Multivariate Robust clustering via mixture models Robust clustering via mixture models INRIA-LJK équipe MISTIS (Grenoble) Florence Forbes , Darren Wraith, SenanDoyle, Eric Frichot

Top Related