![Page 1: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/1.jpg)
High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length
by Nizar Bouguila and Djemel Ziou
Discussion led by Qi An
Duke University Machine Learning Group
![Page 2: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/2.jpg)
Outline
• Introduction
• The generalized Dirichlet mixture
• The minimum message length (MML) criterion
• Fisher information matrix and priors
• Density estimation and model selection
• Experimental results
• Conclusions
![Page 3: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/3.jpg)
Introduction
• How to determine the number of components in a mixture model for high-dimensional data?
  – Stochastic and resampling methods (slow)
    • Implementation of model selection criteria
    • Fully Bayesian way
  – Deterministic methods (fast)
    • Approximate Bayesian criteria
    • Information/coding theory concepts
      – Minimum message length (MML)
      – Akaike’s information criterion (AIC)
![Page 4: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/4.jpg)
The generalized Dirichlet distribution
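• A d-dimensional generalized Dirichlet distribution is defined to be
$$p(\vec{X}\mid\alpha_1,\beta_1,\dots,\alpha_d,\beta_d)=\prod_{i=1}^{d}\frac{\Gamma(\alpha_i+\beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\,X_i^{\alpha_i-1}\Big(1-\sum_{j=1}^{i}X_j\Big)^{\gamma_i}$$
where $\sum_{i=1}^{d}X_i<1$, $0<X_i<1$, $\alpha_i>0$, $\beta_i>0$, $\gamma_i=\beta_i-\alpha_{i+1}-\beta_{i+1}$ for $i=1,\dots,d-1$, and $\gamma_d=\beta_d-1$.
It reduces to the Dirichlet distribution when $\beta_i=\alpha_{i+1}+\beta_{i+1}$.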
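A minimal sketch of evaluating the GDD log-density, assuming the standard form above with $\gamma_i=\beta_i-\alpha_{i+1}-\beta_{i+1}$ and $\gamma_d=\beta_d-1$. The function name `gdd_logpdf` and its argument layout are hypothetical, not taken from the slides; only the standard library is used.

```python
import math
from typing import Sequence


def gdd_logpdf(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Log-density of a d-dimensional generalized Dirichlet distribution (hypothetical helper)."""
    d = len(x)
    if not (len(alpha) == len(beta) == d):
        raise ValueError("x, alpha and beta must have the same length")
    if any(a <= 0 or b <= 0 for a, b in zip(alpha, beta)):
        raise ValueError("alpha_i and beta_i must be positive")
    if any(xi <= 0 or xi >= 1 for xi in x) or sum(x) >= 1:
        raise ValueError("need 0 < X_i < 1 and sum(X) < 1")
    logp, cumsum = 0.0, 0.0
    for i in range(d):
        cumsum += x[i]  # running sum X_1 + ... + X_i
        # gamma_i = beta_i - alpha_{i+1} - beta_{i+1}; the last one is beta_d - 1
        gamma = beta[i] - alpha[i + 1] - beta[i + 1] if i < d - 1 else beta[i] - 1.0
        logp += (math.lgamma(alpha[i] + beta[i]) - math.lgamma(alpha[i]) - math.lgamma(beta[i])
                 + (alpha[i] - 1.0) * math.log(x[i])
                 + gamma * math.log(1.0 - cumsum))
    return logp
```

Two quick sanity checks: with d = 1 the function reduces to a Beta(α₁, β₁) log-density, and choosing β₁ = α₂ + β₂ recovers a Dirichlet density.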
![Page 5: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/5.jpg)
The generalized Dirichlet distribution
For the generalized Dirichlet distribution:
The GDD has a more general covariance structure than the Dirichlet distribution (DD), and it is conjugate to the multinomial distribution.
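The covariance claim can be illustrated with a hedged sketch based on the standard stick-breaking (Connor-Mosimann) construction of the GDD: draw independent $Z_i\sim\mathrm{Beta}(\alpha_i,\beta_i)$ and set $X_i=Z_i\prod_{k<i}(1-Z_k)$. Moments then follow from independence of the $Z_k$. The helper names are hypothetical. Every covariance of a Dirichlet vector is negative, whereas the GDD can produce positive ones.

```python
import math
import random
from typing import List, Sequence


def gdd_sample(alpha: Sequence[float], beta: Sequence[float], rng: random.Random) -> List[float]:
    """One GDD draw by stick-breaking: X_i = Z_i * prod_{k<i}(1 - Z_k), Z_i ~ Beta(alpha_i, beta_i)."""
    x, remaining = [], 1.0
    for a, b in zip(alpha, beta):
        z = rng.betavariate(a, b)
        x.append(z * remaining)
        remaining *= 1.0 - z
    return x


def gdd_mean(alpha: Sequence[float], beta: Sequence[float]) -> List[float]:
    """E[X_i] = alpha_i / (alpha_i + beta_i) * prod_{k<i} beta_k / (alpha_k + beta_k)."""
    means, remaining = [], 1.0
    for a, b in zip(alpha, beta):
        means.append(a / (a + b) * remaining)
        remaining *= b / (a + b)
    return means


def gdd_cov(alpha: Sequence[float], beta: Sequence[float], i: int, j: int) -> float:
    """Cov(X_i, X_j) for 0-based indices i < j, using independence of the Z_k."""
    if not i < j:
        raise ValueError("require i < j")
    s = [a + b for a, b in zip(alpha, beta)]
    e_z = [a / t for a, t in zip(alpha, s)]                               # E[Z]
    e_1mz = [b / t for b, t in zip(beta, s)]                              # E[1 - Z]
    e_1mz_sq = [b * (b + 1) / (t * (t + 1)) for b, t in zip(beta, s)]     # E[(1 - Z)^2]
    e_z_1mz = [a * b / (t * (t + 1)) for a, b, t in zip(alpha, beta, s)]  # E[Z (1 - Z)]
    common = e_z[j] * math.prod(e_1mz[i + 1:j])
    joint = e_z_1mz[i] * math.prod(e_1mz_sq[:i])              # from E[X_i X_j]
    product = e_z[i] * e_1mz[i] * math.prod(e_1mz[:i]) ** 2   # from E[X_i] E[X_j]
    return common * (joint - product)
```

For example, with α = (0.5, 10, 2) and β = (0.5, 10, 3), `gdd_cov(alpha, beta, 1, 2)` is positive while `gdd_cov(alpha, beta, 0, 1)` is negative.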
![Page 6: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/6.jpg)
GDD vs. Gaussian
• The GDD has fewer parameters to estimate (2d per component, versus d + d(d+1)/2 for a full-covariance Gaussian), so the estimation can be more accurate.
• The GDD is defined on the support [0,1] and can be extended to a compact support [A,B], which matches naturally bounded data better than the unbounded support of the Gaussian.
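Beta distribution: $p(u\mid\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,u^{\alpha-1}(1-u)^{\beta-1}$, $0<u<1$
Beta type-II distribution: $p(v\mid\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\frac{v^{\alpha-1}}{(1+v)^{\alpha+\beta}}$, $v>0$
They are equivalent under the change of variable $u=v/(1+v)$: if $v$ follows a Beta type-II distribution, then $u$ follows a Beta distribution with the same parameters.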
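A short numerical check of the Beta / Beta type-II relation, assuming the standard densities for both. With $v=u/(1-u)$ and $dv/du=1/(1-u)^2$, pushing the Beta type-II density through $u=v/(1+v)$ should give back the Beta density pointwise. Function names are hypothetical.

```python
import math


def beta_pdf(u: float, a: float, b: float) -> float:
    """Beta(a, b) density on 0 < u < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))


def beta2_pdf(v: float, a: float, b: float) -> float:
    """Beta type-II (beta prime) density on v > 0."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(v) - (a + b) * math.log1p(v))


def beta2_pushed_to_unit(u: float, a: float, b: float) -> float:
    """Density of u = v / (1 + v) when v ~ Beta-II(a, b): p_V(v) * |dv/du| with v = u / (1 - u)."""
    v = u / (1.0 - u)
    return beta2_pdf(v, a, b) / (1.0 - u) ** 2
```

`beta_pdf(u, a, b)` and `beta2_pushed_to_unit(u, a, b)` agree up to floating-point error for any 0 < u < 1.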
![Page 7: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/7.jpg)
A GDD mixture model
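A generalized Dirichlet mixture model with M components is $p(\vec{X}\mid\Theta)=\sum_{j=1}^{M}p(\vec{X}\mid\vec{\alpha}_j)\,P(j)$, where each $p(\vec{X}\mid\vec{\alpha}_j)$ takes the form of the GDD with parameter vector $\vec{\alpha}_j=(\alpha_{j1},\beta_{j1},\dots,\alpha_{jd},\beta_{jd})$, and the mixing weights satisfy $0<P(j)\le 1$ and $\sum_{j=1}^{M}P(j)=1$.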
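A minimal sketch of the mixture log-likelihood $\sum_n\log\sum_j P(j)\,p(\vec{X}_n\mid\vec{\alpha}_j)$, assuming the standard GDD density. The inner sum is computed with the log-sum-exp trick, a standard choice to avoid underflow when d is large. The component layout `(alpha, beta)` and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One GDD component as (alpha_1..alpha_d, beta_1..beta_d); hypothetical layout.
Component = Tuple[Sequence[float], Sequence[float]]


def gdd_logpdf(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Standard GDD log-density (input validation omitted for brevity)."""
    d, logp, cumsum = len(x), 0.0, 0.0
    for i in range(d):
        cumsum += x[i]
        gamma = beta[i] - alpha[i + 1] - beta[i + 1] if i < d - 1 else beta[i] - 1.0
        logp += (math.lgamma(alpha[i] + beta[i]) - math.lgamma(alpha[i]) - math.lgamma(beta[i])
                 + (alpha[i] - 1.0) * math.log(x[i]) + gamma * math.log(1.0 - cumsum))
    return logp


def log_sum_exp(values: List[float]) -> float:
    """Stable log(sum(exp(v))) obtained by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(data: Sequence[Sequence[float]], weights: Sequence[float],
                   components: Sequence[Component]) -> float:
    """sum_n log sum_j P(j) p(X_n | alpha_j) for a GDD mixture."""
    return sum(
        log_sum_exp([math.log(w) + gdd_logpdf(x, a, b) for w, (a, b) in zip(weights, components)])
        for x in data
    )
```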
![Page 8: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/8.jpg)
The MML criterion
• The message length is defined as minus the logarithm of the posterior probability.
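• After placing an explicit prior over the parameters, the message length for a mixture of distributions is given as
$$\mathrm{MessLen}\simeq\underbrace{-\log h(\Theta)}_{\text{prior}}\;\underbrace{-\log p(\mathcal{X}\mid\Theta)}_{\text{likelihood}}+\underbrace{\tfrac{1}{2}\log|F(\Theta)|}_{\text{Fisher information}}+\underbrace{\tfrac{N_p}{2}\big(1+\log\kappa_{N_p}\big)}_{\text{optimal quantization constant}}$$
where $N_p$ is the number of free parameters and $\kappa_{N_p}$ is the optimal quantization lattice constant.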
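A hedged sketch of how the four terms of the message length (prior, likelihood, Fisher information, quantization constant) combine, and how the number of components would then be chosen as the candidate with the shortest message. The default κ = 1/12 is the one-dimensional quantization constant and is used here only as an assumption; the parameter count M·2d + (M − 1) counts 2d GDD parameters per component plus M − 1 free mixing weights. All names are hypothetical.

```python
import math
from typing import Dict


def gdd_mixture_num_params(m: int, d: int) -> int:
    """N_p for an M-component, d-dimensional GDD mixture: 2d per component plus M - 1 free weights."""
    return m * 2 * d + (m - 1)


def message_length(log_prior: float, log_likelihood: float, log_det_fisher: float,
                   n_params: int, kappa: float = 1.0 / 12.0) -> float:
    """MessLen ~ -log h(Theta) - log p(X|Theta) + 0.5 log|F(Theta)| + (N_p / 2)(1 + log kappa).

    kappa = 1/12 is an assumed default; the true lattice constant depends on N_p.
    """
    return (-log_prior - log_likelihood + 0.5 * log_det_fisher
            + 0.5 * n_params * (1.0 + math.log(kappa)))


def select_num_components(lengths: Dict[int, float]) -> int:
    """Return the candidate M whose fitted model gives the shortest message."""
    return min(lengths, key=lengths.__getitem__)
```

In use, one would fit a mixture for each candidate M, evaluate `message_length` on each fit, and pass the resulting dictionary to `select_num_components`.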
![Page 9: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/9.jpg)
Fisher Information matrix
• The Fisher information matrix is the expected value of the Hessian of minus the logarithm of the likelihood
where
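A hedged sketch for a single GDD component. Apart from the log-gamma normalizers, the GDD log-density is linear in $\alpha_i,\beta_i$, so the Hessian of $-\log p$ does not depend on $\vec{X}$. It is block diagonal with one 2×2 block per pair, identical to the Beta block $\begin{pmatrix}\psi'(\alpha_i)-\psi'(\alpha_i+\beta_i) & -\psi'(\alpha_i+\beta_i)\\ -\psi'(\alpha_i+\beta_i) & \psi'(\beta_i)-\psi'(\alpha_i+\beta_i)\end{pmatrix}$, where $\psi'$ is the trigamma function. The code multiplies each block by `n`, assumed here to be the (expected) number of points assigned to the component. The mixing-weight block is not covered. The stdlib has no trigamma, so a recurrence plus asymptotic series is used. Names are hypothetical.

```python
import math
from typing import Sequence


def trigamma(x: float) -> float:
    """psi'(x) for x > 0: shift with psi'(x) = psi'(x + 1) + 1/x^2, then an asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    result += inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return result


def gdd_fisher_logdet(alpha: Sequence[float], beta: Sequence[float], n: float) -> float:
    """log|F| for one GDD component: sum over 2x2 Beta-like blocks, each scaled by n."""
    total = 0.0
    for a, b in zip(alpha, beta):
        t_ab = trigamma(a + b)
        det2 = (trigamma(a) - t_ab) * (trigamma(b) - t_ab) - t_ab ** 2
        total += 2.0 * math.log(n) + math.log(det2)  # det(n * B) = n^2 det(B)
    return total
```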
![Page 10: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/10.jpg)
Prior distribution
• Assume independence between the different components
Mixture weights
GDD parameters
Place a Dirichlet distribution and a generalized Dirichlet distribution on P and α, respectively, with parameters set to 1.
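With every parameter set to 1, the Dirichlet prior on the M mixing weights is uniform over the simplex, so its density equals (M − 1)! for any weight vector. The prior term −log h(P) is therefore constant in P but still grows with M. A minimal check, with a hypothetical helper name:

```python
import math
from typing import Sequence


def dirichlet_logpdf(p: Sequence[float], concentration: Sequence[float]) -> float:
    """Log Dirichlet density at a point p on the simplex (entries positive, summing to 1)."""
    a0 = sum(concentration)
    log_norm = math.lgamma(a0) - sum(math.lgamma(a) for a in concentration)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(concentration, p))
```

For example, `dirichlet_logpdf([0.2, 0.3, 0.5], [1, 1, 1])` equals log 2 = log((3 − 1)!), whatever weights are passed.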
![Page 11: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/11.jpg)
Message length
• After obtaining the Fisher information and specifying the prior distribution, the message length can be expressed as
![Page 12: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/12.jpg)
Estimation and selection algorithm
• The authors use an EM algorithm to estimate the mixture parameters.
• To reduce the computational cost and avoid local maxima, they implement a fairly sophisticated initialization algorithm.
• The whole algorithm is summarized on the next slide.
![Page 13: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/13.jpg)
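The slides do not spell out the EM updates. The sketch below covers only the generic E-step (posterior responsibilities) and the closed-form update of the mixing weights. The update of the GDD parameters has no closed form, and the optimizer used for it is not specified here, so it is deliberately left out. Names and data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[Sequence[float], Sequence[float]]  # (alpha, beta); hypothetical layout


def gdd_logpdf(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Standard GDD log-density (input validation omitted for brevity)."""
    d, logp, cumsum = len(x), 0.0, 0.0
    for i in range(d):
        cumsum += x[i]
        gamma = beta[i] - alpha[i + 1] - beta[i + 1] if i < d - 1 else beta[i] - 1.0
        logp += (math.lgamma(alpha[i] + beta[i]) - math.lgamma(alpha[i]) - math.lgamma(beta[i])
                 + (alpha[i] - 1.0) * math.log(x[i]) + gamma * math.log(1.0 - cumsum))
    return logp


def e_step(data: Sequence[Sequence[float]], weights: Sequence[float],
           components: Sequence[Component]) -> List[List[float]]:
    """r[n][j] = P(j) p(X_n|alpha_j) / sum_k P(k) p(X_n|alpha_k), computed in log space."""
    resp = []
    for x in data:
        logs = [math.log(w) + gdd_logpdf(x, a, b) for w, (a, b) in zip(weights, components)]
        m = max(logs)
        unnorm = [math.exp(v - m) for v in logs]
        z = sum(unnorm)
        resp.append([u / z for u in unnorm])
    return resp


def update_weights(resp: Sequence[Sequence[float]]) -> List[float]:
    """M-step for the mixing weights: P(j) = (1/N) sum_n r[n][j]."""
    n = len(resp)
    return [sum(r[j] for r in resp) / n for j in range(len(resp[0]))]
```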
![Page 14: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/14.jpg)
Experimental results
The correct numbers of mixture components are 5, 6, and 7, respectively.
![Page 15: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/15.jpg)
Experimental results
![Page 16: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/16.jpg)
Experimental results
• Web mining:
  – Training with multiple classes of labels
  – Use to predict the label of a testing sample
  – Use the frequencies of the top 200 words
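The slide only states that the top 200 word frequencies were used. The GDD requires 0 < X_i < 1 and ΣX_i < 1, so raw counts must be mapped into that region. One hypothetical way is add-one smoothing over the selected vocabulary, plus one extra slot in the denominator that stands for all other words. This is an assumption for illustration, not the authors' documented preprocessing; with `k=200` it yields 200-dimensional vectors.

```python
from collections import Counter
from typing import List, Sequence


def top_words(docs: Sequence[Sequence[str]], k: int) -> List[str]:
    """The k most frequent tokens across the training corpus."""
    counts = Counter(w for doc in docs for w in doc)
    return [w for w, _ in counts.most_common(k)]


def frequency_vector(doc: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Hypothetical smoothing: (c_w + 1) / (len(doc) + K + 1).

    Every entry is in (0, 1), and the entries sum to (sum c_w + K) / (len(doc) + K + 1) < 1.
    """
    counts = Counter(doc)
    denom = len(doc) + len(vocab) + 1
    return [(counts[w] + 1) / denom for w in vocab]
```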
![Page 17: by Nizar Bouguila and Djemel Ziou](https://reader036.vdocument.in/reader036/viewer/2022081504/56812eae550346895d9452f4/html5/thumbnails/17.jpg)
Conclusions
• An MML-based criterion is proposed to select the number of components in generalized Dirichlet mixtures.
• The full dimensionality of the data is used.
• Generalized Dirichlet mixtures allow more modeling flexibility than mixtures of Gaussians.
• The results clearly indicate that the MML and LEC model selection methods outperform the other methods.