High Dimensional Discriminant Analysis - mistis.inrialpes.fr/docs/presentation_hdda.pdf
TRANSCRIPT
High Dimensional Discriminant Analysis
Charles Bouveyron, LMC-IMAG & INRIA Rhône-Alpes
Joint work with S. Girard and C. Schmid
High Dimensional Discriminant Analysis - Lear seminar – p.1/43
Introduction

High-dimensional data:
many scientific domains need to analyze data that are increasingly complex,
modern data are made up of many variables: imagery (MRI, vision), biology (DNA micro-arrays), ...

Classification is very difficult in high-dimensional spaces:
many learning methods suffer from the curse of dimensionality [Bel61],
since the number n of data points is generally not sufficient to learn high-dimensional models.

The empty space phenomenon [ST83] allows us to assume that the data live in subspaces of lower dimensionality.
Introduction

Classification:
supervised classification (discriminant analysis) requires labeled examples of the classes,
unsupervised classification (clustering) aims to organize data into homogeneous classes.

Two families of methods:
generative methods: QDA, LDA, GMM,
discriminative methods: logistic regression and SVM.

Generative models can be used in both supervised and unsupervised classification.
Outline of the talk

Discriminant analysis framework
New modeling of high-dimensional data
High Dimensional Discriminant Analysis (HDDA):
construction of the decision rule,
a posteriori probability and reformulation.
Particular rules
Estimators and intrinsic dimension estimation
Numerical results:
application to image categorization,
application to object recognition.
Extension to unsupervised classification
Part 1
Discriminant analysis framework
Discriminant analysis framework

Discriminant analysis is the supervised part of classification, i.e. it requires a teacher!

Discriminant analysis goals:
descriptive aspect: find a data representation which allows one to interpret the groups using the explanatory variables,
decisional aspect: the main goal is to find the correct class membership of a new observation x.

Of course, HDDA favours the decisional aspect!
Discrimination problem

The basic problem:
assign an observation x = (x1, ..., xp) ∈ R^p with unknown class membership to one of k classes C1, ..., Ck known a priori.

We have a learning dataset A:

A = {(x1, y1), ..., (xn, yn) | xj ∈ R^p and yj ∈ {1, ..., k}},

where the vector xj contains the p explanatory variables and yj indicates the index of the class of xj.

We have to construct a decision rule δ:

δ : R^p → {1, ..., k}
x ↦ y.
Bayes decision rule

The optimal decision rule δ*, called the Bayes decision rule, is:

δ* : x ∈ Ci* if i* = argmax_{i=1,...,k} { p(Ci|x) },

or, equivalently,

δ* : x ∈ Ci* if i* = argmin_{i=1,...,k} { −2 log(πi fi(x)) },

where πi is the a priori probability of the class Ci and fi(x) denotes the class conditional density of x.
Generative methods usually assume that the class distributions are Gaussian N(µi, Σi).
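As a minimal sketch of the rule above, assuming Gaussian class-conditional densities (all parameter values below are made-up toy numbers, purely for illustration), the costs −2 log(πi fi(x)) can be computed and minimized as follows:

```python
import numpy as np

def gaussian_log_density(x, mu, cov):
    """Log-density of a multivariate Gaussian N(mu, cov) at x."""
    p = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # Mahalanobis distance squared
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def bayes_rule(x, priors, means, covs):
    """Index i* minimizing -2 log(pi_i f_i(x))."""
    costs = [-2 * (np.log(pi) + gaussian_log_density(x, mu, cov))
             for pi, mu, cov in zip(priors, means, covs)]
    return int(np.argmin(costs))

# Toy two-class example in R^2 (illustrative values only).
priors = [0.5, 0.5]
means = [np.zeros(2), np.array([4.0, 4.0])]
covs = [np.eye(2), np.eye(2)]
```

A point near one class mean is then assigned to that class by `bayes_rule`.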
Classical discriminant analysis methods

Quadratic discriminant analysis (QDA):

i* = argmin_{i=1,...,k} { (x − µi)^t Σi^{-1} (x − µi) + log(det Σi) − 2 log(πi) }.

Linear discriminant analysis (LDA), with the assumption that ∀i, Σi = Σ:

i* = argmin_{i=1,...,k} { µi^t Σ^{-1} µi − 2 µi^t Σ^{-1} x − 2 log(πi) }.

QDA and LDA behave disappointingly when the size n of the training dataset is small compared to the number p of variables.
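To see the link between the two rules above, here is a quick numerical check (with arbitrary toy parameters) that, under a shared covariance Σ, the QDA cost and the linear LDA cost select the same class: they differ only by the class-independent terms x^t Σ^{-1} x and log det Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 5, 3
Sigma = np.eye(p) + 0.3 * np.ones((p, p))   # shared covariance (toy values)
Sinv = np.linalg.inv(Sigma)
means = [rng.normal(size=p) for _ in range(k)]
priors = [0.2, 0.3, 0.5]

def qda_cost(x, mu, pi):
    # Full QDA cost with Sigma_i = Sigma for every class.
    d = x - mu
    return d @ Sinv @ d + np.linalg.slogdet(Sigma)[1] - 2 * np.log(pi)

def lda_cost(x, mu, pi):
    # Linear LDA cost: the quadratic term in x has been dropped.
    return mu @ Sinv @ mu - 2 * mu @ Sinv @ x - 2 * np.log(pi)

x = rng.normal(size=p)
i_qda = int(np.argmin([qda_cost(x, m, pi) for m, pi in zip(means, priors)]))
i_lda = int(np.argmin([lda_cost(x, m, pi) for m, pi in zip(means, priors)]))
```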
Discriminant analysis regularization

Dimension reduction: PCA, FDA, feature selection.
Fisher discriminant analysis (FDA) combines:
a dimension reduction step (projection on the k − 1 discriminant axes)
with one of the previous methods (usually LDA).

Parsimonious models:
Regularized discriminant analysis (RDA, [Fri89]) is an intermediate classifier between QDA and LDA,
Eigenvalue decomposition discriminant analysis (EDDA, [BC96]) is based on a re-parametrization of the class covariance matrices:

Σi = λi Di Ai Di^t.
Dimension reduction for classification
[Figure: the same high-dimensional dataset projected on the PCA axes (left) and on the discriminant axes (right).]

Fig. 1 - High-dimensional data whose classes live in different subspaces of lower dimensionality.
Part 2
New modeling
New modeling

The empty space phenomenon enables us to assume that high-dimensional data live in subspaces of low dimensionality.

The main idea of the new modeling is:
each class is decomposed on two subspaces of low dimensionality,
and the classes are assumed spherical within these subspaces.
New modeling

We assume that the class conditional densities are Gaussian N(µi, Σi), with means µi and covariance matrices Σi.
Let Qi be the orthogonal matrix of eigenvectors of the covariance matrix Σi, and let Bi be the basis of R^p made of the eigenvectors of Σi.
The class conditional covariance matrix ∆i is defined in the basis Bi by:

∆i = Qi^t Σi Qi.
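Expressed in its eigenbasis, the covariance matrix becomes diagonal, which is what ∆i captures. A quick numpy check on a toy covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p))
Sigma = A @ A.T                      # a toy symmetric positive-definite covariance
eigvals, Q = np.linalg.eigh(Sigma)   # columns of Q: orthonormal eigenvectors
Delta = Q.T @ Sigma @ Q              # covariance expressed in the eigenbasis
```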
New modeling

We assume in addition that ∆i contains only two different eigenvalues ai > bi.
Let Ei be the affine space generated by the eigenvectors associated with the eigenvalue ai and such that µi ∈ Ei.
We also define E⊥i such that Ei ⊕ E⊥i = R^p and µi ∈ E⊥i.
Let Pi and P⊥i be the projection operators on Ei and E⊥i.
New modeling

Thus, we assume that ∆i has the following form:

∆i = diag(ai, ..., ai, bi, ..., bi),

where the eigenvalue ai is repeated di times and the eigenvalue bi is repeated (p − di) times.
New modeling: illustration
Part 3
High Dimensional DiscriminantAnalysis
High Dimensional Discriminant Analysis

Under the preceding assumptions, the Bayes decision rule yields a new decision rule δ+:

Theorem 1: The new decision rule δ+ consists in classifying x to the class Ci* if:

i* = argmin_{i=1,...,k} { (1/ai) ‖µi − Pi(x)‖² + (1/bi) ‖x − Pi(x)‖² + di log(ai) + (p − di) log(bi) − 2 log(πi) }.
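The rule in Theorem 1 only needs, for each class, the mean, the di leading eigenvectors, ai, bi, di and πi. A minimal sketch (toy parameter values; Q is assumed here to hold the di leading eigenvectors as columns):

```python
import numpy as np

def hdda_cost(x, mu, Q, a, b, d, p, prior):
    """K_i(x) from Theorem 1; Q (p x d) spans the class subspace E_i."""
    Px = mu + Q @ (Q.T @ (x - mu))    # projection of x on the affine subspace E_i
    return (np.sum((mu - Px) ** 2) / a
            + np.sum((x - Px) ** 2) / b
            + d * np.log(a) + (p - d) * np.log(b)
            - 2 * np.log(prior))

# Toy example: two classes in R^3 whose subspaces are the e1 and e2 axes.
p, d, a, b = 3, 1, 4.0, 0.1
Q0 = np.array([[1.0], [0.0], [0.0]])
Q1 = np.array([[0.0], [1.0], [0.0]])
mu = np.zeros(p)
x = np.array([2.0, 0.1, 0.0])         # close to the e1 axis
c0 = hdda_cost(x, mu, Q0, a, b, d, p, 0.5)
c1 = hdda_cost(x, mu, Q1, a, b, d, p, 0.5)
```

Since x lies almost on the subspace of the first class, its cost c0 is the smaller of the two.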
HDDA: illustration

Ki(x) = (1/ai) ‖µi − Pi(x)‖² + (1/bi) ‖x − Pi(x)‖² + di log(ai) + (p − di) log(bi) − 2 log(πi).
HDDA: a posteriori probability

In many applications, it is useful to have the a posteriori probability p(Ci|x) that x belongs to Ci.
The Bayes formula yields:

p(Ci|x) = exp(−Ki(x)/2) / Σ_{j=1}^{k} exp(−Kj(x)/2),

where Ki is the cost function of δ+ conditionally on the class Ci:

Ki(x) = (1/ai) ‖µi − Pi(x)‖² + (1/bi) ‖x − Pi(x)‖² + di log(ai) + (p − di) log(bi) − 2 log(πi).
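With the costs Ki in hand, the posterior probabilities follow from the formula above. Subtracting the smallest cost before exponentiating (a standard numerical-stability trick, not in the original slides) avoids underflow when the costs are large, without changing the result:

```python
import numpy as np

def posterior_probabilities(costs):
    """p(C_i|x) = exp(-K_i(x)/2) / sum_j exp(-K_j(x)/2)."""
    costs = np.asarray(costs, dtype=float)
    w = np.exp(-(costs - costs.min()) / 2)   # shift by the min cost for stability
    return w / w.sum()

probs = posterior_probabilities([3.0, 5.0, 10.0])  # toy costs
```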
HDDA: reformulation

In order to interpret the decision rule δ+ more easily, we introduce αi and σi:

ai = σi²/αi and bi = σi²/(1 − αi), with αi ∈ ]0, 1[ and σi > 0.

Thus, the decision rule δ+ consists in classifying x to the class Ci* if:

i* = argmin_{i=1,...,k} { (1/σi²) ( αi ‖µi − Pi(x)‖² + (1 − αi) ‖x − Pi(x)‖² ) + 2p log(σi) + di log((1 − αi)/αi) − p log(1 − αi) − 2 log(πi) }.

Notation: HDDA is the model [ai bi Qi di] or [αi σi Qi di].
Part 4
Particular rules
Particular rules

By allowing some but not all of the HDDA parameters to vary between classes, we obtain 24 particular rules:
which correspond to different regularizations,
some of which are easily interpretable geometrically,
and 9 of which have explicit formulations.

HDDA can be interpreted as a classical discriminant analysis in particular cases:
if ∀i, αi = 1/2: δ+ is QDA with spherical classes,
if in addition ∀i, σi = σ: δ+ is LDA with spherical classes.
Links with classical methods
[Diagram: EDDA (Σi = λi Di Ai Di^t) and HDDA (Σi = Qi ∆i Qi^t) as general models, linked by constraints such as Σi = λDAD^t, Ai = Id, αi = 1/2, Σi = σi² Id, σi = σ and πi = π to QDA, LDA, their spherical variants QDAs and LDAs, and LDA géo.]
Model [α σ Qi di]

The decision rule δ+ consists in classifying x to the class Ci* if:

i* = argmin_{i=1,...,k} { α ‖µi − Pi(x)‖² + (1 − α) ‖x − Pi(x)‖² }.
Part 5
Estimation
HDDA estimators

Estimators are computed by maximum likelihood from the learning set A.
Common estimators:

πi = ni/n, where ni = #(Ci),

µi = (1/ni) Σ_{xj ∈ Ci} xj,

Σi = (1/ni) Σ_{xj ∈ Ci} (xj − µi)(xj − µi)^t.
Estimators of the model [ai bi Qi di]

Assuming di is known, the ML estimators are:
Qi is made of the eigenvectors associated with the ordered eigenvalues of Σi,
ai is the mean of the largest di eigenvalues of Σi:

ai = (1/di) Σ_{l=1}^{di} λil,

bi is the mean of the smallest (p − di) eigenvalues of Σi:

bi = (1/(p − di)) Σ_{l=di+1}^{p} λil.
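These two estimators can be sketched directly in numpy (di assumed known, toy covariance values):

```python
import numpy as np

def hdda_ab(Sigma_hat, d):
    """ML estimators of a_i and b_i from the class covariance estimate:
    a_i = mean of the d largest eigenvalues, b_i = mean of the rest."""
    eigvals = np.linalg.eigvalsh(Sigma_hat)[::-1]  # descending order
    return eigvals[:d].mean(), eigvals[d:].mean()

# Toy check: a diagonal covariance with eigenvalues 4, 4, 1, 1, 1 and d = 2.
a, b = hdda_ab(np.diag([4.0, 4.0, 1.0, 1.0, 1.0]), d=2)
```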
Estimation trick

The decision rule δ+ does not require computing the last (p − di) eigenvectors of Σi.
Thus, in order to minimize the number of parameters to estimate, we use the following relation:

Σ_{l=di+1}^{p} λil = tr(Σi) − Σ_{l=1}^{di} λil.
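Thanks to this relation, bi can be obtained from the trace and the di leading eigenvalues alone. A quick numerical check on a toy covariance (the explicit tail sum is computed here only to validate the identity):

```python
import numpy as np

rng = np.random.default_rng(2)
p, d = 10, 3
A = rng.normal(size=(p, p))
Sigma = A @ A.T                        # toy covariance estimate
lam = np.linalg.eigvalsh(Sigma)[::-1]  # eigenvalues, descending

# Sum of the trailing (p - d) eigenvalues via the trace relation:
tail_sum = np.trace(Sigma) - lam[:d].sum()
b_hat = tail_sum / (p - d)
```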
Number of parameters to estimate with p = 100, di = 10 and k = 4:
QDA: 20 603,
HDDA: 4 323.
Intrinsic dimension estimation

We base our approach for choosing the values of di on the eigenvalues of Σi.
We use two empirical methods:

common thresholding of the cumulative variance:

di = min { d ∈ {1, ..., p − 1} : Σ_{j=1}^{d} λj / Σ_{j=1}^{p} λj ≥ s },

scree-test of Cattell: analyzes the differences between successive eigenvalues in order to find a break in the scree of eigenvalues.
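Both heuristics can be sketched in a few lines. The scree variant below (the relative-gap rule and its 0.2 default) is our own simple illustrative choice, not the exact procedure from the slides:

```python
import numpy as np

def dim_by_threshold(eigvals, s=0.95):
    """Smallest d whose leading eigenvalues explain a fraction >= s
    of the total variance (eigvals in descending order)."""
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, s) + 1)

def dim_by_scree(eigvals, rel_gap=0.2):
    """Simple scree heuristic: keep dimensions up to the last gap between
    successive eigenvalues that exceeds rel_gap times the largest gap."""
    gaps = -np.diff(eigvals)
    big = np.flatnonzero(gaps >= rel_gap * gaps.max())
    return int(big[-1] + 1)

lam = np.array([5.0, 4.0, 3.0, 0.1, 0.08, 0.05])  # toy eigenvalue scree
d_thr = dim_by_threshold(lam, s=0.95)
d_scree = dim_by_scree(lam)
```

On this toy scree, both heuristics recover an intrinsic dimension of 3.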
Intrinsic dimension estimation
[Figure: for one class, the ordered eigenvalues of Σi together with their cumulative sums (common thresholding, left) and the differences between successive eigenvalues (scree-test of Cattell, right).]
Part 6
Numerical results
Results: artificial data
Method                 Classification rate
HDDA ([ai bi Qi di])   0.958
HDDA ([ai bi Qi d])    0.964
LDA                    0.512
FDA                    0.51
SVM                    0.478

3 Gaussian densities in R^15, with d1 = 3, d2 = 4 and d3 = 5.
In addition, the proportions are very different: π1 = 1/2, π2 = 1/3 and π3 = 1/6.
Results: image categorization
A recent study [LBGGDH03] proposes an approach based on human perception to categorize natural images.
An image is represented by a vector of 49 dimensions; each of these 49 components is the response of the image to a Gabor filter.
Results: image categorization

Data: 328 descriptors in 49 dimensions.
Results:

Method                 Classification rate
HDDA ([ai bi Qi di])   0.857
HDDA ([ai b Qi d])     0.881
QDA                    0.849
LDA                    0.775
FDA (d = k − 1)        0.79
SVM                    0.839

Classification results for the image categorization experiment (leave-one-out).
Results: object recognition
Our approach uses local descriptors (Harris-Laplace + SIFT).
We consider 3 object classes (wheels, seat and handlebars) and 1 background class.
The dataset is made of 1000 descriptors in 128 dimensions:
learning dataset: 500, test dataset: 500.
Results: object recognition
[Figure: ROC curves (true positives vs. false positives) for the SVM, HDDA, FDA and LDA classifiers (left), and for HDDA with error probability < 10^-5 and < 10^-10 (right).]

Classification results for the object recognition experiment.
Results: object recognition
Recognition using HDDA
Recognition using SVM
Part 7
Unsupervised classification
Extension to unsupervised classification

Unsupervised classification aims to organize data into homogeneous classes.
Gaussian mixture models (GMM) are an efficient tool for unsupervised classification:

in Gaussian mixture models, the density of the mixture is:

f(x; θ) = Σ_{i=1}^{k} πi fi(x; µi, Σi),

where θ = {π1, ..., πk, µ1, ..., µk, Σ1, ..., Σk},
the parameter estimation is generally done with the EM algorithm.
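A minimal evaluation of this mixture density (with toy parameters; the single-component check below reduces to the standard normal density):

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(mu, cov) evaluated at x."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def mixture_density(x, priors, means, covs):
    """f(x; theta) = sum_i pi_i f_i(x; mu_i, Sigma_i)."""
    return sum(pi * gaussian_pdf(x, mu, cov)
               for pi, mu, cov in zip(priors, means, covs))

# One standard Gaussian component in dimension 1: f(0) = 1/sqrt(2*pi).
val = mixture_density(np.zeros(1), [1.0], [np.zeros(1)], [np.eye(1)])
```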
Extension to unsupervised classification

Using our model for high-dimensional data, the two main steps of the EM algorithm are:

E step: compute t_ij^(q) = t_i^(q)(xj):

t_ij^(q) = exp(−K_i^(q)(xj)/2) / Σ_{l=1}^{k} exp(−K_l^(q)(xj)/2),

where K_i^(q)(xj) = (1/a_i^(q)) ‖µ_i^(q) − P_i^(q)(xj)‖² + (1/b_i^(q)) ‖xj − P_i^(q)(xj)‖² + d_i^(q) log(a_i^(q)) + (p − d_i^(q)) log(b_i^(q)) − 2 log(π_i^(q)).

M step: classical estimation of πi, µi and Σi; the estimators of ai, bi and Qi are the same as those of HDDA.
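The two steps above can be sketched as follows (responsibilities computed from arbitrary toy costs; only the classical π and µ updates are shown for the M step):

```python
import numpy as np

def e_step(costs):
    """Responsibilities t_ij from the costs K_i(x_j); costs has shape (k, n).
    Subtracting the column-wise minimum keeps the softmax stable."""
    w = np.exp(-(costs - costs.min(axis=0)) / 2)
    return w / w.sum(axis=0)

def m_step_pi_mu(T, X):
    """Classical M-step updates of the proportions and the means.
    T: responsibilities (k, n); X: data (n, p)."""
    nk = T.sum(axis=1)                   # soft class counts
    return nk / T.shape[1], (T @ X) / nk[:, None]

costs = np.array([[0.0, 10.0],           # class 0 fits point 0 well
                  [10.0, 0.0]])          # class 1 fits point 1 well
T = e_step(costs)
pi, mu = m_step_pi_mu(T, np.array([[0.0, 0.0], [2.0, 2.0]]))
```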
References

[BC96] H. Bensmail and G. Celeux. Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91:1743-1748, 1996.
[Bel61] R. Bellman. Adaptive Control Processes. Princeton University Press, 1961.
[Fri89] J.H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84:165-175, 1989.
[LBGGDH03] H. Le Borgne, N. Guyader, A. Guérin-Dugué, and J. Hérault. Classification of images: ICA filters vs human perception. In 7th International Symposium on Signal Processing and its Applications, number 2, pages 251-254, 2003.
[ST83] D. Scott and J. Thompson. Probability density estimation in higher dimensions. In Proceedings of the Fifteenth Symposium on the Interface, pages 173-179. North Holland-Elsevier Science Publishers, 1983.