
3D Object Recognition with Enhanced Grassmann Discriminant Analysis

Lincon Sales de Souza, Hideitsu Hino, and Kazuhiro Fukui

Graduate School of Systems and Information Engineering, University of Tsukuba, Japan

[email protected], {hinohide, kfukui}@cs.tsukuba.ac.jp

Abstract. Subspace representation has become a promising choice in the classification of 3D objects such as faces and hand shapes, as it can compactly model the appearance of an object and effectively represent variations such as changes in pose and illumination conditions. Subspace-based methods tend to require complicated formulations; however, we can utilize the notion of the Grassmann manifold to cast such complicated formulations into simple ones in a unified manner, where each subspace is represented by a point on the manifold. Thanks to this useful correspondence, various types of conventional methods have been constructed on the manifold by the kernel trick using a Grassmann kernel. In particular, discriminant analysis on the Grassmann manifold (GDA) has been known as one of the useful tools for image set classification. GDA can work as a powerful feature extraction method on the manifold. However, there remains room to improve its ability in that the discriminative space is determined depending on the set of data points on the manifold. This suggests that if the data on the manifold are not very discriminative, the ability of GDA may be limited. To overcome this limitation, we construct a set of more discriminative class subspaces as the input for GDA. For this purpose, we propose to project class subspaces onto a generalized difference subspace (GDS) before mapping them onto the manifold. The GDS projection can magnify the angles between class subspaces. As a result, the separability of data points between different classes is improved and the ability of GDA is enhanced. The effectiveness of our enhanced GDA is demonstrated through classification experiments with the CMU face database and a hand shape database.

1 Introduction

In this paper, we discuss a framework for characterizing 3D objects with image sets, which are obtained from a video or multiple-camera system, focusing on classification tasks for faces and hand shapes. Subspace representation is very effective for image-set-based classification, since a set of images of an object can be effectively modeled by a low-dimensional subspace of a high-dimensional vector space [1–5]. For example, the appearance of a face under varying lighting conditions can be compactly represented with a low (from 4 to 9)-dimensional subspace [5].


[Fig. 1: conceptual diagram. Subspaces P_1, P_2, Q_1, Q_2 in ℝ^d are mapped onto the Grassmann manifold directly (upper path) or after projection onto the GDS (lower path), followed by the discriminant analysis process.]

Fig. 1. Conceptual diagram of the Grassmann manifold enhanced by GDS projection. The upper process shows the case without GDS projection and the lower process shows the case with GDS projection. There are four subspaces of 2 classes in a high-dimensional vector space. P_1 and P_2 belong to one class, and Q_1 and Q_2 to another class.

In subspace-based classification, an input subspace is classified by using the canonical angles between it and a reference subspace, where the input and reference subspaces are generated from the sets of input and reference images of each class, respectively. The concept of a canonical angle is a natural extension of the angle between two vectors [6, 7]. We can effectively measure the closeness between two m-dimensional subspaces by using the m canonical angles between them. The mutual subspace method (MSM) [8] is well known as a fundamental classification method using canonical angles.

Subspace-based methods using canonical angles are often formulated through complicated procedures. To avoid this issue, it is useful to introduce the Grassmann manifold G(m, d), which is defined as the set of m-dimensional linear subspaces of ℝ^d [9]. In this framework, a subspace-based method is regarded as a simple classification method on a Grassmann manifold, where each subspace is treated as a point. For example, MSM corresponds to the simplest method using the distance between two points on a Grassmann manifold. Based on this correspondence, various types of classification methods have been constructed on a Grassmann manifold [10, 11]. In particular, discriminant analysis on a Grassmann manifold (GDA) has been known as one of the useful tools for image set classification. GDA can easily be conducted as kernel discriminant analysis through the kernel trick with a Grassmann kernel, as will be described in detail later.

GDA can work as a powerful feature extraction method on a Grassmann manifold. However, GDA cannot directly manipulate the data points on the manifold, because it relies on the framework of the kernel trick; it is only capable of finding the optimal discriminant space from the given set of data points.


Hence, if the class subspaces are not separable in the original vector space, the corresponding data points on the manifold are not separable either. In this case, the best performance that GDA can achieve is limited. This implies that there still remains room to improve the discriminative ability of GDA by using more discriminative class subspaces. From this viewpoint, we propose to project class subspaces onto a generalized difference subspace (GDS) [12] before mapping each class subspace onto a Grassmann manifold. We expect the GDS projection to enhance the discriminative ability of GDA, as it can magnify the angles between different class subspaces and thus provide more discriminative samples for GDA, as shown in Fig. 1. The validity of our enhanced GDA is demonstrated through experiments with the CMU Multi-PIE face database [13] and a hand shape database [14].

The paper is organized as follows. In Section 2, we introduce the basic idea of this paper. In Sections 3 and 4, we review GDA, the Grassmann manifold, GDS, and related concepts. In Section 5, we define the concept of the enhanced GDA and describe the proposed framework. In Section 6, we evaluate the framework through experiments with the CMU face database and a hand shape database. Section 7 concludes the paper.

2 Basic Idea on Enhanced Grassmann Discriminant Analysis

Our basic idea is to utilize more discriminative class subspaces as the inputs for GDA, as mentioned previously. There are several approaches for obtaining such desirable class subspaces instead of naive class subspaces, which are generated from a set of raw appearance images of an object by principal component analysis (PCA). We can generate such desirable class subspaces from certain kinds of discriminative feature vectors extracted from the raw appearance images of an object. However, the subspace representation may not be valid in the vector space of the extracted features, unlike in that of the original raw appearance images.

In this paper, we propose to project class subspaces onto a generalized difference subspace (GDS) before mapping them onto a Grassmann manifold. The advantage of this approach over the approach based on feature vectors is that more discriminative class subspaces can be obtained directly from the set of given class subspaces. This approach works even when the subspace representation of extracted feature vectors is not valid. In addition, it has high scalability for further extensions.

A generalized difference subspace (GDS) is defined as a subspace that represents a "difference component" among multiple class subspaces [12]. GDS is a further extension of the difference subspace (DS) between two class subspaces, which is itself a natural generalization of the difference vector between two vectors. GDS projection can magnify the canonical angles between different class subspaces by removing the common subspace among them; the details will be described later. As a result, the data points mapped onto the Grassmann manifold become more separable, as shown in Fig. 1.


Although GDA and GDS projection seem to aim at a similar effect, their mechanisms are quite different in that GDA works on a Grassmann manifold, while GDS projection works in the high-dimensional vector space before the mapping onto the manifold. Because of this difference, we expect a kind of synergistic effect of both to enhance the ability of GDA. In the following, we refer to GDA with GDS projection as enhanced GDA (eGDA). Further, GDS has been kernelized by means of kernel PCA into the kernel GDS (KGDS) to deal with nonlinear class subspaces [12]. We refer to GDA with KGDS projection as enhanced kernel GDA (eKGDA).

3 Conventional Grassmann Discriminant Analysis

In this section, we define canonical angles, outline the concept of the Grassmann manifold and the kernel most commonly used with it, and explain the algorithm of GDA.

3.1 Definition of Canonical Angles

Suppose we have two subspaces whose similarity we want to compare: a dp-dimensional subspace P and a dq-dimensional subspace Q, both lying in a d-dimensional vector space. For convenience, we suppose dp ≤ dq. The canonical angles {0 ≤ θ_1, ..., θ_dp ≤ π/2} between P and Q are recursively defined as follows [15, 16]:

cos θ_i = max_{u_i ∈ P} max_{v_i ∈ Q} u_i^⊤ v_i,
s.t. ‖u_i‖ = ‖v_i‖ = 1, u_i^⊤ u_j = v_i^⊤ v_j = 0, j = 1, ..., i−1,   (1)

where u_i and v_i are the canonical vectors that form the i-th smallest canonical angle θ_i. The first canonical angle θ_1 is the smallest angle between P and Q. The second canonical angle θ_2 is the smallest angle in a direction orthogonal to that of θ_1. The remaining θ_i for i = 3, ..., dp are calculated analogously, in directions orthogonal to those of all smaller canonical angles.

There are several methods to calculate canonical angles [15–17]. The simplest and most practical is the singular value decomposition (SVD). Let the subspaces be represented as matrices of orthonormal bases, P = [Φ_1 ... Φ_dp] ∈ ℝ^{d×dp} and Q = [Ψ_1 ... Ψ_dq] ∈ ℝ^{d×dq}, where the Φ_i are the basis vectors of P and the Ψ_i are those of Q. Let the SVD of P^⊤Q ∈ ℝ^{dp×dq} be P^⊤Q = UΣV^⊤ with Σ = diag(κ_1, ..., κ_dp), where {κ_i}_{i=1}^{dp} is the set of singular values (κ_1 ≥ ... ≥ κ_dp). The canonical angles {θ_i}_{i=1}^{dp} can be obtained as {cos^{-1}(κ_1), ..., cos^{-1}(κ_dp)}. The corresponding canonical vectors u_i, v_i (i = 1, ..., dp) are obtained as [u_1 u_2 ... u_dp] = PU and [v_1 v_2 ... v_dp] = QV. The similarity between the two subspaces P and Q is measured by the first t angles as follows:

S[t] = (1/t) Σ_{i=1}^{t} cos² θ_i,   1 ≤ t ≤ dp.   (2)
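As a concrete illustration of the SVD-based computation described above, the following minimal numpy sketch computes the canonical angles and the similarity of Eq. (2); the function names and the clipping of the singular values are our own choices, not part of the original formulation.

import numpy as np

def canonical_angles(P, Q):
    """Canonical angles between span(P) and span(Q).

    P: d x dp and Q: d x dq matrices with orthonormal columns, dp <= dq.
    Returns the angles in ascending order and the canonical vectors U, V.
    """
    # The singular values of P^T Q are the cosines of the canonical angles.
    U0, s, Vt0 = np.linalg.svd(P.T @ Q)
    cosines = np.clip(s, 0.0, 1.0)
    thetas = np.arccos(cosines)            # theta_1 <= ... <= theta_dp
    U = P @ U0                             # canonical vectors u_i in P
    V = Q @ Vt0.T[:, :P.shape[1]]          # canonical vectors v_i in Q
    return thetas, U, V

def subspace_similarity(P, Q, t=None):
    """Similarity S[t] = (1/t) * sum_{i<=t} cos^2(theta_i) from Eq. (2)."""
    thetas, _, _ = canonical_angles(P, Q)
    t = len(thetas) if t is None else t
    return float(np.mean(np.cos(thetas[:t]) ** 2))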


3.2 Grassmann Manifold

The Grassmann manifold G(m, d) is defined as the set of m-dimensional linear subspaces of ℝ^d. It is an m(d − m)-dimensional compact Riemannian manifold and can be derived as a quotient space of orthogonal groups, G(m, d) = O(d)/(O(m) × O(d − m)), where O(m) is the group of m × m orthogonal matrices. The Grassmann manifold can be embedded in a reproducing kernel Hilbert space by the use of a Grassmann kernel. The most popular such kernel is the projection kernel k_p, defined as k_p(Y_1, Y_2) = Σ_{i=1}^{m} cos² θ_i, where the θ_i are the canonical angles between the subspaces spanned by Y_1 and Y_2. We can measure the distance between two points on a Grassmann manifold by using this projection kernel.
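For reference, the projection kernel can be computed directly from the basis matrices, since Σ_i cos² θ_i equals the squared Frobenius norm of Y_1^⊤Y_2. A small sketch, assuming each point on G(m, d) is given as a d × m matrix with orthonormal columns (the function name is ours):

import numpy as np

def projection_kernel(Y1, Y2):
    """Grassmann projection kernel k_p(Y1, Y2) = sum_i cos^2(theta_i).

    Y1, Y2: d x m matrices with orthonormal columns, each representing a
    point on the Grassmann manifold G(m, d).
    """
    # sum_i cos^2(theta_i) equals the squared Frobenius norm of Y1^T Y2.
    return float(np.linalg.norm(Y1.T @ Y2, 'fro') ** 2)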

3.3 Algorithm of Grassmann Discriminant Analysis

Discriminant analysis on the Grassmann manifold (GDA) [10] is conducted as kernel LDA with a Grassmann kernel. Its predecessor, linear discriminant analysis (LDA) [18], followed by a k-NN classifier, is well known and has been used successfully for classification. Let x_1, ..., x_N be the data vectors and y_1, ..., y_N (y_i ∈ {1, ..., C}) be the class labels. Each class c has N_c samples. Let µ_c = (1/N_c) Σ_{i|y_i=c} x_i be the mean of class c, and µ = (1/N) Σ_i x_i be the overall mean. LDA searches for the discriminant direction w that maximizes the Rayleigh quotient R(w) = w^⊤S_b w / (w^⊤S_w w), where S_b and S_w are the between-class and within-class covariance matrices, respectively:

S_b = (1/N) Σ_{c=1}^{C} N_c (µ_c − µ)(µ_c − µ)^⊤,   (3)

S_w = (1/N) Σ_{c=1}^{C} Σ_{i|y_i=c} (x_i − µ_c)(x_i − µ_c)^⊤.   (4)

The optimal w is obtained as the leading eigenvector of S_w^{-1} S_b. Since S_w^{-1} S_b has rank C−1, there are C−1 optimal directions W = {w_1, ..., w_{C−1}}. By projecting the data onto the space spanned by W, we achieve dimensionality reduction and extract the features of the data in the most discriminant subspace.
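A compact numpy sketch of Eqs. (3)-(4) and the resulting eigenproblem is given below; it is illustrative only, and the use of a pseudo-inverse for S_w is our simplification.

import numpy as np

def lda_directions(X, y):
    """LDA directions from the scatter matrices of Eqs. (3)-(4).

    X: N x d data matrix, y: length-N integer class labels.
    Returns a d x (C-1) matrix W of discriminant directions.
    """
    classes = np.unique(y)
    N, d = X.shape
    mu = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu) / N    # Eq. (3)
        Sw += (Xc - mu_c).T @ (Xc - mu_c) / N                 # Eq. (4)
    # Leading eigenvectors of Sw^{-1} Sb (pseudo-inverse for robustness).
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:len(classes) - 1]].real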

Kernel LDA [19–21] can be formulated by using the kernel trick as follows. Let Γ: ℝ^d → F be a nonlinear map from the input space ℝ^d to a feature space F, and Γ = [γ_1, ..., γ_N] be the feature matrix of the mapped training points γ_i. Assuming w is a linear combination of those feature vectors, w = Γα, we can use the kernel trick and rewrite the Rayleigh quotient in terms of α as

R_a(α) = (α^⊤Γ^⊤S_bΓα) / (α^⊤Γ^⊤S_wΓα) = (α^⊤K(V − e_N e_N^⊤/N)Kα) / (α^⊤(K(I_N − V)K + σ²I_N)α) = (α^⊤Σ_b α) / (α^⊤(Σ_w + σ²I_N)α),   (5)

where K is the kernel matrix, e_N is a vector of ones of length N, V is a block-diagonal matrix whose c-th block is the matrix e_{N_c} e_{N_c}^⊤/N_c, and Σ_b = K(V − e_N e_N^⊤/N)K. The term σ²I_N is used to make the computation stable and to regularize the covariance matrix Σ_w = K(I_N − V)K; it is composed of the covariance shrinkage factor σ² > 0 and the identity matrix I_N of size N. The set of optimal vectors α is computed from the eigenvectors of (Σ_w + σ²I_N)^{-1}Σ_b.
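The following sketch shows one way to solve Eq. (5) when K is a kernel matrix over the training points; it assumes scipy is available for the generalized eigenproblem, and the variable names are ours. For GDA, K would be the N × N matrix of projection-kernel values between the training subspaces.

import numpy as np
from scipy.linalg import eigh

def kernel_lda(K, y, sigma2=1e-3):
    """Kernel discriminant analysis following Eq. (5).

    K: N x N kernel matrix between training points, y: length-N labels,
    sigma2: covariance shrinkage factor. Returns the N x (C-1) matrix of
    coefficient vectors alpha.
    """
    N = K.shape[0]
    classes = np.unique(y)
    e = np.ones((N, 1))
    # Block-diagonal V whose c-th block is e_{Nc} e_{Nc}^T / Nc.
    V = np.zeros((N, N))
    for c in classes:
        idx = np.where(y == c)[0]
        V[np.ix_(idx, idx)] = 1.0 / len(idx)
    Sigma_b = K @ (V - e @ e.T / N) @ K
    Sigma_w = K @ (np.eye(N) - V) @ K + sigma2 * np.eye(N)
    # Generalized eigenproblem Sigma_b * a = lambda * Sigma_w * a.
    evals, alphas = eigh(Sigma_b, Sigma_w)
    return alphas[:, ::-1][:, :len(classes) - 1]   # leading C-1 eigenvectors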

4 Projection onto Generalized Difference Subspace

In this section, we outline the concept of the generalized difference subspace (GDS) and explain how to generate a GDS H from multiple class subspaces. As mentioned previously, a GDS represents the "difference components" among multiple class subspaces [12], as an extension of the difference subspace (DS) between two class subspaces.

We first describe the concept of DS, which is a natural extension of the difference vector between two vectors u and v ∈ ℝ^d. To formally define DS, let us consider two d_m-dimensional subspaces P and Q in ℝ^d. When there is no intersection between the two subspaces, d_m canonical angles {θ_i}_{i=1}^{d_m} are obtained between them. Let d_i be the difference vector u_i − v_i between the canonical vectors u_i ∈ P and v_i ∈ Q that form the i-th canonical angle θ_i. The DS is defined as the subspace spanned by the normalized difference vectors {d_i}_{i=1}^{d_m} [12].

In addition to the above geometric definition, the DS can also be defined analytically by using the sum matrix S = P + Q of the two orthogonal projection matrices P and Q, which correspond to the orthogonal projection operators onto the subspaces P and Q, respectively. The projection matrices are defined as P = Σ_{i=1}^{d_m} Φ_i Φ_i^⊤ and Q = Σ_{i=1}^{d_m} Ψ_i Ψ_i^⊤, where Φ_i and Ψ_i are the basis vectors of P and Q. The DS is spanned by the d_m eigenvectors of the matrix S that correspond to eigenvalues smaller than 1.

Following this analytical definition, the concept of DS has been extended to the generalized difference subspace (GDS) for multiple class subspaces [12]. Given C (≥ 2) m-dimensional class subspaces {P_c}_{c=1}^{C}, a generalized difference subspace (GDS) H can be defined as the subspace produced by removing the principal component subspace (PCS) of all the class subspaces from the sum subspace S of those subspaces. From this definition, the GDS is the subspace spanned by the d_h eigenvectors {d_i}_{i=1}^{d_h} corresponding to the d_h smallest eigenvalues of the sum matrix S = Σ_{c=1}^{C} P_c of the orthogonal projection matrices P_c onto the class subspaces. The optimal d_m is determined experimentally according to the degree of orthogonality between the class subspaces projected onto the GDS.
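Under the analytical definition above, a GDS basis can be sketched as follows, assuming each class subspace is given by a d × d_m matrix with orthonormal columns (the helper name is our own):

import numpy as np

def generalized_difference_subspace(class_bases, dh):
    """GDS basis from class subspace bases (analytical definition above).

    class_bases: list of d x dm matrices with orthonormal columns, one per
    class. dh: desired GDS dimension. Returns a d x dh basis matrix H.
    """
    # Sum matrix S of the orthogonal projection matrices P_c = B_c B_c^T.
    S = sum(B @ B.T for B in class_bases)
    evals, evecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return evecs[:, :dh]                  # dh smallest eigenvalues span the GDS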

Furthermore, the concepts of DS and GDS have been extended to the nonlinear kernel DS (KDS) and kernel GDS (KGDS), H^Φ, by using the kernel trick [12]. In these methods, each image set is represented by a nonlinear subspace.

5 Enhancement of the Discriminative Ability of GDA

In this section, we explain the algorithm of our enhanced GDA, referring to the flow chart shown in Fig. 2.


[Fig. 2: flow chart of the training phase (class subspaces via PCA/KPCA, generation of the GDS H or KGDS H^Φ, projection, GDA) and the recognition phase (input subspace X, projection, prediction).]

Fig. 2. Scheme of the eGDA/eKGDA framework. In the training phase, each i-th image set of the c-th class is modeled by a subspace matrix Y_i^c. Each c-th class is modeled by a larger class subspace M_c, which is generated from the set of Y_i^c. A GDS H or KGDS H^Φ is generated from the set {M_c}_{c=1}^{C}. The subspaces {Y_i^c}_{i=1}^{N_c} are projected onto the GDS/KGDS, and further mapped onto the Grassmann manifold. Finally, discriminant analysis is applied to the set of mapped subspaces. In the recognition phase, a set of input patterns {x^in} is also modeled as a subspace matrix X. It is projected onto the GDS/KGDS and then mapped onto the manifold. The prediction for an input set is made by 1-NN in the obtained discriminant space on the manifold.

We are given N_c training sets {x_l^{i,c}}_{l=1}^{L_i^c} for each c-th class (c = 1, ..., C) and a set of L_in input images {x_l^{in}}_{l=1}^{L_in}. Each of these sets contains images under different illumination conditions or from different viewing angles of a face.

In our framework, an image of size w × h is represented by a d(= w × h)-dimensional vector, and each image set is represented by a subspace; {x_l^{i,c}}_{l=1}^{L_i^c} and {x^in} are represented by Y_i^c and X, respectively. The orthonormal basis of each subspace is obtained as the eigenvectors corresponding to the m largest eigenvalues of the image-set auto-correlation matrix R_i^c = (1/L_i^c) Σ_{l=1}^{L_i^c} x_l^{i,c} x_l^{i,c⊤}. In the following, each m-dimensional subspace Y is represented by the d × m matrix Y, which has the corresponding orthonormal basis vectors as its columns.

In order to utilize the feature extraction function of the GDS effectively, we introduce global class subspaces M_c, each denoted by a matrix M_c ∈ ℝ^{d×d_m}, which compactly represents all the subspaces belonging to the same class c. The orthonormal basis of M_c can be obtained as the eigenvectors corresponding to the d_m largest eigenvalues of the auto-correlation matrix:

R^c = (1/N_c) Σ_{i=1}^{N_c} R_i^c = (1/N_c) Σ_{i=1}^{N_c} (1/L_i^c) Σ_{l=1}^{L_i^c} x_l^{i,c} x_l^{i,c⊤}.   (6)
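As an illustration of this step, a minimal numpy sketch for generating the image-set subspaces Y_i^c and the class subspaces M_c from Eq. (6) might look as follows (the function names are ours):

import numpy as np

def image_set_subspace(X_set, m):
    """Basis Y (d x m) of one image set, given as a d x L matrix of vectors."""
    R = X_set @ X_set.T / X_set.shape[1]        # auto-correlation matrix R_i^c
    evals, evecs = np.linalg.eigh(R)
    return evecs[:, -m:]                        # m leading eigenvectors

def class_subspace(class_sets, dm):
    """Class subspace basis M_c (d x dm) from all sets of one class, Eq. (6)."""
    Rc = sum(X @ X.T / X.shape[1] for X in class_sets) / len(class_sets)
    evals, evecs = np.linalg.eigh(Rc)
    return evecs[:, -dm:]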

Next, to generate a GDS, we calculate the total sum matrix S defined previously as

S = Σ_{c=1}^{C} Σ_{j=1}^{d_m} Φ_j^c Φ_j^{c⊤},   (7)

where Φ_j^c is a basis vector of the d_m-dimensional M_c. As seen in Section 4, the orthonormal basis of the GDS is obtained as the d_h eigenvectors {d_i}_{i=1}^{d_h} corresponding to the d_h smallest eigenvalues of the sum matrix S. The subspaces Y_i^c are projected onto the GDS, and their projections are denoted by {Y_i^c}_{i=1}^{N_c} ∈ ℝ^{d_h×m}. The input subspace X is also projected onto the GDS, and its projection is denoted by X.

We apply GDA to these projected subspaces through the procedure in Section 3.3. In particular, the kernel matrix K is calculated as the similarity matrix between the projected class subspaces Y_q and Y_w. We refer to the GDA/KGDA constructed from these more discriminative subspaces on the GDS/KGDS as enhanced GDA/KGDA (eGDA/eKGDA). The step-by-step training and testing algorithms of eGDA and eKGDA are shown in Algorithms 1 and 2, respectively.
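Putting the pieces together, a rough end-to-end training sketch of eGDA could look like the following. It reuses the hypothetical helpers sketched in the previous sections (image_set_subspace, class_subspace, generalized_difference_subspace, projection_kernel, kernel_lda); the QR re-orthonormalization after the GDS projection is our own choice and is not spelled out in Algorithm 1.

import numpy as np

def egda_train(class_image_sets, m, dm, dh, sigma2=1e-3):
    """Rough end-to-end eGDA training sketch.

    class_image_sets: list over classes; each entry is a list of d x L
    image-set matrices. Reuses the helpers sketched in earlier sections.
    """
    # 1. Model each image set and each class by a subspace (PCA).
    subspaces, labels = [], []
    for c, sets in enumerate(class_image_sets):
        for X in sets:
            subspaces.append(image_set_subspace(X, m))
            labels.append(c)
    labels = np.array(labels)
    M = [class_subspace(sets, dm) for sets in class_image_sets]

    # 2. Generate the GDS from the class subspaces and project every
    #    image-set subspace onto it (QR re-orthonormalization is our choice).
    H = generalized_difference_subspace(M, dh)
    projected = [np.linalg.qr(H.T @ Y)[0] for Y in subspaces]

    # 3. Map the projected subspaces onto the Grassmann manifold via the
    #    projection kernel and run discriminant analysis there.
    K = np.array([[projection_kernel(Yq, Yw) for Yw in projected]
                  for Yq in projected])
    alphas = kernel_lda(K, labels, sigma2)
    F_train = K @ alphas          # training points in the discriminant space
    return H, alphas, F_train, projected, labels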

6 Evaluation Experiments

In this section, we discuss the validity of the proposed method through face and hand shape recognition tasks. For this purpose, we compared the proposed method with known state-of-the-art subspace-based methods, namely the mutual subspace method (MSM) [22], the RBF kernel constrained mutual subspace method (KCMSM) [12], Grassmann discriminant analysis (GDA) [10], and its RBF kernel extension (KGDA) [23].

6.1 Experiments on Hand Shape Recognition

We conducted two types of experiments with the Tsukuba hand shape dataset shown in Fig. 3. This database contains 30 (hand classes) × 100 (subjects) image sets, each of which contains 28 hand shape images, consisting of 4 frames × 7 different viewpoints. In the experiments, all images were resized to 24 × 24 pixels.

In the first, preliminary experiment, to aid the understanding of the mechanism of our enhanced GDA, we performed visualizations of three different hand classes, using three image sets of 15 subjects. Once the class subspaces are mapped onto a (C−1)-dimensional discriminant space on the manifold, they can be treated as points in the discriminant space.


Algorithm 1: Training algorithm of eGDA/eKGDA

input: pattern sets {x_l^{i,c}}_{l=1}^{L_i^c}, with class labels c

if eGDA then
    for c = 1, ..., C do
        for i = 1, ..., N_c do
            R_i^c ← (1/L_i^c) Σ_{l=1}^{L_i^c} x_l^{i,c} x_l^{i,c⊤}    // calculate set covariance matrix
            Y_i^c ← EVD(R_i^c)                                        // apply eigendecomposition
        end
        R^c ← (1/N_c) Σ_{i=1}^{N_c} R_i^c                             // calculate class covariance matrix
        M_c ← EVD(R^c)                                                // apply eigendecomposition
    end
    P, H ← EVD(Σ_{c=1}^{C} M_c M_c^⊤)                                 // obtain GDS and principal subspace
    foreach Y_i^c do Y_i^c ← H^⊤ Y_i^c                                // project all subspaces onto the GDS
else if eKGDA then
    for c = 1, ..., C do
        for i = 1, ..., N_c do
            [K_i^c]_{l,l'} ← (1/L_i^c) k(x_l^{i,c}, x_{l'}^{i,c})     // calculate set kernel matrix
            Y_i^c ← EVD(K_i^c)                                        // apply eigendecomposition
        end
        K^c ← (1/N_c) Σ_{i=1}^{N_c} K_i^c                             // calculate class kernel matrix
        M_c ← EVD(K^c)                                                // apply eigendecomposition
    end
    P^Φ, H^Φ ← EVD(Σ_{c=1}^{C} M_c M_c^⊤)                             // obtain KGDS and principal subspace
    foreach Y_i^c do Y_i^c ← H^{Φ⊤} Y_i^c                             // project all subspaces onto the KGDS
end
for q = 1, ..., N do
    for w = 1, ..., N do
        [S_train]_{wq} ← k_p(Y_q, Y_w)                                // generate similarity matrix
    end
end
α* ← argmax_α R_a(α)                                                  // solve the LDA problem
F_train ← α*^⊤ S_train                                                // compute training coefficients
return F_train


Algorithm 2: Input evaluation algorithm of eGDA/eKGDA

input: pattern set with L' input images {x^in}

if eGDA then
    R^in ← (1/L') Σ_{l'=1}^{L'} x_{l'}^{in} x_{l'}^{in⊤}      // calculate set covariance matrix
    X ← EVD(R^in)                                             // apply eigendecomposition
    X ← H^⊤ X                                                 // project subspace onto the GDS
else if eKGDA then
    [K^in]_{l',l''} ← (1/L') k(x_{l'}^{in}, x_{l''}^{in})     // calculate set kernel matrix
    X ← EVD(K^in)                                             // apply eigendecomposition
    X ← H^{Φ⊤} X                                              // project subspace onto the KGDS
end
for q = 1, ..., N do
    [S_test]_q ← k_p(Y_q, X)                                  // generate similarity matrix
end
F_test ← α*^⊤ S_test                                          // compute test coefficients
pred(x^in) ← NN(F_train, F_test)                              // perform 1-NN classification
return pred(x^in)                                             // return a class prediction
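For completeness, a matching recognition-phase sketch in the same style (again reusing the hypothetical helpers from the training sketch, and with the same QR re-orthonormalization choice) is shown below.

import numpy as np

def egda_predict(X_input, m, H, alphas, F_train, labels, projected_train):
    """Recognition-phase sketch matching Algorithm 2 (1-NN on the manifold).

    X_input: d x L' matrix of input images. Reuses image_set_subspace and
    projection_kernel from the earlier sketches.
    """
    # Model the input set by a subspace and project it onto the GDS.
    X = image_set_subspace(X_input, m)
    X = np.linalg.qr(H.T @ X)[0]                 # re-orthonormalize after projection
    # Similarities against all training subspaces, mapped to the
    # discriminant space, then 1-NN classification.
    s_test = np.array([projection_kernel(Yq, X) for Yq in projected_train])
    f_test = s_test @ alphas
    nn = int(np.argmin(np.linalg.norm(F_train - f_test, axis=1)))
    return labels[nn]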

Thus, we can easily visualize the distribution of the points to check the metric structure. For example, for three classes (C = 3), the points lie in a 2-dimensional discriminant space. Figure 4 shows the scatter points plotted in the discriminant space for two different combinations of three classes. We can see that in both cases, the proposed eGDA generates a more discriminative space on the manifold than the conventional GDA. Although the conventional GDA can generate a discriminative space, its class separability is lower than that obtained by eGDA. This may be because the original GDA can only select the most discriminant directions and cannot further adjust the layout of the points on the manifold. In our eGDA, thanks to the GDS projection, we can obtain a more discriminant space.

In the second experiment, we evaluated the performance of the proposed eGDA and eKGDA on the classification problem of 30 kinds of hand shapes. We used 15 × 30 image sets from 15 subjects as training sets, 15 × 30 image sets from 15 subjects as validation sets, and 70 × 30 image sets from the remaining 70 subjects as testing sets, where each image set contains 28 hand images. Table 1 shows the results. The accuracy refers to the performance on the test sets using the shown parameters, which are the optimal parameters found with the validation sets. m_train and m_test refer to the dimensions of the training and test subspaces in MSM and KCMSM. Plain m is the dimension for both training and test subspaces in the manifold frameworks. d_m is the dimension of the class subspaces used to generate a GDS in the proposed methods. σ refers to the RBF kernel parameter, and d_p is the dimension of the principal subspace. A "-" means that the parameter is not applicable.


[Fig. 3: (a) 30 classes of hand shapes; (b) seven views of hand images.]

Fig. 3. Sample images from the Tsukuba hand shape dataset: (a) the 30 classes of hand shapes, and (b) seven views of a hand shape belonging to the 30th class.

We can see that the proposed eKGDA obtained the best performance among the methods, followed by KCMSM, while GDA and KGDA did not perform very well. These results suggest the usefulness of the feature extraction by the GDS projection in the GDA formulation, as expected.

Table 1. Results of the experiment using the Tsukuba hand shape dataset. The accuracy refers to the performance on the test sets using the shown parameters, which are the optimal parameters found with the validation sets. m_train and m_test refer to the dimensions of the training and test subspaces in MSM and KCMSM. Plain m is the dimension for both training and test subspaces in the manifold frameworks. d_m is the dimension of the class subspaces used to generate a GDS in the proposed methods. σ refers to the RBF kernel parameter, and d_p is the dimension of the principal subspace. A "-" means that the parameter is not applicable.

Method   Accuracy   m_train   m_test   m    d_m   σ     d_p
MSM      62.30      30        6        -    -     -     -
KCMSM    69.77      100       16       -    -     0.5   20
GDA      61.13      -         -        16   -     -     -
KGDA     67.09      -         -        14   -     1     -
eGDA     69.20      -         -        16   150   -     5
eKGDA    71.69      -         -        14   150   1     20

6.2 Face Recognition Experiments on the CMU Multi-PIE Dataset

We conducted experiments to check the validity of the enhanced GDA/KGDA on two kinds of face recognition tasks.

In the first experiment, we performed face recognition using only the subset of frontal faces with neutral expression from the CMU Multi-PIE dataset [13].


[Fig. 4: scatter plots in the 2-dimensional discriminant space; panels (a), (c) conventional GDA and (b), (d) eGDA.]

Fig. 4. Scatter points of three hand shape classes that are somewhat different, using (a) conventional GDA and (b) eGDA, and of three hand shape classes that are somewhat similar, using (c) conventional GDA and (d) eGDA.

We used face images of 128 subjects, whose data were collected in 4 sessions. In this experiment, 20 face images of the same person, captured under 20 different illumination conditions, are treated as one image set. Figure 5 (a) shows examples of face images with neutral expression. The images were cropped and resized to 16 × 16 pixels. We used 12-fold cross-validation to evaluate the performance of each method. For each fold, two sessions were used for testing, one for training, and one for validation. The parameters used on each fold's test sets were those that minimized the error on the respective fold's validation sets. Figure 6 shows the average recognition accuracies for the six methods, along with the standard deviations.

The experimental result in Fig. 6 (a) shows that eGDA has the highest average accuracy. We also conducted a t-test between eGDA and KCMSM with 12 samples and significance level α = 0.05.


[Fig. 5: (a) illumination conditions; (b) facial expressions.]

Fig. 5. Sample images from the CMU Multi-PIE dataset under (a) several illumination conditions and (b) several facial expressions.

From the test results, we can conclude with more than 95% confidence (p = 0.0184) that the proposed eGDA performed better than KCMSM.

In the second experiment, we conducted face classification under more difficult conditions, including image sets with other types of facial expressions of the 128 subjects, such as smile, surprise, and disgust, as shown in Fig. 5 (b). The types of facial expressions in each session are listed in Table 2. We executed a 10-fold cross-validation where two sessions were selected for testing, one for training, and one for validation. Parameters were optimized on the validation sets as explained previously.

Figure 6 (b) shows the average recognition accuracies for the six methods in the second experiment, along with the standard deviations. The experimental result shows the advantage of our eGDA and eKGDA over the conventional methods. To confirm their validity, we conducted a t-test between KGDA and eKGDA, with 10 samples and significance level α = 0.05. From the results, we can conclude with more than 95% confidence (p = 0.0423) that the proposed method, eKGDA, performed better than the conventional KGDA.

In the first experiment, the challenge was that there were few training data for the methods, just one subspace per person class. In such cases, Grassmann manifold formulations like GDA usually show a drop in performance, as they need more subspaces to estimate the structure of the Grassmann manifold. The proposed method could alleviate this issue by enhancing the discriminative ability of each class subspace even in such a situation.

In the second experiment, the addition of five other expressions largely increased the difficulty of the face classification task, because in most cases the learned expressions may be slightly different from those used to optimize the parameters, and also different from the ones that appear during the test phase. The addition of expressions also caused large within-class variations. We can see a drop in the performance of the conventional KCMSM, as it does not have the manifold mechanism for collapsing the within-class variations. In contrast, the proposed methods could still perform well in this case.

7 Conclusions

In this paper, we have proposed an enhanced Grassmann discriminant analysis and its kernel version to address more effectively the classification of 3D objects with image sets, focusing on applications to face and hand shape classification.


[Fig. 6: bar charts of accuracy (%) for MSM, KCMSM, GDA, KGDA, eGDA, and eKGDA.]

Fig. 6. Results of the experiments. Average accuracies are shown with standard deviations for (a) the first CMU experiment and (b) the second CMU experiment.

Table 2. Facial expressions present in the sessions of the CMU Multi-PIE dataset. The number within each cell indicates how many sets with that expression exist for one person in that session.

Expression   Session 1   Session 2   Session 3   Session 4
Neutral      1           1           1           2
Smile        1           -           1           -
Surprise     -           1           -           -
Squint       -           1           -           -
Disgust      -           -           1           -
Scream       -           -           -           1
Total        2           3           3           3

The key idea of our enhanced Grassmann manifold is to project class subspaces onto a generalized difference subspace before mapping them onto a Grassmann manifold. The GDS projection can extract the differences between classes and generate data points with optimized between-class separability on the manifold, which are more desirable for GDA. The validity of our enhanced Grassmann discriminant analysis was evaluated through classification experiments with the CMU face dataset and a hand shape dataset, where it outperformed state-of-the-art methods such as kernel Grassmann discriminant analysis and the kernel constrained mutual subspace method. As future work, we seek to understand more clearly the relationship between the two types of mappings, the GDS projection and the mapping onto the Grassmann manifold.

Acknowledgement

This work is supported by JSPS KAKENHI Grant Number 16H02842.


References

1. Shashua, A.: On photometric issues in 3D visual recognition from a single 2D image. International Journal of Computer Vision 21 (1997) 99–122

2. Belhumeur, P.N., Kriegman, D.J.: What is the set of images of an object under all possible lighting conditions? In: Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), IEEE (1996) 270–277

3. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 643–660

4. Lee, K.C., Ho, J., Kriegman, D.J.: Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 684–698

5. Basri, R., Jacobs, D.W.: Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003) 218–233

6. Hotelling, H.: Relations between two sets of variates. Biometrika 28 (1936) 321–377

7. Afriat, S.N.: Orthogonal and oblique projectors and the characteristics of pairs of vector spaces. Proceedings of the Cambridge Philosophical Society 53 (1957) 800–816

8. Yamaguchi, O., Fukui, K., Maeda, K.: Face recognition using temporal image sequence. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition (1998) 318–323

9. Chikuse, Y.: Statistics on Special Manifolds. Lecture Notes in Statistics 174, Springer (2003)

10. Hamm, J., Lee, D.D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th International Conference on Machine Learning, ACM (2008) 376–383

11. Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R.: Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2011) 2273–2286

12. Fukui, K., Maki, A.: Difference subspace and its generalization for subspace-based methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2015) 2164–2177

13. Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-PIE. Image and Vision Computing 28 (2010) 807–813

14. Ohkawa, Y., Fukui, K.: Hand shape recognition using the distributions of multi-viewpoint image sets. IEICE Transactions on Information and Systems E95-D (2012) 1619–1627

15. Hotelling, H.: Relations between two sets of variates. Biometrika 28 (1936) 321–377

16. Afriat, S.N.: Orthogonal and oblique projectors and the characteristics of pairs of vector spaces. In: Mathematical Proceedings of the Cambridge Philosophical Society, Volume 53, Cambridge University Press (1957) 800–816

17. Maeda, K., Watanabe, S.: A pattern matching method with local structure. Transactions of the IEICE 68 (1985) 345–352

18. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press (1990)

19. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.R.: Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX, IEEE (1999) 41–48

20. Baudat, G., Anouar, F.: Generalized discriminant analysis using a kernel approach. Neural Computation 12 (2000) 2385–2404


21. Li, Y., Gong, S., Liddell, H.: Constructing structures of facial identities using kernel discriminant analysis. In: The 2nd International Workshop on Statistical and Computational Theories of Vision (2001)

22. Yamaguchi, O., Fukui, K., Maeda, K.: Face recognition using temporal image sequence. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, IEEE (1998) 318–323

23. Harandi, M.T., Salzmann, M., Jayasumana, S., Hartley, R., Li, H.: Expanding the family of Grassmannian kernels: An embedding perspective. In: European Conference on Computer Vision, Springer (2014) 408–423