
An Algorithm Research of Supervised LLE Based On Mahalanobis Distance and

Extreme Learning Machine

Ling-Min He

College of Information Engineering, China Jiliang University

Hangzhou, China

Wei Jin, Xiao-Bin Yang, Kang-Jian Wang

College of Information Engineering, China Jiliang University

Hangzhou, China

Abstract—Locally Linear Embedding (LLE) is an efficient nonlinear dimensionality reduction technique. For some high-dimensional data, however, it does not take the class information of the data into account, and the Euclidean distance cannot accurately reflect the similarity among samples. This paper proposes an improved supervised LLE that combines class-labeled data with a Mahalanobis distance (MSP-LLE). First, the approach learns a Mahalanobis distance from the existing data. Then the Mahalanobis distance and the label information are combined to choose neighborhoods. Finally, an Extreme Learning Machine (ELM) is used to map unlabeled data into the feature space, which makes pattern recognition easy to implement. Experimental results show good performance in reduction and recognition for high-dimensional data with highly similar samples.

Keywords-locally linear embedding; extreme learning machine; Mahalanobis distance; supervised; reduction; recognition

I. INTRODUCTION

The types and amount of data in human society are growing at an amazing speed, driven by emerging services such as blogs, social networks, and location-based services (LBS); the era of big data has arrived. The high dimensionality of such data makes analysis greatly more complex. One effective way to deal with high-dimensional data is dimension reduction, which compresses the raw data and provides natural structural information for subsequent data analysis. In the process of dimension reduction, part of the original information is inevitably lost, so preserving the structure of the high-dimensional data after reduction is particularly important.

So far, a variety of linear dimension reduction methods have been proposed, such as PCA (Principal Component Analysis) [1], MDS (Multidimensional Scaling) [2], and LPP (Locality Preserving Projections) [3]. Because large data sets are often nonlinear, manifold learning has increasingly attracted attention for its good performance on nonlinear dimension reduction. Manifold learning methods are divided into two groups: global algorithms, such as the classical Isomap [4], and local algorithms, such as Laplacian Eigenmaps [5], LLE [6-9], and Hessian eigenmaps. Building on LLE, Zhang [10] and Raducanu [11] proposed supervised Locally Linear Embedding methods that do not consider the similarity between samples. Zhang [12] proposed a Mahalanobis distance measurement based Locally Linear Embedding algorithm to improve the dimension reduction, but it does not take the class information into consideration.

To address the difficulty of high-dimensional nonlinear data, this paper proposes a feature extraction method that combines the class labels of the training samples and a Mahalanobis distance metric within the locally linear embedding framework. It reduces high-dimensional data so as to improve classification accuracy. An Extreme Learning Machine (ELM) [13-14] is then used to fit the mapping function, so that unlabeled data can be effectively reduced from the high-dimensional space to the low-dimensional space.

II. LOCALLY LINEAR EMBEDDING (LLE)

LLE mainly uses local linear approximations of a globally nonlinear structure to provide the overall information. Given $n$ $D$-dimensional vectors $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^D$, as input and $Y = \{y_1, y_2, \ldots, y_n\}$, $y_i \in \mathbb{R}^d$ ($d < D$), as output, the LLE algorithm comprises three steps.

Step 1: For each point $x_i$ $(i = 1, 2, \ldots, n)$ in the high-dimensional space, find a predetermined number $K$ of neighbors by computing the Euclidean distances $d_{ij} = \|x_i - x_j\|$.

Step 2: For each point $x_i$ $(i = 1, 2, \ldots, n)$, compute the weights $W_{ij}$ that reconstruct it from its neighbors $x_j$ by minimizing the cost function

$$\varphi(W) = \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{K} W_{ij} x_j \right\|^2 \qquad (1)$$

subject to $\sum_{j=1}^{K} W_{ij} = 1$; if $x_j$ $(j = 1, 2, \ldots, K)$ is not a neighbor of $x_i$, then $W_{ij} = 0$.

Step 3: The embedding vectors $y_i$ in the $d$-dimensional space ($d < D$) are formed according to the weights between $x_i$ $(i = 1, 2, \ldots, n)$ and its neighbors $x_j$ $(j = 1, 2, \ldots, K)$ in the $D$-dimensional space, such that the cost function


$$\psi(Y) = \sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{K} W_{ij} y_j \right\|^2 \qquad (2)$$

is minimized, subject to $\sum_{i=1}^{n} y_i = 0$ and $\frac{1}{n}\sum_{i=1}^{n} y_i y_i^{T} = I$.
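To make the three steps concrete, the following is a minimal NumPy sketch of plain LLE. It is an illustration rather than the authors' implementation: the neighbor count K, the target dimension d, and the small regularization added to the local Gram matrix are assumed defaults.

import numpy as np

def lle(X, K=8, d=2, reg=1e-3):
    """Plain LLE following Eqs. (1)-(2); a sketch, not the paper's code."""
    n = X.shape[0]
    # Step 1: K nearest neighbors by Euclidean distance d_ij = ||x_i - x_j||
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :K]
    # Step 2: reconstruction weights of Eq. (1), each row summing to one
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]               # K x D local differences
        G = Z @ Z.T                              # local Gram matrix
        G += reg * np.trace(G) * np.eye(K)       # assumed regularization for stability
        w = np.linalg.solve(G, np.ones(K))
        W[i, neighbors[i]] = w / w.sum()
    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W) minimize Eq. (2)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                      # discard the constant eigenvector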

III. LEARN A MAHALANOBIS DISTANCE

Xiang [15] proposed a similarity measure based on the Mahalanobis distance. Suppose we are given a data set containing $N$ samples $x_1, x_2, x_3, \ldots, x_N$, $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$. Our goal is to learn a Mahalanobis distance from the existing $N$ samples. In the learning process, the cost function is defined as

$$W_m^{*} = \arg\max_{W_m^{T} W_m = I} \frac{\mathrm{tr}(W_m^{T} S_b W_m)}{\mathrm{tr}(W_m^{T} S_w W_m)} \qquad (3)$$

which maximizes the distance between samples of different classes and minimizes the distance between similar samples; here $S_b$ is the scatter matrix computed over pairs of dissimilar samples and $S_w$ the scatter matrix over pairs of similar samples. Xiang introduced a monotonic function of $\lambda$ in the algorithm:

$$g(\lambda) = \max_{W_m^{T} W_m = I} \mathrm{tr}\left(W_m^{T}(S_b - \lambda S_w) W_m\right) \qquad (4)$$

Then let $\lambda_1 = \frac{\mathrm{tr}(S_b)}{\mathrm{tr}(S_w)}$ be the minimum value of $\lambda$ and $\lambda_2 = \frac{\sum_{i=1}^{d} \alpha_i}{\sum_{i=1}^{d} \beta_i}$ be the maximum value of $\lambda$. When the denominator of formula (3) is zero, a nonsingular transformation of $S_w$ is used to obtain the optimal solution. The algorithm is as follows:

Step 1: Define the low dimensionality $d$ and calculate the rank $r$ of the matrix $S_w$. If $d > n - r$, go to Step 2; else go to Step 7.

Step 2: $\lambda = \frac{\lambda_1 + \lambda_2}{2}$.

Step 3: If $\lambda_2 - \lambda_1 > \varepsilon$ ($\varepsilon$ is a threshold value), go to Step 4; else go to Step 6.

Step 4: Calculate $g(\lambda)$ by formula (4).

Step 5: If $g(\lambda) > 0$, then $\lambda_1 = \lambda$; else $\lambda_2 = \lambda$. Go to Step 2.

Step 6: $W_m = [\mu_1\ \mu_2\ \ldots\ \mu_d]$, where $\mu_1, \mu_2, \ldots, \mu_d$ are the $d$ eigenvectors corresponding to the $d$ largest eigenvalues of $S_b - \lambda S_w$. End of algorithm.

Step 7: $W_m = Z[v_1\ v_2\ \ldots\ v_d]$, where $v_1, v_2, \ldots, v_d$ are the $d$ eigenvectors corresponding to the $d$ largest eigenvalues of $Z^{T} S_b Z$, and $Z = [z_1\ z_2\ \ldots\ z_{n-r}]$ contains the eigenvectors corresponding to the $n - r$ zero eigenvalues of $S_w$. End of algorithm.
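The bisection search above can be sketched in a few lines of Python. This is only one reading of Section III, not the authors' code: the scatter matrices Sb and Sw are assumed to be given, the bound lambda_2 assumes that alpha_i and beta_i denote the d largest eigenvalues of S_b and the d smallest eigenvalues of S_w (which yields a valid upper bound on the trace ratio), and the singular-S_w branch of Step 7 is omitted.

import numpy as np

def learn_mahalanobis(Sb, Sw, d, eps=1e-6):
    """Bisection on lambda for the trace-ratio problem of Eq. (3); a sketch."""
    def g(lam):
        # For orthonormal W, max tr(W^T (Sb - lam*Sw) W) equals the sum of
        # the d largest eigenvalues of Sb - lam*Sw (Eq. (4), Ky Fan theorem).
        return np.linalg.eigvalsh(Sb - lam * Sw)[-d:].sum()

    lam1 = np.trace(Sb) / np.trace(Sw)            # lower bound lambda_1
    alpha = np.linalg.eigvalsh(Sb)[-d:]           # assumed meaning of alpha_i
    beta = np.linalg.eigvalsh(Sw)[:d]             # assumed meaning of beta_i
    lam2 = alpha.sum() / beta.sum()               # upper bound lambda_2

    while lam2 - lam1 > eps:                      # Steps 2-5
        lam = 0.5 * (lam1 + lam2)
        if g(lam) > 0:
            lam1 = lam
        else:
            lam2 = lam

    # Step 6: the d eigenvectors for the d largest eigenvalues of Sb - lambda*Sw
    _, vecs = np.linalg.eigh(Sb - lam1 * Sw)
    Wm = vecs[:, -d:]
    A = Wm @ Wm.T                                 # Mahalanobis matrix used in Section IV
    return Wm, A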

IV. THE PRINCIPLE OF MSP-LLE

This section introduces the overall framework of MSP-LLE in detail. The MSP-LLE algorithm differs from LLE in two aspects.

(1) Compared with the first step of LLE, MSP-LLE computes the nearest neighbors of each data point according to the Mahalanobis distance. A Mahalanobis distance is first learned from the existing samples: from the optimal matrix $W_m$ of Section III, the Mahalanobis matrix $A = W_m W_m^{T}$ is computed, where $W_m$ is an $n \times d$ matrix.

(2) When mapping the data from the high-dimensional space $\mathbb{R}^D$ to the low-dimensional space $\mathbb{R}^d$, both the structure of the dataset and the class information are taken into account in the dissimilarity distance, which increases the classification accuracy of the MSP-LLE algorithm.

Combining the above two points, the implementation steps of MSP-LLE can be described as follows.

Step 1: Construct the neighbor graph $G$ of the data set by computing the Mahalanobis distance between every pair of points, which yields the distance matrix $A_D$.

Step 2: In the distance matrix $A_D$, according to the class information, the dissimilarity distance between two points $x_i$ and $x_j$ is defined as

$$d_G(x_i, x_j) = d_m(x_i, x_j) + \sigma \max(D) \qquad (5)$$

where $d_m(x_i, x_j)$ is the Mahalanobis distance between $x_i$ and $x_j$, $\max(D)$ is the maximum Mahalanobis distance among all points, and $\sigma \in [0, 1]$ is the distance parameter. The $K$ closest neighbors of each $x_i$ in $A_D$ are then found with K-Nearest Neighbor (KNN) search (a sketch of this neighbor selection is given below).
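For illustration, the following sketch computes the dissimilarity matrix of Eq. (5) and the K closest neighbors. It is not the authors' code; in particular it assumes that the penalty sigma*max(D) is added only to pairs from different classes, since Eq. (5) does not show the class indicator explicitly although the text states that the class information is used.

import numpy as np

def msp_lle_neighbors(X, labels, A, K=8, sigma=0.5):
    """Steps 1-2 of Section IV (a sketch): supervised neighbor selection
    under the learned Mahalanobis matrix A."""
    diff = X[:, None, :] - X[None, :, :]                       # pairwise differences
    # d_m(x_i, x_j) = sqrt((x_i - x_j)^T A (x_i - x_j))
    quad = np.einsum('ijk,kl,ijl->ij', diff, A, diff)
    Dm = np.sqrt(np.maximum(quad, 0.0))
    penalty = sigma * Dm.max()                                 # sigma * max(D) in Eq. (5)
    different = labels[:, None] != labels[None, :]             # assumed class indicator
    AD = Dm + penalty * different                              # dissimilarity matrix A_D
    np.fill_diagonal(AD, np.inf)
    return np.argsort(AD, axis=1)[:, :K]                       # K closest neighbors per point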

Step 3: For each point $x_i$ $(i = 1, 2, \ldots, n)$, compute the weights $W_{ij}$ that reconstruct it from its neighbors $x_j$ by minimizing the cost function

$$\varphi(W) = \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{K} W_{ij} x_j \right\|^2 \qquad (6)$$

Subject to the condition $\sum_{j=1}^{K} W_{ij} = 1$, the optimal coefficients $w_i = [w_{i1}\ w_{i2}\ \ldots\ w_{iK}]^{T}$ are found in closed form as

$$w_{ij} = \frac{\sum_{m=1}^{K} \left(Q^{(i)}\right)^{-1}_{jm}}{\sum_{p=1}^{K} \sum_{q=1}^{K} \left(Q^{(i)}\right)^{-1}_{pq}} \qquad (7)$$


where $Q^{(i)}_{jm} = (x_i - x_j)^{T} A (x_i - x_m)$.
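A compact sketch of Eqs. (6)-(7) under the learned metric is given below; the small regularization added to Q^(i) is an assumed numerical safeguard and is not part of the paper.

import numpy as np

def msp_lle_weights(X, neighbors, A, reg=1e-3):
    """Step 3 of Section IV (a sketch): reconstruction weights under metric A."""
    n, K = neighbors.shape
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[i] - X[neighbors[i]]                 # rows are x_i - x_j, shape K x D
        Q = Z @ A @ Z.T                            # Q^(i)_{jm} = (x_i - x_j)^T A (x_i - x_m)
        Q += reg * np.trace(Q) * np.eye(K)         # assumed regularization
        Qinv = np.linalg.inv(Q)
        W[i, neighbors[i]] = Qinv.sum(axis=1) / Qinv.sum()   # Eq. (7)
    return W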

Step 4: The embedding vectors $y_i$ in the $d$-dimensional space ($d < D$) are formed according to the weights between $x_i$ $(i = 1, 2, \ldots, n)$ and its neighbors $x_j$ $(j = 1, 2, \ldots, K)$ in the $D$-dimensional space by minimizing the cost function

$$\psi(Y) = \sum_{i=1}^{n} \left\| y_i - \sum_{j=1}^{K} W_{ij} y_j \right\|^2 \qquad (8)$$

subject to $\sum_{i=1}^{n} y_i = 0$ and $\frac{1}{n}\sum_{i=1}^{n} y_i y_i^{T} = I$, where $I$ is a $d \times d$ identity matrix. With $M = (I - W)^{T}(I - W)$, the optimal $d$-dimensional embedding vectors $Y = [y_1\ y_2\ \ldots\ y_N]$ are obtained by computing the bottom $d + 1$ eigenvectors of the matrix $M$.
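Putting the pieces together, a hypothetical end-to-end MSP-LLE reduction could look as follows. It reuses the illustrative helpers sketched above (learn_mahalanobis, msp_lle_neighbors, msp_lle_weights), takes the scatter matrices as given, and uses the same d for the learned metric and the embedding; all of these are simplifying assumptions rather than the authors' setup.

import numpy as np

def msp_lle(X, labels, Sb, Sw, d=2, K=8, sigma=0.5):
    """End-to-end MSP-LLE sketch: metric learning, supervised neighbors,
    metric-based weights, then the final embedding step."""
    _, A = learn_mahalanobis(Sb, Sw, d)                      # Section III
    neighbors = msp_lle_neighbors(X, labels, A, K, sigma)    # Steps 1-2
    W = msp_lle_weights(X, neighbors, A)                     # Step 3, Eqs. (6)-(7)
    n = X.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)                  # Step 4 cost matrix
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                                  # bottom d+1 eigenvectors, constant one discarded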

V. USE ELM TO MAP UNLABELED DATA

Because the method described above cannot by itself reduce unlabeled data, the paper combines the extreme learning machine (ELM) with the MSP-LLE algorithm for reduction and classification. The method can be divided into the following three steps (a minimal ELM sketch is given after the list):

(1) During the reduction of the labeled data from the high-dimensional space $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^D$ to the low-dimensional space $Y = \{y_1, \ldots, y_N\} \subset \mathbb{R}^d$, the mapping net $Net_{D \to d}: \mathbb{R}^D \to \mathbb{R}^d$ is trained by setting $X$ as the input and $Y$ as the output of the ELM.

(2) Train the mapping net $Net_{d \to C}$ from $Y$ to $C$ by setting $Y$ as the input and its label $C$ as the output.

(3) Feed an unlabeled sample $x$ from the high-dimensional space into the mapping net $Net_{D \to d}$ to obtain $y$ in the low-dimensional space, then feed $y$ into the mapping net $Net_{d \to C}$ to obtain its label.
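Below is a minimal single-hidden-layer ELM regressor that could serve as either mapping net above. It is a sketch rather than the paper's implementation: the hidden-layer size, the sigmoid activation, and the random initialization are assumptions, since the paper does not report its ELM settings.

import numpy as np

class ELMRegressor:
    """Minimal extreme learning machine: random hidden layer, closed-form output weights."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # sigmoid hidden-layer activations
        return 1.0 / (1.0 + np.exp(-(X @ self.Win + self.b)))

    def fit(self, X, Y):
        # random input weights and biases are drawn once and never trained
        self.Win = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y          # output weights in closed form
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Step (1): net_Dd = ELMRegressor().fit(X, Y) trains the D -> d mapping;
# Step (3): y_new = net_Dd.predict(x_new[None, :]) maps an unlabeled sample.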

VI. EXPERIMENTS AND DISCUSSION

The experiments use the classic Iris and Wine datasets from the UCI repository. Iris includes three classes (Setosa, Versicolor, Virginica), each with 50 samples, and each sample has four features. Wine has three classes and 178 samples in total. The experiments also use the classic SRBCT and Colon gene datasets from the Kent Ridge biomedical databases, which are high-dimensional and contain highly similar samples, so they effectively test the performance of the manifold learning algorithms. The dataset descriptions and the parameter settings of the algorithms are shown in Table I.

TABLE I. THE DATASET DESCRIPTIONS AND EXPERIMENTAL SETTINGS

Dataset   Num   Class (C)   High-D   Low-d   K
Wine      178   3             13      2      8
Iris      150   3              4      2      8
SRBCT      83   4           2309      2      8
Colon      62   2           2000      2      8

To compare the effectiveness of the proposed method, experiments are also performed on MLLE (LLE based on the Mahalanobis distance) and SPLLE (supervised LLE). First, to ensure that all variables lie in the same range, the linear transformation

$$y = \frac{x - \mathrm{MinValue}}{\mathrm{MaxValue} - \mathrm{MinValue}} \qquad (9)$$

is used to normalize the datasets before neighbor selection with KNN (K-Nearest Neighbors). The data are then mapped to two dimensions by the SPLLE, MLLE, and MSP-LLE algorithms. The reduction results of the methods are shown in Fig. 1 and Fig. 2.
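For reference, the normalization of Eq. (9), applied column-wise (per feature, which is an assumed but common reading of the formula), is simply:

import numpy as np

def minmax_normalize(X):
    """Eq. (9): rescale each feature to [0, 1] before KNN neighbor selection."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)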

Figure 1. Two-dimensional embeddings of the Wine and Iris datasets produced by MLLE, SPLLE, and MSP-LLE (Wine classes c1-c3; Iris classes setosa, versicolor, virginica).


Figure 2. Two-dimensional embeddings of the SRBCT and Colon datasets produced by MLLE, SPLLE, and MSP-LLE (SRBCT classes c1-c4; Colon classes negative and positive).

TABLE II. THE CLASSIFICATION RATES OF EACH METHOD ON THE DATASETS WITH THE ELM CLASSIFIER

Dataset   SPLLE    MLLE     MSP-LLE
Wine      72.99%   71.81%   92.20%
Iris      56.80%   60.34%   85.77%
SRBCT     41.50%   63.33%   91.87%
Colon     83.76%   90.01%   98.31%

In Fig. 1, the Iris and Wine datasets, whose original dimensionality is low, are reduced to two dimensions effectively by all three methods, but the proposed method clearly produces fewer overlapping points than the others. From Fig. 2 we can see that for datasets such as SRBCT, which have high dimensionality and great similarity among samples, MLLE and SPLLE produce misclassified points and chaotic boundaries and sometimes fail to separate the patterns. MSP-LLE is less sensitive to the high-dimensional features, so after reducing the data to a two-dimensional plane it yields clear classification boundaries and fewer overlapping regions, which greatly improves the classification accuracy.

After feature reduction with MLLE, SPLLE, and MSP-LLE, the ELM (Extreme Learning Machine) classifier is adopted for the final pattern recognition. As shown in Table II, the test accuracy is consistent with the quality of the dimension reduction, and the highest test accuracy reaches 98.31%. Furthermore, the test accuracies of MSP-LLE are much higher than those of MLLE and SPLLE.

VII. CONCLUSION

In this study, we show that LLE combined with a Mahalanobis distance and label information (MSP-LLE) reduces datasets with highly similar samples more effectively than LLE with the Mahalanobis distance alone (MLLE) and supervised LLE (SPLLE), and that it can be an effective method for pattern recognition. Future work will focus on comparing MSP-LLE and MLLE on further datasets, with particular regard to image data.

ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (Grant No. 61100160).

REFERENCES

[1] J. Pacheco, S. Casado, S. Porras, "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics and Data Analysis, vol. 57, 2013, pp. 95-111.

[2] M. Oh, "A simple and efficient Bayesian procedure for selecting dimensionality in multidimensional scaling," Journal of Multivariate Analysis, vol. 107, May 2012, pp. 200-209.

[3] S. Chen, H. F. Zhao, M. Kong, P. Luo, "2D-LPP: A two-dimensional extension of locality preserving projections," Neurocomputing, vol. 70, Jan 2007, pp. 912-921.

[4] M. Cho, H. Y. Park, "Nonlinear dimension reduction using ISOMap based on class information," Proceedings of the 2009 International Joint Conference on Neural Networks, Atlanta, Georgia, USA, Jun 2009, pp. 2830-2834.

[5] C. Chen, L. J. Zhang, J. J. Bu, C. Wang, W. Chen, "Constrained Laplacian Eigenmap for dimensionality reduction," Neurocomputing, vol. 73, Jan 2010, pp. 951-958.

[6] J. Chen, Y. Liu, "Locally linear embedding: a survey," Artificial Intelligence Review, vol. 36, Jun 2011, pp. 29-48.

[7] G. Daza-Santacoloma, G. Castellanos-Dominguez, J. C. Principe, "Locally linear embedding based on correntropy measure for visualization and classification," Neurocomputing, vol. 80, Mar 2012, pp. 19-30.

[8] G. Daza-Santacoloma, C. D. Acosta-Medina, G. Castellanos-Dominguez, "Regularization parameter choice in locally linear embedding," Neurocomputing, vol. 73, pp. 1595-1605.

[9] B. Alipanahi, A. Ghodsi, "Guided Locally Linear Embedding," Pattern Recognition Letters, vol. 32, May 2011, pp. 1029-1035.

[10] S. Q. Zhang, "Enhanced supervised locally linear embedding," Pattern Recognition Letters, vol. 30, Oct 2009, pp. 1208-1218.

[11] B. Raducanu, F. Dornaika, "A supervised non-linear dimensionality reduction approach for manifold learning," Pattern Recognition, vol. 45, June 2012, pp. 2432-2444.

[12] X. F. Zhang, S. B. Huang, "Mahalanobis Distance Measurement Based Locally Linear Embedding Algorithm," Pattern Recognition and Artificial Intelligence, vol. 25, April 2012, pp. 318-324.

[13] G. B. Huang, Q. Y. Zhu, C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, 2006, pp. 489-501.

[14] G. B. Huang, L. Chen, "Enhanced random search based incremental extreme learning machine," Neurocomputing, 2008, pp. 3060-3068.

[15] S. Xiang et al., "Learning a Mahalanobis Distance Metric for Data Clustering and Classification," Pattern Recognition, vol. 41, 2008, pp. 3600-3612.
