A Fast and Effective Kernel-Based K-Means Clustering Algorithm
KONG Dexi, KONG Rui
Jinan University, Guangzhou, Guangdong, 510632, China
[email protected]
Abstract-In this paper, we apply the idea of kernel-based learning methods to K-means clustering and propose a fast and effective kernel K-means clustering algorithm. The idea of the algorithm is to first map the data from their original space to a high-dimensional space (kernel space) where the data are expected to be more separable, and then perform K-means clustering in that kernel space. We improve the speed of the algorithm by using a new kernel function, the conditionally positive definite (CPD) kernel. Our experiments on artificial and real data demonstrate that the performance of the new algorithm is superior to that of the K-means clustering algorithm.
Keywords-Kernel K-Means Clustering; K-Means Clustering;
Kernel Function; Support Vector Machines
I. INTRODUCTION
In recent years, a number of powerful kernel-based learning machines have been proposed, such as support vector machines (SVM) [1-3]. These approaches have been applied not only in classification and regression but also in unsupervised learning. All of them employ a kernel function to increase the separability of the data. Generally speaking, a kernel function implicitly defines a nonlinear transformation that maps the data from their original space to a high-dimensional space (kernel space) where the data are expected to be more separable.
Clustering is an unsupervised learning task that partitions a data set into a number of clusters under some optimization measure. K-means clustering is such an unsupervised learning algorithm: it partitions the data into classes by minimizing the sum of squared Euclidean distances between the samples and their class centers. The underlying assumption of K-means clustering is that the data are scattered in elliptical regions; if the separation boundaries between classes are nonlinear, the K-means clustering algorithm will fail. Inspired by kernel-based learning algorithms [4,5], we adopt the strategy of mapping the data from the original space into a high-dimensional kernel space and performing clustering there. If the nonlinear mapping is smooth and continuous, the topographic ordering of the data will be preserved in the kernel space, so that samples clustered together in the original space will be even more tightly clustered in the kernel space. The kernel K-means algorithms of references [6,7] use Mercer kernel functions, which consume a great deal of time. Our kernel K-means algorithm uses a new kernel function that reduces the computational cost of the algorithm.
Section II reviews the K-means clustering algorithm. Section III presents our kernel K-means algorithm. Section IV provides some demonstrative simulations. Section V concludes the paper.
II. K-MEANS CLUSTERING ALGORITHM
Suppose we are given a set of unlabeled samples $(x_1, x_2, \ldots, x_N)$. The aim of the K-means clustering algorithm is to partition the samples into K classes $(C_1, C_2, \ldots, C_K)$. The number of samples in class k is denoted by $N_k$, so

$\sum_{k=1}^{K} N_k = N$  (1)

The means of the classes are denoted $(m_1, m_2, \ldots, m_K)$, with

$m_k = \frac{1}{N_k} \sum_{i=1}^{N_k} x_i$, where $k = 1, \ldots, K$  (2)
The K-means clustering algorithm minimizes the objective function below:

$J = \sum_{k=1}^{K} \sum_{i=1}^{N_k} \| x_i - m_k \|^2$  (3)
2013 Third International Conference on Intelligent System Design and Engineering Applications
978-0-7695-4923-1/12 $26.00 © 2012 IEEE
DOI 10.1109/ISDEA.2012.21

The K-means clustering algorithm procedure:
(1) Initialization: partition the N samples into K classes, and compute the values of $(m_1, m_2, \ldots, m_K)$ and $J$;
(2) Assign each $x_i$ to the class whose center is nearest, then recompute the values of $(m_1, m_2, \ldots, m_K)$ and $J$;
(3) Repeat step (2) until the value of $J$ no longer changes (converges);
(4) Return $(m_1, m_2, \ldots, m_K)$; end.
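The procedure above can be sketched in NumPy as follows. This is a minimal illustration under our own naming, not the authors' implementation; for determinism we replace the random initial partition of step (1) with a simple farthest-first choice of initial centers, and the stopping test simply checks that $J$ has stopped changing:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-means minimizing J = sum_k sum_{i in C_k} ||x_i - m_k||^2 (Eq. 3)."""
    rng = np.random.default_rng(seed)
    # (1) Initialization: first center a random sample, each further center
    # the sample farthest from the centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    prev_J = np.inf
    for _ in range(max_iter):
        # (2) Assign each x_i to the class with the nearest center m_k.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute the class means (Eq. 2) and the objective J (Eq. 3);
        # an emptied class keeps its old center.
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
        J = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(K))
        # (3) Stop once J no longer changes.
        if np.isclose(J, prev_J):
            break
        prev_J = J
    return centers, labels, J
```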
III. FAST AND EFFECTIVE KERNEL-BASED K-MEANS CLUSTERING ALGORITHM
Linear function classes are often not rich enough in practice, so kernel-based learning algorithms make use of the following idea: via a nonlinear mapping $\phi: R^n \to F$, $x \mapsto \phi(x)$, the data $x_1, x_2, \ldots, x_N \in R^n$ are mapped into a potentially much higher-dimensional feature space $F$ (the kernel space). We can then execute the same algorithm in $F$ instead of $R^n$, i.e., work with the samples $\phi(x_1), \ldots, \phi(x_N) \in F$.
The kernel-based K-means clustering algorithm is based on the above idea: it executes K-means clustering in the kernel space. The key issue of the algorithm is the computation of distances in the kernel space. The squared Euclidean distance between samples in the kernel space can be written as

$\| \phi(x_i) - \phi(x_j) \|^2 = k_{\mathrm{Mercer}}(x_i, x_i) + k_{\mathrm{Mercer}}(x_j, x_j) - 2 k_{\mathrm{Mercer}}(x_i, x_j)$  (4)

where $k_{\mathrm{Mercer}}(x_i, x_j)$ is known as a Mercer kernel function.
So the computation of distances in the kernel space can be carried out through kernel functions. Several kinds of Mercer kernel functions can be used, such as:

Gaussian RBF: $k(x, y) = \exp(-\| x - y \|^2 / (2\sigma^2))$  (5)
Polynomial: $k(x, y) = (x \cdot y)^d$  (6)
Sigmoidal: $k(x, y) = \tanh(\kappa (x \cdot y) + \theta)$  (7)
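As an illustration (our own, not part of the original experiments), formula (4) can be evaluated with any of the kernels above; the sketch below uses the Gaussian RBF kernel of Eq. (5):

```python
import numpy as np

def k_rbf(x, y, sigma=0.1):
    # Gaussian RBF Mercer kernel, Eq. (5): k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_dist2(x, y, k):
    # Squared kernel-space distance, Eq. (4):
    # ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y)
    return k(x, x) + k(y, y) - 2.0 * k(x, y)
```

Note that for the RBF kernel $k(x, x) = 1$, so the squared distance reduces to $2(1 - k(x, y))$ and is bounded by 2; even so, Eq. (4) formally requires three kernel evaluations per pair of samples.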
The computational cost of the kernel-based K-means clustering algorithm lies mainly in computing the distances between samples, and with the Mercer kernel functions above this cost is great. In this paper, we adopt a new kind of kernel function, which comes from regularization theory and is known as a conditionally positive definite (CPD) kernel.
Definition 1 (Positive definite kernel) [8]: A symmetric function $k: X \times X \to R$ which for all $n \in N$ and $x_i \in X$ gives rise to a positive Gram matrix, i.e., for which for all $c_i \in R$ we have

$\sum_{i,j=1}^{n} c_i c_j K_{ij} \ge 0$, where $K_{ij} = k(x_i, x_j)$  (8)
is called a positive definite (PD) kernel.
Definition 2 (Conditionally positive definite kernel) [8]: A symmetric function $k: X \times X \to R$ which satisfies (8) for all $n \in N$, $x_i \in X$ and all $c_i \in R$ with

$\sum_{i=1}^{n} c_i = 0$  (9)

is called a conditionally positive definite (CPD) kernel.
The following CPD kernel can be used in the kernel-based K-means clustering algorithm in place of the Mercer kernel functions:

$k_{\mathrm{CPD}}(x, y) = -\| x - y \|^q + 1, \quad 0 < q \le 2$  (10)
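As a quick numerical sanity check of Definition 2 (our own illustration), one can sample random points, build the Gram matrix of the kernel in Eq. (10), and verify that the quadratic form of Eq. (8) stays nonnegative for coefficient vectors satisfying Eq. (9):

```python
import numpy as np

def k_cpd(x, y, q=1.0):
    # Eq. (10): k_CPD(x, y) = -||x - y||^q + 1, with 0 < q <= 2.
    return -np.linalg.norm(x - y) ** q + 1.0

def min_constrained_quadratic(X, q=1.0, trials=200, seed=0):
    """Smallest c' K c found over random coefficient vectors with sum(c) = 0.

    For a CPD kernel this minimum should be (numerically) nonnegative,
    i.e. Eq. (8) holds on the subspace defined by Eq. (9).
    """
    rng = np.random.default_rng(seed)
    K = np.array([[k_cpd(xi, xj, q) for xj in X] for xi in X])
    worst = np.inf
    for _ in range(trials):
        c = rng.normal(size=len(X))
        c -= c.mean()               # enforce Eq. (9): the c_i sum to zero
        worst = min(worst, c @ K @ c)
    return worst
```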
With this kernel, the squared Euclidean distance between samples in the kernel space becomes

$\| \phi(x_i) - \phi(x_j) \|^2 = 1 - k_{\mathrm{CPD}}(x_i, x_j) = \| x_i - x_j \|^q$, $i, j = 1, \ldots, N$  (11)

Comparing formulas (11) and (4), we can see that the computational cost of the distance between samples is greatly decreased. Thus our algorithm reduces the computational cost and improves speed.
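The comparison can be made concrete in code (an illustrative sketch under our own naming): the Mercer route of Eq. (4) needs three kernel evaluations per pair, each involving an exponential for the Gaussian RBF, while the CPD route of Eq. (11) reduces to a single norm:

```python
import numpy as np

def dist2_mercer(x, y, sigma=0.1):
    # Eq. (4) with the Gaussian RBF kernel of Eq. (5):
    # three kernel evaluations per pair (two are constant 1 for the RBF).
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))
    return k(x, x) + k(y, y) - 2.0 * k(x, y)

def dist2_cpd(x, y, q=1.0):
    # Eq. (11): ||x - y||^q, a single norm and no exponential,
    # which is the source of the speed-up claimed above.
    return np.sum((x - y) ** 2) ** (q / 2.0)
```

The two expressions live in different feature spaces and so have different scales; what matters for cluster assignment is only the relative ordering of distances.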
In the kernel space, the center of a class is denoted as

$m_k^{\phi} = \frac{1}{N_k} \sum_{i=1}^{N_k} \phi(x_i)$  (12)

where $k = 1, \ldots, K$ and $N_k$ is the number of samples in the k-th class. The kernel-based K-means clustering algorithm minimizes the objective function below:

$J^{\phi} = \sum_{k=1}^{K} \sum_{i=1}^{N_k} \| \phi(x_i) - m_k^{\phi} \|^2$  (13)
The kernel-based K-means clustering algorithm procedure:
(1) Initialization: partition the N samples into K classes and select one sample as the center of each class, so that the values of $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ are just those samples; compute the value of $J^{\phi}$;
(2) Assign each $\phi(x_i)$ to the class whose center is nearest, then recompute the values of $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ and $J^{\phi}$. Since $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ cannot be computed easily in the kernel space, we take each sample of a class in turn as a candidate center and compute the sum of distances between it and the $\phi(x_i)$ of that class; the sample with the least sum of distances is taken as the class center, and the least sum of distances is $J^{\phi}$;
(3) Repeat step (2) until the value of $J^{\phi}$ no longer changes (converges);
(4) Return $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$; end.
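A minimal NumPy sketch of this procedure (our own illustration, using the medoid-style centers described in step (2) and the CPD distances of Eq. (11); for determinism, the initial centers are chosen farthest-first rather than by a random partition):

```python
import numpy as np

def cpd_kernel_kmeans(X, K, q=1.0, max_iter=100, seed=0):
    """Kernel K-means with the CPD kernel of Eq. (10) and medoid-style centers."""
    N = len(X)
    rng = np.random.default_rng(seed)
    # Pairwise kernel-space squared distances, Eq. (11): ||x_i - x_j||^q.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** q
    # (1) Initialization: first center a random sample, each further center
    # the sample farthest (in kernel distance) from those chosen so far.
    medoids = [int(rng.integers(N))]
    for _ in range(K - 1):
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    prev_J = np.inf
    for _ in range(max_iter):
        # (2) Assign each phi(x_i) to the class with the nearest center.
        labels = D[:, medoids].argmin(axis=1)
        # Re-select each center: the in-class sample whose summed distance
        # to its classmates is least; the least sums add up to J^phi.
        J = 0.0
        for k in range(K):
            idx = np.flatnonzero(labels == k)
            if idx.size == 0:
                continue  # keep the old center if a class empties out
            costs = D[np.ix_(idx, idx)].sum(axis=0)
            medoids[k] = idx[costs.argmin()]
            J += costs.min()
        # (3) Stop once J^phi no longer changes.
        if np.isclose(J, prev_J):
            break
        prev_J = J
    return labels, X[medoids], J
```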
IV. SIMULATIONS
In order to validate our algorithm, we carried out clustering experiments on artificial and real data, comparing the performance of our algorithm with the K-means clustering algorithm and with the algorithms of references [6,7], which use a Mercer kernel. In the experiments, our algorithm uses a CPD kernel function ($q = 1.0$), while the algorithms of references [6,7] use a Mercer kernel function (Gaussian RBF, $\sigma = 0.1$).
We first generated artificial data consisting of three classes of Gaussian-distributed samples (30 samples each) and ran clustering experiments with K-means clustering, our algorithm, and the algorithms of references [6,7]. Each algorithm was run five times. The K-means clustering algorithm converged after six iterations, while the kernel-based K-means clustering algorithms converged after two iterations. The time cost of our algorithm is a quarter of that of the algorithms of references [6,7].
Figure 1. Outcome of K-means clustering
Figure 2. Outcome of kernel-based K-means clustering
We then generated another artificial data set consisting of two classes of circularly distributed samples (80 samples on the outer circle and 40 samples on the inner circle). Each algorithm was run five times. The K-means clustering algorithm converged after eight iterations, while the kernel-based K-means clustering algorithms converged after three iterations. The time cost of our algorithm is one third of that of the algorithms of references [6,7].
Figure 3. Outcome of K-means clustering
Figure 4. Outcome of kernel-based K-means clustering
The clustering outcomes show that the K-means clustering algorithm cannot cluster the data correctly, whereas our algorithm and the algorithms of references [6,7] can; meanwhile, our algorithm consumes less time.
Finally, we ran experiments on the IRIS data, which consist of three classes of samples (fifty samples each). We selected two features, namely the petal length and the petal width. The K-means clustering algorithm converged after twelve iterations and misclassified nine samples. The kernel-based K-means clustering algorithms converged after five iterations and misclassified five samples. The time cost of our algorithm is one sixth of that of the algorithms of references [6,7].
Figure 5. Outcome of K-means clustering (petal length vs. petal width)
Figure 6. Outcome of kernel-based K-means clustering (petal length vs. petal width)
V. CONCLUSION
This paper has explored data clustering algorithms. Based on statistical learning theory and the idea of the SVM classification algorithm, we have proposed a fast and effective kernel K-means clustering algorithm. The experiments validate the effectiveness of our algorithm: its clustering performance is superior to that of the K-means clustering algorithm, and its convergence is faster than that of the Mercer kernel-based K-means clustering algorithms. In future research, we intend to apply our algorithm to large-scale data clustering to further validate its effectiveness.
ACKNOWLEDGMENT
This research is partially sponsored by the STFC of Guangdong under contracts No. 2008B090500185 and No. 2011B050102010.
REFERENCES
[1] Vladimir N. Vapnik, "The Nature of Statistical Learning Theory". Springer Verlag, New York, 1995.
[2] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An Introduction to Kernel-Based Learning Algorithms", IEEE Transactions on Neural Networks, 2001, 12(2):181-201.
[3] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Knowledge Discovery and Data Mining, 1998, 2(2):121-167.
[4] A. J. Smola, "Learning with kernels", Ph.D. dissertation, Technische Universität Berlin, 1998.
[5] B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, "Input space versus feature space in kernel-based methods", IEEE Transactions on Neural Networks, 1999, 10:1000-1017.
[6] Mark Girolami, "Mercer Kernel-Based Clustering in Feature Space", IEEE Transactions on Neural Networks, 2002, 13(3):780-784.
[7] Meena Tushir, Smriti Srivastava, "A new Kernelized hybrid c-mean clustering model with optimized parameters", Applied Soft Computing, 2010, (10):381-389.
[8] Bernhard Schölkopf, "The Kernel Trick for Distances", Technical Report MSR-TR-2000-51, 19 May 2000.
KONG Dexi was born in Hefei, Anhui on July 25, 1991. He is currently pursuing a bachelor's degree in the Statistics Department of the College of Economics, Jinan University. His research interests include machine learning and statistical modeling.
KONG Rui received his Doctor of Engineering in Signal and Information Processing from the University of Science and Technology of China. He currently serves as an associate professor in the College of Electrical and Information, Jinan University. His research interests include machine learning and pattern recognition. E-mail: [email protected].