A Fast and Effective Kernel-Based K-Means Clustering Algorithm
KONG Dexi, KONG Rui
Jinan University, Guangzhou, Guangdong, 510632, China
[email protected]
Abstract-In this paper, we apply the idea of kernel-based learning methods to K-means clustering and propose a fast and effective kernel K-means clustering algorithm. The idea of the algorithm is to first map the data from their original space to a high-dimensional space (kernel space) where the data are expected to be more separable, and then perform K-means clustering in that kernel space. We improve the speed of the algorithm by using a new kernel function, the conditionally positive definite (CPD) kernel. Our experiments on artificial and real data demonstrate that the performance of the new algorithm is superior to that of the K-means clustering algorithm.
Keywords-Kernel K-Means Clustering; K-Means Clustering;
Kernel Function; Support Vector Machines
I. INTRODUCTION
In recent years, a number of powerful kernel-based learning machines have been proposed, such as support vector machines (SVM) [1-3]. These approaches have been applied not only in classification and regression but also in unsupervised learning. All of them employ a kernel function to increase the separability of the data. Generally speaking, a kernel function implicitly defines a nonlinear transformation that maps the data from their original space to a high-dimensional space (kernel space) where the data are expected to be more separable.
Clustering is an unsupervised learning task that partitions a data set into a number of clusters under some optimization measure. K-means clustering is such an unsupervised learning algorithm: it partitions the data into classes by minimizing the sum of squared Euclidean distances between the samples and their class centers. The underlying assumption of K-means clustering is that the data are scattered in elliptical regions; if the separation boundaries between classes are nonlinear, the K-means clustering algorithm will fail. Inspired by kernel-based learning algorithms [4,5], we adopt the strategy of mapping the data from the original space into a high-dimensional kernel space and performing clustering there. If the nonlinear mapping is smooth and continuous, the topographic ordering of the data will be preserved in the kernel space, so that samples clustered together in the original space will be even more tightly clustered in the kernel space. The kernel K-means algorithms of references [6,7] use Mercer kernel functions, which consume a great deal of time. Our kernel K-means algorithm uses a new kernel function that reduces the computational cost of the algorithm.
Section II reviews the K-means clustering algorithm. Section III presents our kernel K-means algorithm. Section IV provides some demonstrative simulations. Section V concludes the paper.
II. K-MEANS CLUSTERING ALGORITHM
Suppose we are given a set of unlabeled samples $(x_1, x_2, \ldots, x_N)$. The aim of the K-means clustering algorithm is to partition the samples into K classes $(C_1, C_2, \ldots, C_K)$. The number of samples in class k is denoted by $N_k$, so

$\sum_{k=1}^{K} N_k = N$  (1)

The means of the classes are denoted $(m_1, m_2, \ldots, m_K)$, with

$m_k = \frac{1}{N_k} \sum_{i=1}^{N_k} x_i$, where $k = 1, \ldots, K$  (2)
The K-means clustering algorithm minimizes the objective function below:

$J = \sum_{k=1}^{K} \sum_{i=1}^{N_k} \| x_i - m_k \|^2$  (3)
2013 Third International Conference on Intelligent System Design and Engineering Applications
978-0-7695-4923-1/12 $26.00 © 2012 IEEE
DOI 10.1109/ISDEA.2012.21

The K-means clustering algorithm procedure:
(1) Initialization: partition the N samples into K classes, and compute the values of $(m_1, m_2, \ldots, m_K)$ and $J$;
(2) Assign each $x_i$ to the class whose center is nearest, then recompute the values of $(m_1, m_2, \ldots, m_K)$ and $J$;
(3) Repeat step (2) until the value of $J$ no longer changes (converges);
(4) Return $(m_1, m_2, \ldots, m_K)$; end.
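The procedure above can be sketched in NumPy as follows. This is a minimal illustration under our own naming, not the authors' implementation; for determinism we replace the random initial partition of step (1) with a simple farthest-first choice of initial centers, and the stopping test simply checks that $J$ has stopped changing:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-means minimizing J = sum_k sum_{i in C_k} ||x_i - m_k||^2 (Eq. 3)."""
    rng = np.random.default_rng(seed)
    # (1) Initialization: first center a random sample, each further center
    # the sample farthest from the centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    prev_J = np.inf
    for _ in range(max_iter):
        # (2) Assign each x_i to the class with the nearest center m_k.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute the class means (Eq. 2) and the objective J (Eq. 3);
        # an emptied class keeps its old center.
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
        J = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(K))
        # (3) Stop once J no longer changes.
        if np.isclose(J, prev_J):
            break
        prev_J = J
    return centers, labels, J
```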
III. FAST AND EFFECTIVE KERNEL-BASED K-MEANS CLUSTERING ALGORITHM
Linear function classes are often not rich enough in practice, so kernel-based learning algorithms make use of the following idea: via a nonlinear mapping $\phi: R^n \to F$, $x \mapsto \phi(x)$, the data $x_1, x_2, \ldots, x_N \in R^n$ are mapped into a potentially much higher-dimensional feature space $F$ (the kernel space). We can then execute the same algorithm in $F$ instead of $R^n$, i.e., work with the samples $\phi(x_1), \ldots, \phi(x_N) \in F$.
The kernel-based K-means clustering algorithm is based on the above idea: it executes K-means clustering in the kernel space. The key issue of the algorithm is the computation of distances in the kernel space. The squared Euclidean distance between samples in the kernel space can be written as

$\| \phi(x_i) - \phi(x_j) \|^2 = k_{\mathrm{Mercer}}(x_i, x_i) + k_{\mathrm{Mercer}}(x_j, x_j) - 2 k_{\mathrm{Mercer}}(x_i, x_j)$  (4)

where $k_{\mathrm{Mercer}}(x_i, x_j)$ is known as a Mercer kernel function.
So the computation of distances in the kernel space can be carried out through kernel functions. Several kinds of Mercer kernel functions can be used, such as:

Gaussian RBF: $k(x, y) = \exp(-\| x - y \|^2 / (2\sigma^2))$  (5)
Polynomial: $k(x, y) = (x \cdot y)^d$  (6)
Sigmoidal: $k(x, y) = \tanh(\kappa (x \cdot y) + \theta)$  (7)
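As an illustration (our own, not part of the original experiments), formula (4) can be evaluated with any of the kernels above; the sketch below uses the Gaussian RBF kernel of Eq. (5):

```python
import numpy as np

def k_rbf(x, y, sigma=0.1):
    # Gaussian RBF Mercer kernel, Eq. (5): k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_dist2(x, y, k):
    # Squared kernel-space distance, Eq. (4):
    # ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y)
    return k(x, x) + k(y, y) - 2.0 * k(x, y)
```

Note that for the RBF kernel $k(x, x) = 1$, so the squared distance reduces to $2(1 - k(x, y))$ and is bounded by 2; even so, Eq. (4) formally requires three kernel evaluations per pair of samples.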
The computational cost of the kernel-based K-means clustering algorithm lies mainly in computing the distances between samples, and with the Mercer kernel functions above this cost is great. In this paper, we adopt a new kind of kernel function, which comes from regularization theory and is known as a conditionally positive definite (CPD) kernel.
Definition 1 (Positive definite kernel) [8]: A symmetric function $k: X \times X \to R$ which for all $n \in N$ and $x_i \in X$ gives rise to a positive Gram matrix, i.e., for which for all $c_i \in R$ we have

$\sum_{i,j=1}^{n} c_i c_j K_{ij} \ge 0$, where $K_{ij} = k(x_i, x_j)$  (8)
is called a positive definite (PD) kernel.
Definition 2 (Conditionally positive definite kernel) [8]: A symmetric function $k: X \times X \to R$ which satisfies (8) for all $n \in N$, $x_i \in X$ and all $c_i \in R$ with

$\sum_{i=1}^{n} c_i = 0$  (9)

is called a conditionally positive definite (CPD) kernel.
The following CPD kernel can be used in the kernel-based K-means clustering algorithm in place of the Mercer kernel functions:

$k_{\mathrm{CPD}}(x, y) = -\| x - y \|^q + 1, \quad 0 < q \le 2$  (10)
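As a quick numerical sanity check of Definition 2 (our own illustration), one can sample random points, build the Gram matrix of the kernel in Eq. (10), and verify that the quadratic form of Eq. (8) stays nonnegative for coefficient vectors satisfying Eq. (9):

```python
import numpy as np

def k_cpd(x, y, q=1.0):
    # Eq. (10): k_CPD(x, y) = -||x - y||^q + 1, with 0 < q <= 2.
    return -np.linalg.norm(x - y) ** q + 1.0

def min_constrained_quadratic(X, q=1.0, trials=200, seed=0):
    """Smallest c' K c found over random coefficient vectors with sum(c) = 0.

    For a CPD kernel this minimum should be (numerically) nonnegative,
    i.e. Eq. (8) holds on the subspace defined by Eq. (9).
    """
    rng = np.random.default_rng(seed)
    K = np.array([[k_cpd(xi, xj, q) for xj in X] for xi in X])
    worst = np.inf
    for _ in range(trials):
        c = rng.normal(size=len(X))
        c -= c.mean()               # enforce Eq. (9): the c_i sum to zero
        worst = min(worst, c @ K @ c)
    return worst
```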
With this kernel, the squared Euclidean distance between samples in the kernel space becomes

$\| \phi(x_i) - \phi(x_j) \|^2 = 1 - k_{\mathrm{CPD}}(x_i, x_j) = \| x_i - x_j \|^q$, $i, j = 1, \ldots, N$  (11)

Comparing formulas (11) and (4), we can see that the computational cost of the distance between samples is greatly decreased. Thus our algorithm reduces the computational cost and improves speed.
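The comparison can be made concrete in code (an illustrative sketch under our own naming): the Mercer route of Eq. (4) needs three kernel evaluations per pair, each involving an exponential for the Gaussian RBF, while the CPD route of Eq. (11) reduces to a single norm:

```python
import numpy as np

def dist2_mercer(x, y, sigma=0.1):
    # Eq. (4) with the Gaussian RBF kernel of Eq. (5):
    # three kernel evaluations per pair (two are constant 1 for the RBF).
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))
    return k(x, x) + k(y, y) - 2.0 * k(x, y)

def dist2_cpd(x, y, q=1.0):
    # Eq. (11): ||x - y||^q, a single norm and no exponential,
    # which is the source of the speed-up claimed above.
    return np.sum((x - y) ** 2) ** (q / 2.0)
```

The two expressions live in different feature spaces and so have different scales; what matters for cluster assignment is only the relative ordering of distances.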
In the kernel space, the center of a class is denoted as

$m_k^{\phi} = \frac{1}{N_k} \sum_{i=1}^{N_k} \phi(x_i)$  (12)

where $k = 1, \ldots, K$ and $N_k$ is the number of samples in the k-th class. The kernel-based K-means clustering algorithm minimizes the objective function below:

$J^{\phi} = \sum_{k=1}^{K} \sum_{i=1}^{N_k} \| \phi(x_i) - m_k^{\phi} \|^2$  (13)
The kernel-based K-means clustering algorithm procedure:
(1) Initialization: partition the N samples into K classes and select one sample as the center of each class, so that the values of $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ are just those samples; compute the value of $J^{\phi}$;
(2) Assign each $\phi(x_i)$ to the class whose center is nearest, then recompute the values of $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ and $J^{\phi}$. Since $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$ cannot be computed easily in the kernel space, we take each sample of a class in turn as a candidate center and compute the sum of distances between it and the $\phi(x_i)$ of that class; the sample with the least sum of distances is taken as the class center, and the least sum of distances is $J^{\phi}$;
(3) Repeat step (2) until the value of $J^{\phi}$ no longer changes (converges);
(4) Return $(m_1^{\phi}, m_2^{\phi}, \ldots, m_K^{\phi})$; end.
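A minimal NumPy sketch of this procedure (our own illustration, using the medoid-style centers described in step (2) and the CPD distances of Eq. (11); for determinism, the initial centers are chosen farthest-first rather than by a random partition):

```python
import numpy as np

def cpd_kernel_kmeans(X, K, q=1.0, max_iter=100, seed=0):
    """Kernel K-means with the CPD kernel of Eq. (10) and medoid-style centers."""
    N = len(X)
    rng = np.random.default_rng(seed)
    # Pairwise kernel-space squared distances, Eq. (11): ||x_i - x_j||^q.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** q
    # (1) Initialization: first center a random sample, each further center
    # the sample farthest (in kernel distance) from those chosen so far.
    medoids = [int(rng.integers(N))]
    for _ in range(K - 1):
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    prev_J = np.inf
    for _ in range(max_iter):
        # (2) Assign each phi(x_i) to the class with the nearest center.
        labels = D[:, medoids].argmin(axis=1)
        # Re-select each center: the in-class sample whose summed distance
        # to its classmates is least; the least sums add up to J^phi.
        J = 0.0
        for k in range(K):
            idx = np.flatnonzero(labels == k)
            if idx.size == 0:
                continue  # keep the old center if a class empties out
            costs = D[np.ix_(idx, idx)].sum(axis=0)
            medoids[k] = idx[costs.argmin()]
            J += costs.min()
        # (3) Stop once J^phi no longer changes.
        if np.isclose(J, prev_J):
            break
        prev_J = J
    return labels, X[medoids], J
```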
IV. SIMULATIONS
In order to validate our algorithm, we carried out clustering experiments on artificial and real data, comparing the performance of our algorithm with the K-means clustering algorithm and with the algorithms of references [6,7], which use a Mercer kernel. In the experiments, our algorithm uses a CPD kernel function ($q = 1.0$), while the algorithms of references [6,7] use a Mercer kernel function (Gaussian RBF, $\sigma = 0.1$).
We first generated artificial data consisting of three classes of Gaussian-distributed samples (30 samples each) and ran clustering experiments with K-means clustering, our algorithm, and the algorithms of references [6,7]. Each algorithm was run five times. The K-means clustering algorithm converged after six iterations, while the kernel-based K-means clustering algorithms converged after two iterations. The time cost of our algorithm is a quarter of that of the algorithms of references [6,7].
Figure 1. Outcome of K-means clustering
Figure 2. Outcome of kernel-based K-means clustering
We then generated another artificial data set consisting of two classes of circularly distributed samples (80 samples on the outer circle and 40 samples on the inner circle). Each algorithm was run five times. The K-means clustering algorithm converged after eight iterations, while the kernel-based K-means clustering algorithms converged after three iterations. The time cost of our algorithm is one third of that of the algorithms of references [6,7].
Figure 3. Outcome of K-means clustering
Figure 4. Outcome of kernel-based K-means clustering
The clustering outcomes show that the K-means clustering algorithm cannot cluster the data correctly, whereas our algorithm and the algorithms of references [6,7] can; meanwhile, our algorithm consumes less time.
Finally, we ran experiments on the IRIS data, which consist of three classes of samples (fifty samples each). We selected two features, namely the petal length and the petal width. The K-means clustering algorithm converged after twelve iterations and misclassified nine samples. The kernel-based K-means clustering algorithms converged after five iterations and misclassified five samples. The time cost of our algorithm is one sixth of that of the algorithms of references [6,7].
Figure 5. Outcome of K-means clustering (petal length vs. petal width)
Figure 6. Outcome of kernel-based K-means clustering (petal length vs. petal width)
V. CONCLUSION
This paper has explored data clustering algorithms. Based on statistical learning theory and the idea of the SVM classification algorithm, we have proposed a fast and effective kernel K-means clustering algorithm. The experiments validate the effectiveness of our algorithm: its clustering performance is superior to that of the K-means clustering algorithm, and its convergence is faster than that of the Mercer kernel-based K-means clustering algorithms. In future research, we intend to apply our algorithm to large-scale data clustering to further validate its effectiveness.
ACKNOWLEDGMENT
This research is partially sponsored by the STFC of Guangdong under contracts No. 2008B090500185 and No. 2011B050102010.
REFERENCES
[1] Vladimir N. Vapnik, "The Nature of Statistical Learning Theory". Springer Verlag, New York, 1995.
[2] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An Introduction to Kernel-Based Learning Algorithms", IEEE Transactions on Neural Networks, 2001, 12(2):181-201.
[3] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Knowledge Discovery and Data Mining, 1998, 2(2):121-167.
[4] A. J. Smola, "Learning with kernels", Ph.D. dissertation, Technische Universität Berlin, 1998.
[5] B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, "Input space versus feature space in kernel-based methods", IEEE Transactions on Neural Networks, 1999, 10:1000-1017.
[6] Mark Girolami, "Mercer Kernel-Based Clustering in Feature Space", IEEE Transactions on Neural Networks, 2002, 13(3):780-784.
[7] Meena Tushir, Smriti Srivastava, "A new Kernelized hybrid c-mean clustering model with optimized parameters", Applied Soft Computing, 2010, (10):381-389.
[8] Bernhard Schölkopf, "The Kernel Trick for Distances", Technical Report MSR-TR-2000-51, 19 May 2000.
KONG Dexi was born in Hefei, Anhui on July 25, 1991. He is currently pursuing a bachelor's degree in the Statistics Department of the College of Economics, Jinan University. His research interests include machine learning and statistical modeling.
KONG Rui received his Doctor of Engineering in Signal and Information Processing from the University of Science and Technology of China. He currently serves as an associate professor in the College of Electrical and Information, Jinan University. His research interests include machine learning and pattern recognition. E-mail: [email protected].