
Research Article
A Density Peak Clustering Algorithm Based on the K-Nearest Shannon Entropy and Tissue-Like P System

Zhenni Jiang,¹ Xiyu Liu,¹ and Minghe Sun²

¹Business School, Shandong Normal University, Jinan, China
²Business School, University of Texas at San Antonio, San Antonio, USA

Correspondence should be addressed to Xiyu Liu; [email protected]

Received 29 March 2019; Revised 8 June 2019; Accepted 20 June 2019; Published 31 July 2019

Academic Editor: Paolo Spagnolo

Copyright © 2019 Zhenni Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study proposes a novel method to calculate the density of the data points based on K-nearest neighbors and Shannon entropy. A variant of tissue-like P systems with active membranes is introduced to realize the clustering process. The new variant of tissue-like P systems can improve the efficiency of the algorithm and reduce the computation complexity. Finally, experimental results on synthetic and real-world datasets show that the new method is more effective than the other state-of-the-art clustering methods.

1. Introduction

Clustering is an unsupervised learning method which aims to divide a given population into several groups or classes, called clusters, in such a way that similar objects are put into the same group and dissimilar objects are put into different groups. Clustering methods generally include five categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods [1]. Partitioning and hierarchical methods can find spherical-shaped clusters but do not perform well on arbitrary clusters. Density-based clustering [2] methods can be used to overcome this problem; they model clusters as dense regions and boundaries as sparse regions. Three representative approaches of the density-based clustering method are DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), and DENCLUE (DENsity-based CLUstEring).

Usually, an objective function measuring the clustering quality is optimized by an iterative process in some clustering algorithms. However, this approach may cause low efficiency. Thus, the density peaks clustering (DPC) algorithm was proposed by Rodriguez and Laio [3] in 2014. This method can obtain the clusters in a single step, regardless of the shape and dimensionality of the space. DPC is based on the idea

that cluster centers are characterized by a higher density than the surrounding regions and by a relatively large distance from points with higher densities. Scholars have done a lot of research on the DPC algorithm. However, DPC still has several challenges that need to be addressed. First, the local density of data points can be affected by the cutoff distance, which can influence the clustering results. Second, the number of clusters needs to be decided by users, and the manual selection of the cluster centers can influence the clustering result. Cong et al. [4] proposed a clustering model for high-dimensional data based on DPC that accomplishes clustering simply and directly for data with more than six dimensions and arbitrary shapes. The problem with this model is that the clustering effect is not ideal for classes with big differences in order of magnitude. Xu et al. [5] introduced a novel approach called the density peaks clustering algorithm based on grid (DPCG), but this method also relies on user experience in the choice of cluster centers. Bie et al. [6] proposed a fuzzy-CFSFDP method for adaptively but effectively selecting the cluster centers. Du et al. [7] proposed a new DPC algorithm using geodesic distances, and Du et al. [8] also proposed an FN-DP (fuzzy neighborhood density peaks) clustering algorithm. However, these methods cannot select cluster centers automatically, and the FN-DP algorithm can cost much time in the calculation of the similarity matrix. Hou and Cui [9] introduced a density normalization step to make large-density clusters partitioned into multiple parts and small-density clusters merged with other clusters. Xu et al. proposed an FDPC algorithm based on a novel merging strategy motivated by support vector machines [10], but it also has the problem of higher complexity and needs users to select the cluster centers. Liu et al. [11] proposed a shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC) algorithm. Based on prior assumptions of consistency for semisupervised learning algorithms, some scholars also made assumptions of consistency for density-based clustering. The first assumption is of local consistency, which means nearby points are likely to have similar local density, and the second assumption is of global consistency, which means points in the same high-density area (or the same structure, i.e., the same cluster) are likely to have the same label [12]. This method also cannot find the cluster centers automatically. Although many studies about DPC have been reported, it still has many problems that need to be studied.

Hindawi, Mathematical Problems in Engineering, Volume 2019, Article ID 1713801, 13 pages. https://doi.org/10.1155/2019/1713801

Membrane computing, proposed by Păun [13] as a new branch of natural computing, abstracts computational models from the structures and functions of biological cells and from the collaboration between organs and tissues. Membrane computing mainly includes three basic computational models, i.e., the cell-like P system, the tissue-like P system, and the neural-like P system. In the computation process, each cell is treated as an independent unit, each unit operates independently without interfering with the others, and the entire membrane system operates in a maximally parallel way. Over the past years, many variants of membrane systems have been proposed [14–18], including membrane algorithms for solving global optimization problems. In recent years, applications of membrane computing have attracted a lot of attention from researchers [19–22]. There are also some other applications; for example, membrane systems have been used to solve multiobjective fuzzy clustering problems [23], unsupervised learning algorithms [24], automatic fuzzy clustering problems [25], and problems of fault diagnosis of power systems [26]. Liu et al. [27] proposed an improved Apriori algorithm based on an evolution-communication tissue-like P system. Liu and Xue [28] introduced a P system on simplices. Zhao et al. [29] proposed a spiking neural P system with neuron division and dissolution.

Based on previous works, the main motivation of this work is to use membrane systems to develop a framework for a density peak clustering algorithm. A new method of calculating the density of the data points is proposed based on the K-nearest neighbors and Shannon entropy. A variant of the tissue-like P system with active membranes is used to realize the clustering process. The new model of the P system can improve efficiency and reduce computation complexity. Experimental results show that this method is more effective and accurate than the state-of-the-art methods.

The rest of this paper is organized as follows. Section 2 describes the basic DPC algorithm and the tissue-like P system. Section 3 introduces the tissue-like P system with active membranes for DPC based on the K-nearest neighbors and Shannon entropy and describes the clustering procedure. Section 4 reports experimental results on synthetic datasets and UCI datasets. Conclusions are drawn and future research directions are outlined in Section 5.

2. Preliminaries

2.1. The Original Density Peak Clustering Algorithm. Rodriguez and Laio [3] proposed the DPC algorithm in 2014. This algorithm is based on the idea that cluster centers have higher densities than the surrounding regions and the distances among cluster centers are relatively large. It has three important parameters. The first one, $\rho_i$, is the local density of data point $i$; the second one, $\delta_i$, is the minimum distance between data point $i$ and other data points with higher density; and the third one, $\gamma_i = \rho_i \times \delta_i$, is the product of the other two. The first two parameters correspond to two assumptions of the DPC algorithm. One assumption is that cluster centers have higher density than the surrounding regions. The other assumption is that a cluster center has a larger distance from points in other clusters than from points in the same cluster. In the following, the computations of $\rho_i$ and $\delta_i$ are discussed in detail.

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a dataset with $n$ data points. Each $x_i$ has $M$ attributes. Therefore, $x_{ij}$ is the $j$th attribute of data point $x_i$. The Euclidean distance between the data points $x_i$ and $x_j$ can be expressed as follows:

$$d_{ij} = d(x_i, x_j) = \|x_i - x_j\| \quad (1)$$

The local density $\rho_i$ of the data point $x_i$ is defined as

$$\rho_i = \sum_{j \ne i} \chi\big(d(x_i, x_j) - d_c\big) \quad (2)$$

with

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \ge 0 \end{cases} \quad (3)$$

where $d_c$ is the cutoff distance. In fact, $\rho_i$ is the number of data points adjacent to data point $x_i$. The minimal distance $\delta_i$ between data point $x_i$ and any other data point $x_j$ with a higher density $\rho_j$ is given by

$$\delta_i = \begin{cases} \min_{j:\, \rho_j > \rho_i} d_{ij}, & \text{if } \exists j \text{ s.t. } \rho_j > \rho_i \\ \max_{j} d_{ij}, & \text{otherwise} \end{cases} \quad (4)$$

After $\rho_i$ and $\delta_i$ are calculated for each data point $x_i$, a decision graph with $\delta_i$ on the vertical axis and $\rho_i$ on the horizontal axis can be plotted. This graph can be used to find the cluster centers and then to assign each remaining data point to the cluster with the shortest distance.
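As an illustration, the three quantities behind the decision graph can be computed directly from (1)–(4). The sketch below is a hypothetical NumPy helper, not the paper's code; `d_c` is the cutoff distance assumed as input.

```python
import numpy as np

def dpc_stats(X, d_c):
    """Sketch of the DPC quantities: rho_i of eq. (2), delta_i of eq. (4),
    and gamma_i = rho_i * delta_i.  X is an (n, M) data array."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # d_ij, eq. (1)
    rho = (d < d_c).sum(axis=1) - 1          # eq. (2): neighbors within d_c
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]   # points with higher density
        # eq. (4): nearest higher-density point, or the farthest point
        delta[i] = d[i, higher].min() if higher.size else d[i].max()
    return rho, delta, rho * delta           # gamma for the decision graph
```

Points with large $\gamma_i$ stand out in the upper-right region of the decision graph and are the candidates for cluster centers.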

The computation of the local densities of the data points is a key factor for the effectiveness and efficiency of DPC. There are many other ways to calculate the local densities. For example, the local density of $x_i$ can be computed using (5) in the following [3]:

$$\rho_i = \sum_{j} \exp\left(-\frac{d_{ij}^2}{d_c^2}\right) \quad (5)$$


The way in (5) is suitable for "small" datasets. In fact, it is difficult to judge whether a dataset is small or large. When (5) is used to calculate the local density, the results can be greatly affected by the cutoff distance $d_c$.

Each component on the right side of (5) is a Gaussian function. Figure 1 visualizes the function $\exp(-t)$ and two Gaussian functions $\exp(-t^2/\sigma^2)$ with different values of $\sigma$. The blue and red curves are the curves of $\exp(-t^2/\sigma^2)$ with $\sigma = 1$ and $\sigma = \sqrt{2}$, respectively. The curve with a smaller value of $\sigma$ declines more quickly than the curve with a larger value of $\sigma$. Comparing the curve of $\exp(-t)$, the yellow dash-dotted curve, with the curves of $\exp(-t^2/\sigma^2)$, it can be found that the values of $\exp(-t^2/\sigma^2)$ are greater than those of $\exp(-t)$ when $t < \sigma^2$ but decay faster when $t > \sigma^2$. This means that if the value of the parameter $\sigma$ needs to be decided manually in the density calculation, the calculated densities will be influenced by the selected value. This analysis shows that the parameter $\sigma$ has a big effect on the calculated results. Furthermore, the density in (5) can be influenced by the cutoff distance $d_c$. To eliminate the influence of the cutoff distance $d_c$ and give a uniform metric for datasets of any size, Du et al. [30] proposed the K-nearest neighbor method. The local density in Du et al. [30] is given by

$$\rho_i = \exp\left(-\frac{1}{K} \sum_{j \in KNN_i} d_{ij}^2\right) \quad (6)$$

where $K$ is an input parameter and $KNN_i$ is the set of the $K$ nearest neighbors of data point $x_i$. However, this method does not consider the influence of the position of a data point on its own density. Therefore, the current study proposes a novel method to calculate the densities.
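Equation (6) is straightforward to compute once the $K$ nearest neighbors are known. A minimal sketch (the function name is an assumption for illustration):

```python
import numpy as np

def knn_density(X, K):
    """Local density of eq. (6): rho_i = exp(-(1/K) * sum of squared
    distances to the K nearest neighbors).  Assumes K < n."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # after sorting each row, column 0 is the point itself (distance 0)
    nearest_sq = np.sort(d, axis=1)[:, 1:K + 1] ** 2
    return np.exp(-nearest_sq.mean(axis=1))
```

Unlike (5), this density depends only on the $K$ nearest neighbors, so it does not require a cutoff distance $d_c$.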

2.2. The Tissue-Like P System with Active Membranes. A tissue-like P system has a graphical structure. The nodes of the graph correspond to the cells and the environment in the tissue-like P system, whereas the edges of the graph represent the channels for communication between the cells. The tissue-like P system is slightly more complicated than the cell-like P system. Each cell has a different state, and only a state that meets the requirement specified by the rules can be changed. The basic framework of the tissue-like P system used in this study is shown in Figure 2.

A P system with active membranes is a construct

$$\Pi = \left(O, Z, H, \omega_1, \ldots, \omega_m, E, ch, (s_{(i,j)})_{(i,j) \in ch}, (R_{(i,j)})_{(i,j) \in ch}, i_0\right) \quad (7)$$

where

(1) $O$ is the set of alphabets of all objects which appear in the system;

(2) $Z$ represents the states of the alphabets;

(3) $H$ is the set of labels of the membranes;

(4) $\omega_1, \ldots, \omega_m$ are the initial multisets of objects in cells 1 to $m$;

(5) $E \subseteq O$ is the set of objects present in an arbitrary number of copies in the environment;

(6) $ch \subseteq \{(i,j) \mid i, j \in \{0, 1, \ldots, m\},\ i \ne j\}$ is the set of channels between cells and between cells and the environment;

(7) $s_{(i,j)}$ is the initial state of the channel $(i,j)$;

(8) $R_{(i,j)}$ is a finite set of symport/antiport rules of the form $(s, x/y, s')$ with $s, s' \in Z$ and $x, y \in O$:

(i) $[a \to b]_h$, where $h \in H$, $a \in O$, and $b \in O$ (object evolution rules: an object evolves into another within a membrane);

(ii) $a[\ ]_h \to [b]_h$, where $h \in H$ and $a, b \in O$ (send-in communication rules: an object is introduced into a membrane and may be modified during the process);

(iii) $[a]_h \to [\ ]_h b$, where $h \in H$ and $a, b \in O$ (send-out communication rules: an object is sent out of the membrane and may be modified during the process);

(iv) $[a]_h \to [b]_{h_1}[c]_{h_2}$, where $h, h_1, h_2 \in H$ and $a, b, c \in O$ (division rules for elementary membranes: the membrane is divided into two membranes with possibly different labels; the object specified in the rule is replaced by possibly new objects in the two new membranes, and the remaining objects are duplicated in the process);

(9) $i_0 \in \{1, \ldots, m\}$ is the output cell.

The biggest difference between a cell-like P system and a tissue-like P system is that each cell can communicate with the environment in the tissue-like P system, but only the skin membrane can communicate with the environment in the cell-like P system. This does not mean that any two cells in the tissue-like P system can communicate with each other. If there is no direct communication channel between two cells, they can communicate through the environment indirectly.

3. The Proposed Method

3.1. Density Metric Based on the K-Nearest Neighbors and Shannon Entropy. DPC still has some defects. The current DPC algorithm has the obvious shortcoming that the value of the cutoff distance $d_c$ needs to be set manually in advance, and this value largely affects the final clustering results. To overcome this shortcoming, a new method is proposed to calculate the density metric based on the K-nearest neighbors and Shannon entropy.

K-nearest neighbors (KNN) is usually used to measure a local neighborhood of an instance in the fields of classification, clustering, local outlier detection, etc. The aim of this approach is to find the $K$ nearest neighbors of a sample among $N$ samples. In general, the distances between points are obtained by calculating the Euclidean distance.


Figure 1: Three different function curves: $\exp(-t^2/\sigma^2)$ with $\sigma = 1$, $\exp(-t^2/\sigma^2)$ with $\sigma = \sqrt{2}$, and $\exp(-t)$.

Figure 2: Membrane structure of a tissue-like P system.

Let $KNN(i)$ be the set of nearest neighbors of a point $i$, expressed as

$$KNN(i) = \{ j \mid d_{ij} \le d_{i,kth(i)} \} \quad (8)$$

where $d_{ij}$ is the Euclidean distance between $x_i$ and $x_j$ and $kth(i)$ is the $k$-th nearest neighbor of $i$. Local regions measured by KNN are often termed the K-nearest neighborhood, which in fact is a circular or spherical area of radius $R = d_{i,kth(i)}$. Therefore, KNN-based methods cannot handle datasets whose clusters have nonspherical distributions and usually give poor clustering results on datasets with clusters of different shapes.

Shannon entropy measures the degree of molecular activity. The more unstable the system is, the larger the value of the Shannon entropy is, and vice versa. The Shannon entropy, represented by $H(X)$, is given by

$$H(X) = -\sum_{i=1}^{N} p_i \log_2(p_i) \quad (9)$$

where $X$ is the set of objects and $p_i$ is the probability of object $i$ appearing in $X$. When $H(X)$ is used to measure the distance between the clusters, the smaller the value of $H(X)$ is, the better the clustering result is. Therefore, the Shannon entropy is introduced to calculate the data point density in the K-nearest neighbor method, so that the final density calculation not only considers the distance metric but also adds the influence of the data point position to the density of the data point.
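Equation (9) in code form, as a small sketch (zero-probability terms are skipped, following the convention $0 \log 0 = 0$):

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_i p_i * log2(p_i), the Shannon entropy of eq. (9)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A uniform distribution maximizes the entropy: for example, `shannon_entropy([0.25] * 4)` gives 2 bits, while a point mass gives 0.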

However, the decision graph is calculated from the product of $\rho_i$ and $\delta_i$, and a larger value of $\rho_i$ makes it easier to choose the best cluster centers. Therefore, the reciprocal form of the Shannon entropy is adopted. The metrics for $\rho_i$ and $\delta_i$ may be inconsistent, which directly leads to $\rho_i$ and $\delta_i$ playing different roles in the calculation of the decision graph. Hence, it is necessary to normalize $\rho_i$ and $\delta_i$.

The specific calculation method is as follows. First, the local density of data point $x_i$ is calculated as

$$w_i' = \sum_{j=1,\, j \ne i}^{n} \frac{1}{\|x_j - x_i\|} \quad (10)$$

where $x_i$ and $x_j$ are data points and $w_i'$ is the density of data point $x_i$. Next, the density of data point $x_i$ is normalized, and the normalized density is denoted as $w_i$:

$$w_i = \frac{w_i'}{\sum_{i=1}^{n} w_i'} \quad (11)$$


Finally, the density metric, which uses the idea of the K-nearest neighbor method, is defined as

$$\rho_i = -\frac{1}{(1/K) \sum_{j \in KNN_i} w_j \log(w_j)} \quad (12)$$

To guarantee the consistency of the metrics of $\rho_i$ and $\delta_i$, $\delta_i$ also needs to be normalized.
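Equations (10)–(12) combine into the proposed density metric as follows. This is a sketch under two assumptions not fixed by the text: $KNN_i$ excludes the point itself, and the natural logarithm is used (the base only rescales the metric).

```python
import numpy as np

def entropy_density(X, K):
    """Proposed density metric: inverse-distance weights w'_i (eq. (10)),
    normalization to w_i (eq. (11)), and the reciprocal K-nearest-neighbor
    Shannon-entropy density rho_i of eq. (12)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # exclude j == i from the sums
    w = (1.0 / d).sum(axis=1)                 # eq. (10)
    w = w / w.sum()                           # eq. (11)
    rho = np.empty(n)
    for i in range(n):
        knn = np.argsort(d[i])[:K]            # K nearest neighbors of i
        rho[i] = -1.0 / ((w[knn] * np.log(w[knn])).mean())  # eq. (12)
    return rho
```

Since every $w_j \in (0, 1)$, the sum in the denominator is negative and the resulting $\rho_i$ is positive.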

3.2. Tissue-Like P System with Active Membranes for Improved Density Peak Clustering. In the following, a tissue-like P system with active membranes for density peak clustering, called KST-DPC, is proposed. As mentioned before, assume the dataset with $n$ data points is represented by $X = \{x_1, x_2, \ldots, x_n\}$. Before performing any specific calculation of the DPC algorithm, the Euclidean distance between each pair of data points in the dataset is calculated, and the result is stored in the form of a matrix. The initial configuration of this P system is shown in Figure 3.

When the system is initialized, the objects $x_i, b_1, b_2, \ldots, b_n$ are in membrane $i$ for $1 \le i \le n$, and object $\lambda$ is in membrane $n+1$, where $\lambda$ means there is no object. First, the Euclidean distance $w_{ij}$ between the data points $x_i$ and $x_j$ (represented by $b_j$ for $1 \le j \le n$) is calculated with the rule $r_1 = \{[x_i b_1 b_2 \cdots b_n]_i \to [d_{i1}^{w_{i1}} d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}}]_i \mid 1 \le i \le n\}$. Note that $x_j$ for $1 \le j \le n$ are expressed as $b_1, b_2, \ldots, b_n$. The results are stored as the distance matrix, also called the dissimilarity matrix, $D_{n \times n}$:

$$D_{n \times n} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \quad (13)$$

At the beginning, there are $n+1$ membranes in the P system. After the distances are calculated, objects $x_i, d_{i1}^{w_{i1}}, d_{i2}^{w_{i2}}, \ldots, d_{in}^{w_{in}}$ are placed in membrane $i$ for $1 \le i \le n$. In the next step, the densities of the data points are calculated by the rule $r_2 = \{[d_{i1}^{w_{i1}} d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}}]_i \to [w_i']_i \mid 1 \le i \le n\}$. Then the send-in and send-out communication rules are used to calculate the values of $\rho_i$, $\delta_i$, and $\gamma_i$ and to put them in membrane $i$ for $1 \le i \le n$. Next, according to the sorted results of $\gamma_i$ for $1 \le i \le n$, the number of clusters $k$ can be determined. The rule of the active membranes is used to split membrane $n+1$ into $k$ membranes, as shown in Figure 4. The $k$ cluster centers are put in membranes $n+1$ to $n+k$, respectively. Finally, the remaining data points are divided, and each is put into the membrane with the cluster center closest to it. Up to this point, the clusters are obtained.

The main steps of KST-DPC are summarized in Algorithm 1.

3.3. Time Complexity Analysis of KST-DPC. As usual, computations in the cells of the tissue-like P system can be implemented in parallel. Because of the parallel implementation, the generation of the dissimilarity matrix uses $n$ computation steps. The generation of the data point densities needs 1

Table 1: Synthetic datasets.

Dataset       Instances   Dimensions   Clusters
Spiral        312         2            3
Compound      399         2            6
Jain          373         2            2
Aggregation   788         2            7
R15           600         2            15
D31           3100        2            31

computation step. The calculation of the final density $\rho_i$ uses $k$ computation steps. The calculation of $\delta_i$ needs $n$ steps. The calculation of $\gamma_i$ uses 1 step, and $n \log n$ steps are used to sort $\gamma_i$ for $1 \le i \le n$. Finally, the final clustering needs 1 more computation step. Therefore, the total time complexity of KST-DPC is $n + 1 + k + n + 1 + n \log n + 1 = O(n \log n)$. The time complexity of DPC-KNN is $O(n^2)$. Compared to DPC-KNN, KST-DPC reduces the time complexity by transferring time complexity to space complexity. The above analysis demonstrates that the overall time complexity of KST-DPC is superior to that of DPC-KNN.

4. Test and Analysis

4.1. Data Sources. Experiments on six synthetic datasets and four real-world datasets are carried out to test the performance of KST-DPC. The synthetic datasets are from http://cs.uef.fi/sipu/datasets. These datasets are commonly used as benchmarks to test the performance of clustering algorithms. The real-world datasets used in the experiments are from the UCI Machine Learning Repository [31]. These datasets are chosen to test the ability of KST-DPC to identify clusters of arbitrary shapes without being affected by noise, size, or dimensions of the datasets. The numbers of features (dimensions), data points (instances), and clusters vary in each of the datasets. The details of the synthetic and real-world datasets are listed in Tables 1 and 2, respectively.

The performance of KST-DPC was compared with those of the well-known clustering algorithms SC [32], DBSCAN [33], and DPC-KNN [28, 34]. The codes for SC and DBSCAN are provided by their authors. The code of DPC is optimized by using matrix operations instead of iteration cycles, based on the original code provided by Rodriguez and Laio [3], to reduce running time.

The performances of the above clustering algorithms are measured in clustering quality, or Accuracy (Acc), and Normalized Mutual Information (NMI). They are very popular measures for testing the performance of clustering algorithms. The larger the values are, the better the results are. The upper bound of these measures is 1.
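For reference, NMI can be computed from the contingency counts of the true and predicted labels. The sketch below uses the $\sqrt{H(U)H(V)}$ normalization; other normalizations (e.g. the arithmetic mean of the two entropies) are also in common use, so reported values depend on this choice. Acc additionally requires matching predicted cluster labels to true classes, which is not shown here.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information: MI(U, V) / sqrt(H(U) * H(V))."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((c / n) * math.log((c / n) / ((cu[u] / n) * (cv[v] / n)))
             for (u, v), c in joint.items())
    hu = -sum((c / n) * math.log(c / n) for c in cu.values())
    hv = -sum((c / n) * math.log(c / n) for c in cv.values())
    if hu == 0.0 or hv == 0.0:          # a trivial single-cluster labeling
        return 1.0 if hu == hv else 0.0
    return mi / math.sqrt(hu * hv)
```

NMI is invariant to label permutations, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 even though the label names disagree.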

4.2. Experimental Results on the Synthetic Datasets. In this subsection, the performances of KST-DPC, DPC-KNN, DBSCAN, and SC are reported on the six synthetic datasets. The clustering results of the four clustering algorithms on the six synthetic datasets are color coded and displayed in two-dimensional spaces, as shown in Figures 5–10. The results of


Figure 3: The initial configuration of the tissue-like P system.

Inputs: dataset $X$, parameter $K$.
Output: clusters.
Step 1: The objects $x_i, b_1, b_2, \ldots, b_n$ are in membrane $i$ for $1 \le i \le n$, and object $\lambda$ is in membrane $n+1$.
Step 2: Compute the Euclidean distance matrix $w_{ij}$ by rule $r_1$.
Step 3: Compute the local densities of the data points by rule $r_2$ and normalize them using (10) and (11).
Step 4: Calculate $\rho_i$ and $\delta_i$ for data point $i$ using (12) and (4) in every membrane $i$, respectively.
Step 5: Calculate $\gamma_i = \rho_i \times \delta_i$ for all $1 \le i \le n$ in membrane $i$, sort the $\gamma_i$ in descending order, and select the top $k$ values to determine the centers of the clusters.
Step 6: Split membrane $n+1$ into $k$ membranes by the division rules; these membranes are numbered $n+1$ to $n+k$.
Step 7: The $k$ cluster centers are put in membranes $n+1$ to $n+k$, respectively.
Step 8: Assign each remaining point to the membrane with the nearest cluster center.
Step 9: Return the clustering result.

Algorithm 1
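Stripping away the membrane machinery (whose cells, in the sketch below, are simulated by ordinary sequential loops rather than parallel rules), Algorithm 1 can be outlined as follows. The function name and the explicit `n_clusters` argument are assumptions for illustration, and the same reading of eq. (12) as above is used.

```python
import numpy as np

def kst_dpc(X, K, n_clusters):
    """Sequential sketch of Steps 2-9 of Algorithm 1."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # Step 2
    off = d.copy()
    np.fill_diagonal(off, np.inf)             # exclude j == i below
    w = (1.0 / off).sum(axis=1)               # Step 3: eqs (10)-(11)
    w = w / w.sum()
    rho = np.empty(n)                         # Step 4: eq. (12)
    for i in range(n):
        knn = np.argsort(off[i])[:K]
        rho[i] = -1.0 / ((w[knn] * np.log(w[knn])).mean())
    delta = np.empty(n)                       # Step 4: eq. (4)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if higher.size else d[i].max()
    gamma = rho * (delta / delta.sum())       # Step 5, with delta normalized
    centers = np.argsort(gamma)[::-1][:n_clusters]   # Steps 5-7
    return d[:, centers].argmin(axis=1), centers     # Step 8
```

In the P system itself, the per-point loops run in parallel across membranes, which is what the step counts in Section 3.3 assume.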

Table 2: Real-world datasets.

Dataset         Instances   Dimensions   Clusters
Vertebral       310         7            2
Seeds           210         7            3
Breast cancer   699         10           2
Banknotes       1372        5            2

the four clustering algorithms on a dataset are shown as four parts in a single figure. The cluster centers found by the KST-DPC and DPC-KNN algorithms are marked in the figures with different colors. For DBSCAN, it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm ran multiple times on each dataset, and the best result of each clustering algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm lists the parameters the users need to set. KST-DPC and DPC-KNN have only one parameter, $K$, the number of nearest neighbors to be prespecified. In this paper, the value of $K$ is determined as a percentage of the data points, following the method in [34]. For each dataset, we adjusted the percentage of data points used in the KNN multiple times and found the optimal percentage that makes the final clustering the best. Because many experiments were performed, only the best results are listed in Tables 3 and 4, and, to be consistent with the other parameters in the tables, the percentages of data points are converted directly into specific $K$ values. DBSCAN has two input parameters, the maximum radius Eps and the minimum point number MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by the algorithms. The performance measures, including Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and obtain the correct clustering results. All the benchmark values are 1.00, reflecting that the four algorithms all perform perfectly on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6 it is obvious that KST-DPC can find the ideal clustering result. DBSCAN cannot find the right clusters, whereas DPC-KNN and SC cannot find the clustering centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters

Mathematical Problems in Engineering 7

Table 3: Results on the synthetic datasets.

Dataset      Algorithm  Par    C1  Acc     NMI
Spiral       KST-DPC    16     3   1.00    1.00
Spiral       DPC-KNN    20     3   1.00    1.00
Spiral       DBSCAN     1.23   3   1.00    1.00
Spiral       SC         3      3   1.00    1.00
Compound     KST-DPC    217    6   0.98    0.95
Compound     DPC-KNN    360    6   0.6466  0.7663
Compound     DBSCAN     1.53   5   0.8596  0.9429
Compound     SC         6      6   0.6015  0.7622
Jain         KST-DPC    4      2   1.00    1.00
Jain         DPC-KNN    8      2   0.9035  0.5972
Jain         DBSCAN     2.624  2   1.00    1.00
Jain         SC         2      2   1.00    1.00
Aggregation  KST-DPC    40     7   1.00    1.00
Aggregation  DPC-KNN    40     7   0.9987  0.9957
Aggregation  DBSCAN     1.593  5   0.8274  0.8894
Aggregation  SC         7      7   0.9937  0.9824
R15          KST-DPC    20     15  1.00    0.99
R15          DPC-KNN    20     15  1.00    0.99
R15          DBSCAN     0.45   13  0.78    0.9155
R15          SC         15     15  0.9967  0.9942
D31          KST-DPC    25     31  1.0000  1.0000
D31          DPC-KNN    25     31  0.9700  0.9500
D31          DBSCAN     0.463  27  0.6516  0.8444
D31          SC         31     31  0.9765  0.9670

Figure 4: The tissue-like membrane system in the calculation process (cells 1 to n hold the data objects x_i with their density weights, and cells n+1 to n+k receive the clusters).

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)–6(d). DBSCAN has some prespecified parameters that can have heavy effects on the clustering results. As shown in Figure 6(c), two clusters are merged into one cluster on two occasions. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC get correct results, with both benchmark values equal to 1.00. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with different sizes and shapes, and two pairs of clusters are connected to each other. Figure 8 shows that both the KST-DPC and DPC-KNN algorithms can effectively find the cluster centers and the correct clusters, except that an individual data point is put into an incorrect cluster by DPC-KNN. Table 3 shows that the benchmark values of KST-DPC are all 1.00 and those of DPC-KNN are close to 1.00. SC can also recognize all clusters, but its values of Acc and NMI are lower than those of DPC-KNN. DBSCAN did not find all clusters and could not partition the clusters connected to each other.

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the


Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good experimental result, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and distributed randomly in a 2-dimensional space. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are all 1.00, showing that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, and the best results for each dataset are shown in italics. The symbol "--" indicates that there is no value for that entry.
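As a rough illustration of how these two measures can be computed, the sketch below evaluates Acc by trying every one-to-one mapping of predicted cluster labels onto true labels (feasible only for small numbers of clusters) and computes NMI with the square-root normalization. Both function bodies are assumptions about the exact definitions, since the paper does not spell them out:

```python
import numpy as np
from itertools import permutations

def acc(y_true, y_pred):
    """Clustering accuracy: best agreement over all one-to-one mappings of
    predicted cluster labels to true labels (small label sets only)."""
    labels_t, labels_p = np.unique(y_true), np.unique(y_pred)
    best = 0.0
    for perm in permutations(labels_t, len(labels_p)):
        mapping = dict(zip(labels_p, perm))
        best = max(best, np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)]))
    return best

def entropy(labels):
    # Shannon entropy (natural log) of the label distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(y_true, y_pred):
    """Normalized mutual information, normalized by sqrt(H(true) * H(pred))."""
    n = len(y_true)
    mi = 0.0
    for t in np.unique(y_true):
        for p in np.unique(y_pred):
            n_tp = np.sum((y_true == t) & (y_pred == p))
            if n_tp > 0:
                mi += (n_tp / n) * np.log(n * n_tp /
                                          (np.sum(y_true == t) * np.sum(y_pred == p)))
    h = np.sqrt(entropy(y_true) * entropy(y_pred))
    return mi / h if h > 0 else 1.0
```

A perfect clustering with permuted label names, e.g. `acc(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]))`, scores 1.0 on both measures.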

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC is equal to that obtained by DPC-KNN, but the value of NMI obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. As Table 4 shows, all algorithms could find the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. The results in Table 4 show that KST-DPC obtained the best, whereas DBSCAN obtained the worst, values of Acc and NMI. All four clustering algorithms found the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results on this dataset in Table 4 show that all four clustering algorithms could find the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset, except that SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4 it is obvious that KST-DPC got the best


Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Table 4: Results on the real-world datasets.

Dataset        Algorithm  Par    C1  Acc     NMI
Vertebral      KST-DPC    9      2   0.6806  0.0313
Vertebral      DPC-KNN    9      2   0.6806  0.0821
Vertebral      DBSCAN     7.48   2   0.6742  --
Vertebral      SC         2      2   --      --
Seeds          KST-DPC    4      3   0.8429  0.6574
Seeds          DPC-KNN    6      3   0.8143  0.6252
Seeds          DBSCAN     0.927  3   0.5857  0.4835
Seeds          SC         3      3   0.6071  0.5987
Breast cancer  KST-DPC    70     2   0.8624  0.4106
Breast cancer  DPC-KNN    76     2   0.7954  0.3154
Breast cancer  DBSCAN     6.20   2   0.6552  0.0872
Breast cancer  SC         2      2   --      --
Banknotes      KST-DPC    68     2   0.8434  0.7236
Banknotes      DPC-KNN    82     2   0.7340  0.3311
Banknotes      DBSCAN     6.55   2   0.5554  6.7210e-16
Banknotes      SC         2      2   0.6152  0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms: it obtained larger values of Acc and NMI than the other clustering algorithms.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric. This algorithm overcomes the shortcoming of DPC that the value of the cutoff distance d_c must be set in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


shorter than those taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets were used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can get ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors is prespecified, and currently there is no technique available to set this value automatically. Choosing a suitable value for K is a future research direction. Moreover, some other methods can be used to calculate the densities of the data points. In order to improve the effectiveness of DPC, some optimization techniques can also be employed.

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets/ and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Projects (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), a China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences of the Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160–172, Springer, Berlin, Germany, 2013.

[3] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220–227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743–754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785–793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1–15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131–1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187–196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1–13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200–226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, Portland, OR, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.



large-density clusters partitioned into multiple parts and small-density clusters merged with other clusters. Xu et al. proposed the FDPC algorithm based on a novel merging strategy motivated by support vector machines [10], but it also suffers from higher complexity and requires the user to select the cluster centers. Liu et al. [11] proposed a shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC) algorithm. Based on prior assumptions of consistency for semisupervised learning algorithms, some scholars also made assumptions of consistency for density-based clustering. The first assumption is of local consistency, which means nearby points are likely to have similar local density, and the second assumption is of global consistency, which means points in the same high-density area (or the same structure, i.e., the same cluster) are likely to have the same label [12]. This method also cannot find the cluster centers automatically. Although many studies about DPC have been reported, it still has many problems that need to be studied.

Membrane computing, proposed by Paun [13] as a new branch of natural computing, abstracts computational models from the structures and functions of biological cells and from the collaboration between organs and tissues. Membrane computing mainly includes three basic computational models, i.e., the cell-like P system, the tissue-like P system, and the neural-like P system. In the computation process, each cell is treated as an independent unit, each unit operates independently without interfering with the others, and the entire membrane system operates in a maximally parallel way. Over the past years, many variants of membrane systems have been proposed [14–18], including membrane algorithms for solving global optimization problems. In recent years, applications of membrane computing have attracted a lot of attention from researchers [19–22]. There are also some other applications; for example, membrane systems have been used to solve multiobjective fuzzy clustering problems [23], unsupervised learning problems [24], automatic fuzzy clustering problems [25], and fault diagnosis of power systems [26]. Liu et al. [27] proposed an improved Apriori algorithm based on an evolution-communication tissue-like P system. Liu and Xue [28] introduced a P system on simplices. Zhao et al. [29] proposed a spiking neural P system with neuron division and dissolution.

Based on previous works, the main motivation of this work is to use membrane systems to develop a framework for a density peak clustering algorithm. A new method of calculating the density of the data points is proposed based on the K-nearest neighbors and Shannon entropy. A variant of the tissue-like P system with active membranes is used to realize the clustering process. The new model of the P system can improve efficiency and reduce computation complexity. Experimental results show that this method is more effective and accurate than the state-of-the-art methods.

The rest of this paper is organized as follows. Section 2 describes the basic DPC algorithm and the tissue-like P system. Section 3 introduces the tissue-like P system with active membranes for DPC based on the K-nearest neighbors and Shannon entropy and describes the clustering procedure. Section 4 reports experimental results on synthetic datasets and UCI datasets. Conclusions are drawn and future research directions are outlined in Section 5.

2. Preliminaries

2.1. The Original Density Peak Clustering Algorithm. Rodriguez and Laio [3] proposed the DPC algorithm in 2014. This algorithm is based on the idea that cluster centers have higher densities than the surrounding regions and the distances among cluster centers are relatively large. It has three important parameters. The first one, ρ_i, is the local density of data point i; the second one, δ_i, is the minimum distance between data point i and other data points with higher density; and the third one, γ_i = ρ_i × δ_i, is the product of the other two. The first two parameters correspond to two assumptions of the DPC algorithm. One assumption is that cluster centers have higher density than the surrounding regions. The other assumption is that a cluster center has a larger distance from the points in other clusters than from points in the same cluster. In the following, the computations of ρ_i and δ_i are discussed in detail.

Let X = {x_1, x_2, ..., x_n} be a dataset with n data points. Each x_i has M attributes, so x_ij is the j-th attribute of data point x_i. The Euclidean distance between the data points x_i and x_j can be expressed as

d_ij = d(x_i, x_j) = ||x_i − x_j||.  (1)

The local density ρ_i of the data point x_i is defined as

ρ_i = Σ_{j≠i} χ(d(x_i, x_j) − d_c),  (2)

with

χ(x) = 1 if x < 0, and χ(x) = 0 otherwise,  (3)

where d_c is the cutoff distance. In fact, ρ_i is the number of data points adjacent to data point x_i. The minimal distance δ_i between data point x_i and any other data point x_j with a higher density ρ_j is given by

δ_i = min_{j: ρ_j > ρ_i} d_ij, if ∃j such that ρ_j > ρ_i;
δ_i = max_j d_ij, otherwise.  (4)

After ρ_i and δ_i are calculated for each data point x_i, a decision graph with δ_i on the vertical axis and ρ_i on the horizontal axis can be plotted. This graph can be used to find the cluster centers, after which each remaining data point is assigned to the cluster at the shortest distance.
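A compact NumPy sketch of the quantities in Eqs. (1)-(4), computing the cutoff-kernel density ρ_i and the distance δ_i to the nearest higher-density point; the function name is an assumption, and points whose density no other point exceeds fall back to the maximum-distance branch of Eq. (4):

```python
import numpy as np

def dpc_rho_delta(X, d_c):
    """rho_i: number of points within d_c of x_i (Eqs. (2)-(3));
    delta_i: distance to the nearest higher-density point (Eq. (4))."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Eq. (1)
    rho = (D < d_c).sum(axis=1) - 1            # exclude the point itself
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i, higher].min() if len(higher) else D[i].max()
    return rho, delta

# cluster centers are the points with the largest gamma = rho * delta
X = np.random.default_rng(0).normal(size=(20, 2))
rho, delta = dpc_rho_delta(X, d_c=0.8)
centers = np.argsort(rho * delta)[-2:]
```

Reading the two largest values of γ = ρδ off the decision graph corresponds to the `argsort` line at the end.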

The computation of the local densities of the data points is a key factor for the effectiveness and efficiency of DPC. There are many other ways to calculate the local densities. For example, the local density of x_i can be computed as follows [3]:

ρ_i = Σ_{j≠i} exp(−d_ij² / d_c²).  (5)
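Given a precomputed pairwise distance matrix, Eq. (5) can be sketched as follows (the function name is an assumption):

```python
import numpy as np

def gaussian_density(D, d_c):
    """Eq. (5): rho_i = sum_{j != i} exp(-(d_ij / d_c)^2), computed from a
    pairwise distance matrix D; the self-term exp(0) = 1 is subtracted out."""
    return np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0

# two points at distance 1: each has density exp(-1); changing d_c changes
# the value, which is exactly the sensitivity discussed in the text
D = np.array([[0.0, 1.0], [1.0, 0.0]])
gaussian_density(D, 1.0)   # [exp(-1), exp(-1)]
```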


The form in (5) is suitable for "small" datasets. In fact, it is difficult to judge whether a dataset is small or large. When (5) is used to calculate the local density, the results can be greatly affected by the cutoff distance d_c.

Each component on the right side of (5) is a Gaussian function. Figure 1 visualizes the function exp(−t) and two Gaussian functions exp(−t²/σ²) with different values of σ. The blue and red curves are the curves of exp(−t²/σ²) with σ = 1 and σ = √2, respectively. The curve with a smaller value of σ declines more quickly than the curve with a larger value of σ. Comparing the curve of exp(−t), the yellow dash-dotted curve, with the curves of exp(−t²/σ²), it can be found that the values of exp(−t²/σ²) are greater than those of exp(−t) when t < σ² but decay faster when t > σ². This means that if the value of the parameter σ is decided manually in the density calculation, the calculated densities will be influenced by the selected value. This analysis shows that the parameter σ has a big effect on the calculated results. Furthermore, the density in (5) can be influenced by the cutoff distance d_c. To eliminate the influence of the cutoff distance d_c and give a uniform metric for datasets of any size, Du et al. [30] proposed the K-nearest neighbor method. The local density in Du et al. [30] is given by

ρ_i = exp(−(1/K) Σ_{j∈KNN(i)} d_ij²),  (6)

where K is an input parameter and KNN(i) is the set of the K nearest neighbors of data point x_i. However, this method does not consider the influence of the position of a data point on its own density. Therefore, the current study proposes a novel method to calculate the densities.
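A sketch of Eq. (6), again from a pairwise distance matrix (the function name is an assumption):

```python
import numpy as np

def knn_density(D, k):
    """Eq. (6) (Du et al. [30]): rho_i = exp(-(1/K) * sum_{j in KNN(i)} d_ij^2),
    computed from a pairwise distance matrix D."""
    knn_d = np.sort(D, axis=1)[:, 1:k + 1]   # column 0 is the self-distance 0
    return np.exp(-(knn_d ** 2).mean(axis=1))

# three points on a line at 0, 1, and 3: the isolated point gets a lower density
D = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
knn_density(D, 1)   # [exp(-1), exp(-1), exp(-4)]
```

Unlike Eq. (5), no cutoff distance appears: the scale adapts to each point's neighborhood.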

2.2. The Tissue-Like P System with Active Membranes. A tissue-like P system has a graphical structure. The nodes of the graph correspond to the cells and the environment in the tissue-like P system, whereas the edges of the graph represent the channels for communication between the cells. The tissue-like P system is slightly more complicated than the cell-like P system. Each cell has a different state, and only a state that meets the requirement specified by the rules can be changed. The basic framework of the tissue-like P system used in this study is shown in Figure 2.

A P system with active membranes is a construct

Π = (O, Z, H, ω_1, ..., ω_m, E, ch, (s_(i,j))_((i,j)∈ch), (R_(i,j))_((i,j)∈ch), i_0),  (7)

where

(1) O is the alphabet of all objects which appear in the system;

(2) Z represents the states of the alphabet;

(3) H is the set of labels of the membranes;

(4) ω_1, ..., ω_m are the initial multisets of objects in cells 1 to m;

(5) E ⊆ O is the set of objects present in an arbitrary number of copies in the environment;

(6) ch ⊆ {(i, j) | i, j ∈ {0, 1, ..., m}, i ≠ j} is the set of channels between cells and between cells and the environment;

(7) s_(i,j) is the initial state of the channel (i, j);

(8) R_(i,j) is a finite set of symport/antiport rules of the form (s, x/y, s′) with s, s′ ∈ Z and x, y ∈ O:

(i) [a → b]_h, where h ∈ H, a ∈ O, and b ∈ O (object evolution rules: an object evolves into another inside a membrane);

(ii) a[ ]_h → [b]_h, where h ∈ H and a, b ∈ O (send-in communication rules: an object is introduced into a membrane and may be modified during the process);

(iii) [a]_h → [ ]_h b, where h ∈ H and a, b ∈ O (send-out communication rules: an object is sent out of a membrane and may be modified during the process);

(iv) [a]_h → [b]_h1 [c]_h2, where h, h1, h2 ∈ H and a, b, c ∈ O (division rules for elementary membranes: the membrane is divided into two membranes with possibly different labels, the object specified in the rule is replaced by possibly new objects in the two new membranes, and the remaining objects are duplicated in the process);

(9) i_0 ∈ {1, ..., m} is the output cell.

The biggest difference between a cell-like P system and a tissue-like P system is that each cell can communicate with the environment in the tissue-like P system, whereas only the skin membrane can communicate with the environment in the cell-like P system. This does not mean that any two cells in the tissue-like P system can communicate with each other: if there is no direct communication channel between two cells, they can communicate through the environment indirectly.
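To make the rule types concrete, here is a deliberately simplified toy in Python, with membranes modeled as multisets (`collections.Counter`). It illustrates only the evolution, send-out, and division rules, applied sequentially; it is not the paper's P system, which applies rules in a maximally parallel way:

```python
from collections import Counter

# toy state: two membranes and an environment, each a multiset of objects
membranes = {1: Counter("aab"), 2: Counter("c")}
env = Counter()

def evolve(h, a, b):
    """[a -> b]_h : rewrite one copy of object a into b inside membrane h."""
    if membranes[h][a] > 0:
        membranes[h][a] -= 1
        membranes[h][b] += 1

def send_out(h, a):
    """[a]_h -> [ ]_h a : move one copy of a from membrane h to the environment."""
    if membranes[h][a] > 0:
        membranes[h][a] -= 1
        env[a] += 1

def divide(h, a, h1, b, h2, c):
    """[a]_h -> [b]_h1 [c]_h2 : split membrane h; one copy of a is replaced by
    b and c, and the remaining objects are duplicated into both new membranes."""
    rest = membranes.pop(h)
    rest[a] -= 1
    membranes[h1] = rest + Counter([b])   # Counter addition keeps positive counts
    membranes[h2] = rest.copy() + Counter([c])
```

The clustering algorithm uses exactly this kind of division rule in Step 6 to create the K membranes that will hold the clusters.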

3. The Proposed Method

3.1. Density Metric Based on the K-Nearest Neighbors and Shannon Entropy. DPC still has some defects. The current DPC algorithm has the obvious shortcoming that the value of the cutoff distance d_c must be set manually in advance, and this value largely affects the final clustering results. In order to overcome this shortcoming, a new method is proposed to calculate the density metric based on the K-nearest neighbors and Shannon entropy.

K-nearest neighbors (KNN) is usually used to measure a local neighborhood of an instance in the fields of classification, clustering, local outlier detection, etc. The aim of this approach is to find the K nearest neighbors of a sample among N samples. In general, the distances between points are obtained by calculating the Euclidean distance. Let


Figure 1: Three different function curves: exp(−t²/σ²) with σ = 1, exp(−t²/σ²) with σ = √2, and exp(−t).

Figure 2: Membrane structure of a tissue-like P system.

KNN(i) be the set of nearest neighbors of a point i; it can be expressed as

KNN(i) = {j | d_ij ≤ d_{i,kth(i)}},  (8)

where d_ij is the Euclidean distance between x_i and x_j and kth(i) is the k-th nearest neighbor of i. Local regions measured by KNN are often termed K-nearest neighborhoods, which in fact are circular or spherical areas of radius R = d_{i,kth(i)}. Therefore, KNN-based methods cannot readily handle datasets whose clusters have nonspherical distributions and usually give poor clustering results on datasets with clusters of different shapes.
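Eq. (8) translates directly into code, given a pairwise distance matrix (the function name is an assumption):

```python
import numpy as np

def knn_set(D, i, k):
    """Eq. (8): all points j whose distance to i is at most the distance
    from i to its k-th nearest neighbor."""
    order = np.argsort(D[i])         # order[0] is i itself (distance 0)
    kth = D[i, order[k]]             # distance to the k-th nearest neighbor
    return {j for j in range(len(D)) if j != i and D[i, j] <= kth}

D = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
knn_set(D, 0, 1)   # {1}
```

Note that ties at the k-th distance make the set larger than k, which is consistent with the ≤ in Eq. (8).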

Shannon entropy measures the degree of molecular activity: the more unstable the system is, the larger the value of the Shannon entropy, and vice versa. The Shannon entropy, represented by H(X), is given by

H(X) = − Σ_{i=0}^{N} p_i log₂(p_i)   (9)

where X is the set of objects and p_i is the probability of object i appearing in X. When H(X) is used to measure the distance between the clusters, the smaller the value of H(X) is, the better the clustering result is. Therefore, the Shannon entropy

is introduced into the calculation of the data point density in the K-nearest neighbor method, so that the final density not only considers the distance metric but also accounts for the influence of a data point's position on its density.
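The behavior of (9) can be checked numerically. The sketch below is illustrative; the function name is not from the paper.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: maximally "unstable"
print(shannon_entropy([0.5, 0.5]))                # 1.0
print(shannon_entropy([0.9, 0.1]) < 1.0)          # True: skew lowers the entropy
```

A uniform distribution maximizes the entropy, while a concentrated one lowers it, which is exactly the property the density metric below exploits.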

The decision graph is built from the product of ρ_i and δ_i, and a larger value of ρ_i makes it easier to choose the best cluster centers; therefore, the reciprocal form of the Shannon entropy is adopted. The metrics of ρ_i and δ_i may be inconsistent, which directly leads to ρ_i and δ_i playing different roles in the calculation of the decision graph. Hence, it is necessary to normalize ρ_i and δ_i.

The specific calculation method is as follows. First, the local density of data point x_i is calculated:

w′_i = Σ_{j=1, j≠i}^{n} 1 / ‖x_j − x_i‖   (10)

where x_i and x_j are data points and w′_i is the density of data point x_i. Next, the density of data point x_i is normalized, and the normalized density is denoted as w_i:

w_i = w′_i / Σ_{i=1}^{n} w′_i   (11)


Finally, the density metric, which uses the idea of the K-nearest neighbor method, is defined as

ρ_i = −1 / ( (1/K) Σ_{j∈KNN(i)} w_j log(w_j) )   (12)

To guarantee the consistency of the metrics of ρ_i and δ_i, δ_i also needs to be normalized.
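Equations (10)-(12) can be combined into a single density routine. The sketch below assumes the reciprocal-entropy reading of (12) (w_j < 1, so the averaged sum is negative and ρ_i comes out positive); the function and variable names are illustrative.

```python
import numpy as np

def density_metric(X, K):
    """rho_i from Eqs. (10)-(12): inverse-distance weights, normalized,
    then combined as a reciprocal Shannon entropy over each KNN set."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    with np.errstate(divide="ignore"):
        inv = 1.0 / d
    np.fill_diagonal(inv, 0.0)
    w_raw = inv.sum(axis=1)              # Eq. (10): w'_i = sum_j 1/||x_j - x_i||
    w = w_raw / w_raw.sum()              # Eq. (11): normalize so sum_i w_i = 1
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :K]   # K nearest neighbors of each point
    ent = (w[knn] * np.log(w[knn])).mean(axis=1)  # (1/K) sum_j w_j log w_j
    return -1.0 / ent                    # Eq. (12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])
rho = density_metric(X, K=5)
print(rho.shape)   # (40,): one density value per point
```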

3.2. Tissue-Like P System with Active Membranes for Improved Density Peak Clustering. In the following, a tissue-like P system with active membranes for density peak clustering, called KST-DPC, is proposed. As mentioned before, assume the dataset with n data points is represented by X = {x₁, x₂, …, x_n}. Before performing any specific calculation of the DPC algorithm, the Euclidean distance between each pair of data points in the dataset is calculated, and the result is stored in the form of a matrix. The initial configuration of this P system is shown in Figure 3.

When the system is initialized, the objects x_i, b₁, b₂, …, b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1, where λ means there is no object. First, the Euclidean distance w_ij between the data points x_i and x_j (represented by b_j for 1 ≤ j ≤ n) is calculated with the rule

r₁ = { [x_i b₁ b₂ ⋯ b_n]_i → [d_{i1}^{w_i1} d_{i2}^{w_i2} ⋯ d_{in}^{w_in}]_i | 1 ≤ i ≤ n }.

Note that x_j for 1 ≤ j ≤ n are expressed as b₁, b₂, …, b_n. The results are stored as the distance matrix, also called the dissimilarity matrix, D_{n×n}:

D_{n×n} = ( w₁₁ w₁₂ ⋯ w₁ₙ ; w₂₁ w₂₂ ⋯ w₂ₙ ; ⋯ ; w_n1 w_n2 ⋯ w_nn )   (13)

At the beginning, there are n + 1 membranes in the P system. After the distances are calculated, the objects x_i, d_{i1}^{w_i1}, d_{i2}^{w_i2}, …, d_{in}^{w_in} are placed in membrane i for 1 ≤ i ≤ n. In the next step, the densities of the data points are calculated by the rule r₂ = { [d_{i1}^{w_i1} d_{i2}^{w_i2} ⋯ d_{in}^{w_in}]_i → [w′_i]_i | 1 ≤ i ≤ n }. Then the send-in and send-out communication rules are used to calculate the values of ρ_i, δ_i, and γ_i and to put them in membrane i for 1 ≤ i ≤ n. Next, according to the sorted values of γ_i for 1 ≤ i ≤ n, the number of clusters k can be determined. The rule of the active membranes is used to split membrane n + 1 into k membranes, as shown in Figure 4. The k cluster centers are put in membranes n + 1 to n + k, respectively. Finally, the remaining data points are divided, and each is put into the membrane whose cluster center is closest to it. Up to this point, the clusters are obtained.

The main steps of KST-DPC are summarized in Algorithm 1.

3.3. Time Complexity Analysis of KST-DPC. As usual, computations in the cells of the tissue-like P system can be implemented in parallel. Because of the parallel implementation, the generation of the dissimilarity matrix uses n computation steps. The generation of the data point densities needs 1

Table 1: Synthetic datasets

Dataset      Instances  Dimensions  Clusters
Spiral       312        2           3
Compound     399        2           6
Jain         373        2           2
Aggregation  788        2           7
R15          600        2           15
D31          3100       2           31

computation step. The calculation of the final density ρ_i uses k computation steps. The calculation of δ_i needs n steps. The calculation of γ_i uses 1 step, and n log n steps are used to sort γ_i for 1 ≤ i ≤ n. Finally, the final clustering needs 1 more computation step. Therefore, the total time complexity of KST-DPC is n + 1 + k + n + 1 + n log n + 1 = O(n log n). The time complexity of DPC-KNN is O(n²). Compared to DPC-KNN, KST-DPC reduces the time complexity by trading space for time. The above analysis demonstrates that the overall time complexity of KST-DPC is superior to that of DPC-KNN.

4. Test and Analysis

4.1. Data Sources. Experiments on six synthetic datasets and four real-world datasets are carried out to test the performance of KST-DPC. The synthetic datasets are from http://cs.uef.fi/sipu/datasets. These datasets are commonly used as benchmarks to test the performance of clustering algorithms. The real-world datasets used in the experiments are from the UCI Machine Learning Repository [31]. These datasets are chosen to test the ability of KST-DPC to identify clusters of arbitrary shapes without being affected by the noise, size, or dimensions of the datasets. The numbers of features (dimensions), data points (instances), and clusters vary across the datasets. The details of the synthetic and real-world datasets are listed in Tables 1 and 2, respectively.

The performance of KST-DPC was compared with those of the well-known clustering algorithms SC [32], DBSCAN [33], and DPC-KNN [28, 34]. The codes for SC and DBSCAN are provided by their authors. The code of DPC is optimized by using matrix operations instead of iteration cycles, based on the original code provided by Rodriguez and Laio [3], to reduce running time.

The performances of the above clustering algorithms are measured by clustering quality, or Accuracy (Acc), and Normalized Mutual Information (NMI). These are very popular measures for testing the performance of clustering algorithms. The larger the values are, the better the results are; the upper bound of both measures is 1.
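The two measures can be sketched from their definitions. The code below is illustrative rather than the exact evaluation code used in the experiments: Acc is computed by brute-force relabeling of the predicted clusters (adequate for small cluster counts), and NMI uses the geometric-mean normalization, which is one common convention.

```python
import math
from itertools import permutations
from collections import Counter

def accuracy(y_true, y_pred):
    """Acc: best one-to-one relabeling of predicted clusters, then the
    fraction of points labeled correctly (brute force over permutations)."""
    t_labels = sorted(set(y_true))
    p_labels = sorted(set(y_pred))
    pad = [None] * max(0, len(p_labels) - len(t_labels))
    best = 0
    for perm in permutations(t_labels + pad, len(p_labels)):
        mapping = dict(zip(p_labels, perm))
        hits = sum(1 for t, p in zip(y_true, y_pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(y_true)

def nmi(y_true, y_pred):
    """NMI = I(T;P) / sqrt(H(T) * H(P)), from the empirical label counts."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    cj = Counter(zip(y_true, y_pred))
    mi = sum(c / n * math.log((c * n) / (ct[t] * cp[p]))
             for (t, p), c in cj.items())
    ht = -sum(c / n * math.log(c / n) for c in ct.values())
    hp = -sum(c / n * math.log(c / n) for c in cp.values())
    return mi / math.sqrt(ht * hp) if ht > 0 and hp > 0 else 0.0

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 1, 0, 0, 2]          # same partition, permuted label names
print(round(accuracy(y_true, y_pred), 4))   # 0.8333: 5 of 6 after matching
print(round(nmi(y_true, y_pred), 4))
```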

4.2. Experimental Results on the Synthetic Datasets. In this subsection, the performances of KST-DPC, DPC-KNN, DBSCAN, and SC are reported on the six synthetic datasets. The clustering results of the four clustering algorithms on the six synthetic datasets are color coded and displayed in two-dimensional spaces, as shown in Figures 5-10. The results of


[Figure 3: The initial configuration of the tissue-like P system: membrane i contains x_i, b₁, b₂, …, b_n for 1 ≤ i ≤ n, and membrane n + 1 is empty.]

Inputs: dataset X, parameter K
Output: clusters
Step 1: The objects x_i, b₁, b₂, …, b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1.
Step 2: Compute the Euclidean distance matrix (w_ij) by rule r₁.
Step 3: Compute the local densities of the data points by rule r₂ and normalize them using (10) and (11).
Step 4: Calculate ρ_i and δ_i for data point i using (12) and (4), respectively, in every membrane i.
Step 5: Calculate γ_i = ρ_i × δ_i for all 1 ≤ i ≤ n in membrane i, sort the γ_i in descending order, and select the top values to determine the cluster centers.
Step 6: Split membrane n + 1 into k membranes by the division rules; the new membranes are numbered from n + 1 to n + k.
Step 7: The k cluster centers are put in membranes n + 1 to n + k, respectively.
Step 8: Assign each remaining point to the membrane with the nearest cluster center.
Step 9: Return the clustering result.

Algorithm 1: The main steps of KST-DPC.
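Because the parallel membrane rules can be simulated sequentially, Algorithm 1 admits a compact sequential sketch. The code below is an illustration under stated assumptions, not the authors' implementation: ρ follows the reciprocal-entropy reading of Eq. (12), δ follows the standard DPC definition (distance to the nearest higher-density point), and all names are illustrative.

```python
import numpy as np

def kst_dpc(X, K, n_clusters):
    """Sequential sketch of Algorithm 1; the parallel membrane rules
    are simulated with vectorized NumPy steps."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # Step 2
    with np.errstate(divide="ignore"):
        inv = 1.0 / d
    np.fill_diagonal(inv, 0.0)
    w = inv.sum(1) / inv.sum()                    # Step 3: Eqs. (10)-(11)
    dd = d.copy()
    np.fill_diagonal(dd, np.inf)
    knn = np.argsort(dd, 1)[:, :K]
    rho = -1.0 / (w[knn] * np.log(w[knn])).mean(1)  # Step 4: Eq. (12)
    # delta_i: distance to the nearest point of higher density; the
    # densest point gets the largest distance in its row by convention.
    order = np.argsort(-rho)
    delta = np.zeros(n)
    delta[order[0]] = d[order[0]].max()
    for m, i in enumerate(order[1:], 1):
        delta[i] = d[i, order[:m]].min()
    rho = (rho - rho.min()) / (rho.max() - rho.min() + 1e-12)     # normalize
    delta = (delta - delta.min()) / (delta.max() - delta.min() + 1e-12)
    gamma = rho * delta                                           # Step 5
    centers = np.argsort(-gamma)[:n_clusters]                     # Steps 6-7
    labels = np.argmin(d[:, centers], axis=1)                     # Step 8
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, centers = kst_dpc(X, K=6, n_clusters=2)
```

On two well-separated blobs the two highest-γ points land one per blob, and Step 8 then assigns every remaining point to the nearer center.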

Table 2: Real-world datasets

Dataset        Instances  Dimensions  Clusters
Vertebral      310        7           2
Seeds          210        7           3
Breast cancer  699        10          2
Banknotes      1372       5           2

the four clustering algorithms on a dataset are shown as four parts in a single figure. The cluster centers found by the KST-DPC and DPC-KNN algorithms are marked in the figures with different colors. For DBSCAN, it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm ran multiple times on each dataset, and the best result of each clustering algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm gives the parameter value(s) the user needs to set. KST-DPC and DPC-KNN have only one parameter, K, which is the number of nearest neighbors to be prespecified. In this paper, the value of K is determined as a percentage of the data points, following the method in [34]. For each dataset, the percentage of data points in the KNN is adjusted multiple times to find the percentage that makes the final clustering the best. Because many experiments were performed, only the best results are listed in Tables 3 and 4, and, to be consistent with the other parameters in the tables, the percentages of data points are converted directly into specific K values. DBSCAN has two input parameters, the maximum radius Eps and the minimum number of points MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by the algorithms. The performance measures, including Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and obtain the correct clustering results. All the benchmark values are 1.00, reflecting that all four algorithms perform perfectly well on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6, it is obvious that KST-DPC can find the ideal clustering result, DBSCAN cannot find the right clusters, and DPC-KNN and SC cannot find the cluster centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters


Table 3: Results on the synthetic datasets

Dataset      Algorithm  Par    C1  Acc     NMI
Spiral       KST-DPC    16     3   1.00    1.00
Spiral       DPC-KNN    20     3   1.00    1.00
Spiral       DBSCAN     123    3   1.00    1.00
Spiral       SC         3      3   1.00    1.00
Compound     KST-DPC    217    6   0.98    0.95
Compound     DPC-KNN    360    6   0.6466  0.7663
Compound     DBSCAN     153    5   0.8596  0.9429
Compound     SC         6      6   0.6015  0.7622
Jain         KST-DPC    4      2   1.00    1.00
Jain         DPC-KNN    8      2   0.9035  0.5972
Jain         DBSCAN     2624   2   1.00    1.00
Jain         SC         2      2   1.00    1.00
Aggregation  KST-DPC    40     7   1.00    1.00
Aggregation  DPC-KNN    40     7   0.9987  0.9957
Aggregation  DBSCAN     1593   5   0.8274  0.8894
Aggregation  SC         7      7   0.9937  0.9824
R15          KST-DPC    20     15  1.00    0.99
R15          DPC-KNN    20     15  1.00    0.99
R15          DBSCAN     045    13  0.78    0.9155
R15          SC         15     15  0.9967  0.9942
D31          KST-DPC    25     31  1.0000  1.0000
D31          DPC-KNN    25     31  0.9700  0.9500
D31          DBSCAN     0463   27  0.6516  0.8444
D31          SC         31     31  0.9765  0.9670

[Figure 4: The tissue-like membrane system in the calculation process: membrane i (1 ≤ i ≤ n) holds x_i and the distance objects d_{i1}^{w_i1}, …, d_{in}^{w_in}, and the divided membranes n + 1 to n + k hold the cluster centers.]

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)-6(d). DBSCAN has some prespecified parameters that can have heavy effects on the clustering results; as shown in Figure 6(c), two clusters are merged into one cluster on two occasions. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC can get correct results, with both of the benchmark values being 1.00. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with different sizes and shapes, and two pairs of clusters are connected to each other. Figure 8 shows that both the KST-DPC and DPC-KNN algorithms can effectively find the cluster centers and the correct clusters, except that an individual data point is put into an incorrect cluster by DPC-KNN. Table 3 shows that the benchmark values of KST-DPC are all 1.00 and those of DPC-KNN are close to 1.00. SC can also recognize all clusters, but its values of Acc and NMI are lower than those of DPC-KNN. DBSCAN did not find all clusters and could not partition the clusters connected to each other.

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the


[Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good experimental result, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and distributed randomly in a 2-dimensional space. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are all 1.00, which shows that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, and the best results for each dataset are shown in italics. The symbol "--" indicates there is no value for that entry.

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC is equal to that obtained by DPC-KNN, but the value of NMI obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. As Table 4 shows, all algorithms could find the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. The results in Table 4 show that KST-DPC obtained the best values of Acc and NMI, whereas DBSCAN obtained the worst. All four clustering algorithms could find the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results on this dataset in Table 4 show that all four clustering algorithms could find the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset, except that SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4, it is obvious that KST-DPC got the best


[Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

Table 4: Results on the real-world datasets

Dataset        Algorithm  Par   C1  Acc     NMI
Vertebral      KST-DPC    9     2   0.6806  0.0313
Vertebral      DPC-KNN    9     2   0.6806  0.0821
Vertebral      DBSCAN     748   2   0.6742  --
Vertebral      SC         2     2   --      --
Seeds          KST-DPC    4     3   0.8429  0.6574
Seeds          DPC-KNN    6     3   0.8143  0.6252
Seeds          DBSCAN     0927  3   0.5857  0.4835
Seeds          SC         3     3   0.6071  0.5987
Breast cancer  KST-DPC    70    2   0.8624  0.4106
Breast cancer  DPC-KNN    76    2   0.7954  0.3154
Breast cancer  DBSCAN     620   2   0.6552  0.0872
Breast cancer  SC         2     2   --      --
Banknotes      KST-DPC    68    2   0.8434  0.7236
Banknotes      DPC-KNN    82    2   0.7340  0.3311
Banknotes      DBSCAN     655   2   0.5554  6.7210e-16
Banknotes      SC         2     2   0.6152  0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms: it obtained larger values of Acc and NMI than the other clustering algorithms.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric. This algorithm overcomes the shortcoming of DPC that the value of the cutoff distance d_c must be set in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


[Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

[Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]


[Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

[Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]


shorter than those taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets are used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can get ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors is prespecified, and currently there is no technique available to set this value; choosing a suitable value for K is a future research direction. Moreover, other methods can be used to calculate the densities of the data points, and some optimization techniques can also be employed to improve the effectiveness of DPC.

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Projects (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), a China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences of the Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160-172, Springer, Berlin, Germany, 2013.

[3] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492-1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220-227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743-754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785-793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1-15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131-1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187-196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1-13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200-226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285-309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291-294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering & Technology, vol. 24, no. 2, pp. 229-237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87-92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209-220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177-193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875-884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370-2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001-1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256-265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528-1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74-82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80-91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1-17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777-4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1-11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171-194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135-145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849-856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, Portland, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208-220, 2017.



The approach in (5) is suitable for "small" datasets; in fact, it is difficult to judge whether a dataset is small or large. When (5) is used to calculate the local density, the results can be greatly affected by the cutoff distance d_c.

Each component on the right side of (5) is a Gaussian function. Figure 1 visualizes the function exp(−t) and two Gaussian functions exp(−t²/σ²) with different values of σ. The blue and red curves are the curves of exp(−t²/σ²) with σ = 1 and σ = √2, respectively. The curve with a smaller value of σ declines more quickly than the curve with a larger value of σ. Comparing the curve of exp(−t), the yellow dash-dotted curve, with the curves of exp(−t²/σ²), it can be found that the values of exp(−t²/σ²) are greater than those of exp(−t) when t < σ² but decay faster when t > σ². This means that if the value of the parameter σ must be decided manually in the density calculation, the calculated densities will be influenced by the selected value; this analysis shows that the parameter σ has a big effect on the calculated results. Furthermore, the density in (5) can be influenced by the cutoff distance d_c. To eliminate the influence of the cutoff distance d_c and give a uniform metric for datasets of any size, Du et al. [30] proposed the K-nearest neighbor method. The local density in Du et al. [30] is given by
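The crossover claim can be verified numerically: exp(−t²/σ²) = exp(−t) exactly when t = σ². A small check for the red curve of Figure 1 (σ = √2, so σ² = 2):

```python
import math

sigma2 = 2.0   # sigma^2 for the red curve in Figure 1 (sigma = sqrt(2))

def gauss(t):
    """The Gaussian component exp(-t^2 / sigma^2)."""
    return math.exp(-t * t / sigma2)

# The curves cross at t = sigma^2: the Gaussian lies above exp(-t)
# before the crossing and decays faster after it.
print(math.isclose(gauss(2.0), math.exp(-2.0)))   # True: equal at t = sigma^2
print(gauss(1.0) > math.exp(-1.0))                # True: Gaussian larger before
print(gauss(3.0) < math.exp(-3.0))                # True: Gaussian smaller after
```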

ρ_i = exp( − (1/K) Σ_{j∈KNN_i} d²_ij )   (6)

where $K$ is an input parameter and $KNN_i$ is the set of the $K$ nearest neighbors of data point $\mathbf{x}_i$. However, this method does not consider the influence of the position of a data point on its own density. Therefore, this study proposes a novel method to calculate the densities.
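The density in (6) is straightforward to compute with numpy. The sketch below is illustrative, not the authors' code: `knn_density` and the toy data are assumed names, and self-distances are excluded so a point is not its own neighbor.

```python
import numpy as np

def knn_density(X, K):
    """Eq. (6): rho_i = exp(-(1/K) * sum of squared distances to the
    K nearest neighbors of x_i."""
    X = np.asarray(X, dtype=float)
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    knn_d2 = np.sort(d2, axis=1)[:, :K]   # K smallest squared distances per row
    return np.exp(-knn_d2.mean(axis=1))

X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
rho = knn_density(X, K=2)
```

On this toy dataset the isolated fourth point receives a far smaller density than the three tightly grouped points, which is the intended behavior of (6).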

2.2. The Tissue-Like P System with Active Membranes. A tissue-like P system has a graphical structure. The nodes of the graph correspond to the cells and the environment in the tissue-like P system, whereas the edges of the graph represent the channels for communication between the cells. The tissue-like P system is slightly more complicated than the cell-like P system. Each cell has a state, and only a state that meets the requirement specified by the rules can be changed. The basic framework of the tissue-like P system used in this study is shown in Figure 2.

A P system with active membranes is a construct
$$\Pi = (O, Z, H, \omega_1, \ldots, \omega_m, E, ch, (s_{(i,j)})_{(i,j)\in ch}, (R_{(i,j)})_{(i,j)\in ch}, i_0) \tag{7}$$
where

(1) $O$ is the alphabet of all objects which appear in the system;
(2) $Z$ represents the states of the alphabets;
(3) $H$ is the set of labels of the membranes;
(4) $\omega_1, \ldots, \omega_m$ are the initial multisets of objects in cells 1 to $m$;
(5) $E \subseteq O$ is the set of objects present in an arbitrary number of copies in the environment;
(6) $ch \subseteq \{(i,j) \mid i, j \in \{0, 1, \ldots, m\},\ i \ne j\}$ is the set of channels between cells and between cells and the environment;
(7) $s_{(i,j)}$ is the initial state of the channel $(i,j)$;
(8) $R_{(i,j)}$ is a finite set of symport/antiport rules of the form $(s, x/y, s')$ with $s, s' \in Z$ and $x, y \in O$:
  (i) $[a \to b]_h$, where $h \in H$, $a \in O$, and $b \in O$ (object evolution rules: an object evolves into another inside a membrane);
  (ii) $a[\,]_h \to [b]_h$, where $h \in H$ and $a, b \in O$ (send-in communication rules: an object is introduced into a membrane and may be modified during the process);
  (iii) $[a]_h \to [\,]_h\, b$, where $h \in H$ and $a, b \in O$ (send-out communication rules: an object is sent out of the membrane and may be modified during the process);
  (iv) $[a]_h \to [b]_{h_1}[c]_{h_2}$, where $h, h_1, h_2 \in H$ and $a, b, c \in O$ (division rules for elementary membranes: the membrane is divided into two membranes with possibly different labels, the object specified in the rule is replaced by possibly new objects in the two new membranes, and the remaining objects are duplicated in the process);
(9) $i_0 \in \{1, \ldots, m\}$ is the output cell.

The biggest difference between a cell-like P system and a tissue-like P system is that each cell can communicate with the environment in the tissue-like P system, but only the skin membrane can communicate with the environment in the cell-like P system. This does not mean that any two cells in the tissue-like P system can communicate with each other: if there is no direct communication channel between two cells, they can communicate through the environment indirectly.

3. The Proposed Method

3.1. Density Metric Based on the K-Nearest Neighbors and Shannon Entropy. DPC still has some defects. The current DPC algorithm has the obvious shortcoming that the value of the cutoff distance $d_c$ needs to be set manually in advance, and this value largely affects the final clustering results. To overcome this shortcoming, a new method is proposed to calculate the density metric based on the K-nearest neighbors and Shannon entropy.

K-nearest neighbors (KNN) is usually used to measure a local neighborhood of an instance in fields such as classification, clustering, and local outlier detection. The aim of this approach is to find the K nearest neighbors of a sample among $N$ samples. In general, the distances between points are obtained by calculating the Euclidean distance. Let


Figure 1: Three different function curves: $\exp(-t^2/\sigma^2)$ with $\sigma = 1$ (blue), $\exp(-t^2/\sigma^2)$ with $\sigma = \sqrt{2}$ (red), and $\exp(-t)$ (yellow dash-dotted).

Figure 2: Membrane structure of a tissue-like P system.

$KNN(i)$ be the set of nearest neighbors of a point $i$, which can be expressed as
$$KNN(i) = \{ j \mid d_{ij} \le d_{i,kth(i)} \} \tag{8}$$
where $d_{ij}$ is the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ and $kth(i)$ is the $k$-th nearest neighbor of $i$. Local regions measured by KNN are often termed K-nearest neighborhoods, which in fact are circular or spherical areas of radius $R = d_{i,kth(i)}$. Therefore, KNN-based methods cannot handle datasets whose clusters have nonspherical distributions and usually give poor clustering results on datasets with clusters of different shapes.

Shannon entropy measures the degree of molecular activity: the more unstable the system is, the larger the value of the Shannon entropy is, and vice versa. The Shannon entropy, represented by $H(X)$, is given by
$$H(X) = -\sum_{i=1}^{N} p_i \log_2 (p_i) \tag{9}$$

where $X$ is the set of objects and $p_i$ is the probability of object $i$ appearing in $X$. When $H(X)$ is used to measure the distance between the clusters, the smaller the value of $H(X)$ is, the better the clustering result is. Therefore, the Shannon entropy is introduced into the K-nearest neighbor density calculation, so that the final density considers not only the distance metric but also the influence of the position of a data point on its own density.

However, the decision graph is calculated from the product of $\rho_i$ and $\delta_i$, and a larger value of $\rho_i$ makes it easier to choose the best cluster centers; therefore, the reciprocal form of the Shannon entropy is adopted. Moreover, the metrics for $\rho_i$ and $\delta_i$ may be inconsistent, which directly leads to $\rho_i$ and $\delta_i$ playing different roles in the calculation of the decision graph. Hence, it is necessary to normalize $\rho_i$ and $\delta_i$.

The specific calculation method is as follows. First, the local density of data point $\mathbf{x}_i$ is calculated:
$$w_i' = \sum_{j=1, j \ne i}^{n} \frac{1}{\|\mathbf{x}_j - \mathbf{x}_i\|} \tag{10}$$
where $\mathbf{x}_i$ and $\mathbf{x}_j$ are data points and $w_i'$ is the density of data point $\mathbf{x}_i$. Next, the density of data point $\mathbf{x}_i$ is normalized, and the normalized density is denoted as $w_i$:
$$w_i = \frac{w_i'}{\sum_{i=1}^{n} w_i'} \tag{11}$$


Finally, the density metric, which uses the idea of the K-nearest neighbor method, is defined as
$$\rho_i = -\frac{1}{(1/K)\sum_{j \in KNN(i)} w_j \log (w_j)} \tag{12}$$

To guarantee the consistency of the metrics of $\rho_i$ and $\delta_i$, $\delta_i$ also needs to be normalized.
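A direct transcription of (10)–(12) with numpy might read as follows; this is a sketch under the assumption that the sum in (12) runs over the $K$ nearest neighbors of $\mathbf{x}_i$, and `kst_density` is an illustrative name rather than the authors' code:

```python
import numpy as np

def kst_density(X, K):
    """Eqs. (10)-(12): inverse-distance densities w'_i, normalization to w_i,
    then the reciprocal Shannon-entropy metric over the K nearest neighbors."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude j = i from the sums
    w_raw = (1.0 / d).sum(axis=1)        # eq. (10): w'_i = sum_j 1/||x_j - x_i||
    w = w_raw / w_raw.sum()              # eq. (11): normalize so the w_i sum to 1
    knn = np.argsort(d, axis=1)[:, :K]   # indices of the K nearest neighbors
    # eq. (12): reciprocal of the mean w_j log(w_j) over the neighborhood
    return -1.0 / (w[knn] * np.log(w[knn])).mean(axis=1)

X = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2], [6.0, 6.0]]
rho = kst_density(X, K=2)
```

Since every $w_j$ lies in $(0,1)$, the neighborhood average of $w_j \log w_j$ is negative, so the resulting $\rho_i$ values are always positive and finite.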

3.2. Tissue-Like P System with Active Membranes for Improved Density Peak Clustering. In the following, a tissue-like P system with active membranes for density peak clustering, called KST-DPC, is proposed. As mentioned before, assume the dataset with $n$ data points is represented by $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$. Before performing any specific calculation of the DPC algorithm, the Euclidean distance between each pair of data points in the dataset is calculated, and the result is stored in the form of a matrix. The initial configuration of this P system is shown in Figure 3.

When the system is initialized, the objects $\mathbf{x}_i, b_1, b_2, \ldots, b_n$ are in membrane $i$ for $1 \le i \le n$, and object $\lambda$ is in membrane $n+1$, where $\lambda$ means there is no object. First, the Euclidean distance $w_{ij}$ between the data points $\mathbf{x}_i$ and $\mathbf{x}_j$ (represented by $b_j$ for $1 \le j \le n$) is calculated with the rule
$$r_1 = [\mathbf{x}_i b_1 b_2 \cdots b_n]_i \to [d_{i1}^{w_{i1}} d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}}]_i \mid 1 \le i \le n.$$
Note that the $\mathbf{x}_j$ for $1 \le j \le n$ are expressed as $b_1, b_2, \ldots, b_n$. The results are stored as the distance matrix, also called the dissimilarity matrix, $D_{n \times n}$:

$$D_{n \times n} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \tag{13}$$

At the beginning, there are $n+1$ membranes in the P system. After the distances are calculated, the objects $\mathbf{x}_i, d_{i1}^{w_{i1}}, d_{i2}^{w_{i2}}, \ldots, d_{in}^{w_{in}}$ are placed in membrane $i$ for $1 \le i \le n$. In the next step, the densities of the data points are calculated by the rule
$$r_2 = [d_{i1}^{w_{i1}} d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}}]_i \to [w_i']_i \mid 1 \le i \le n.$$
Then the send-in and send-out communication rules are used to calculate the values of $\rho_i$, $\delta_i$, and $\gamma_i$ and to put them in membrane $i$ for $1 \le i \le n$. Next, according to the sorted values of $\gamma_i$ for $1 \le i \le n$, the number of clusters $k$ can be determined. The rule for active membranes is used to split membrane $n+1$ into $k$ membranes, as shown in Figure 4. The $k$ cluster centers are put in membranes $n+1$ to $n+k$, respectively. Finally, each remaining data point is put into the membrane whose cluster center is closest to it. At this point, the clusters are obtained.

The main steps of KST-DPC are summarized in Algorithm 1.

3.3. Time Complexity Analysis of KST-DPC. As usual, computations in the cells of the tissue-like P system can be implemented in parallel. Because of the parallel implementation, the generation of the dissimilarity matrix uses $n$ computation steps. The generation of the data point densities needs 1 computation step. The calculation of the final densities $\rho_i$ uses $k$ computation steps. The calculation of the $\delta_i$ needs $n$ steps, the calculation of the $\gamma_i$ uses 1 step, and $n \log n$ steps are used to sort the $\gamma_i$ for $1 \le i \le n$. Finally, the final clustering needs 1 more computation step. Therefore, the total time complexity of KST-DPC is $n + 1 + k + n + 1 + n \log n + 1 = O(n \log n)$. The time complexity of DPC-KNN is $O(n^2)$. Compared to DPC-KNN, KST-DPC reduces the time complexity by trading space for time. The above analysis demonstrates that the overall time complexity of KST-DPC is superior to that of DPC-KNN.

Table 1: Synthetic datasets.

Dataset      Instances  Dimensions  Clusters
Spiral       312        2           3
Compound     399        2           6
Jain         373        2           2
Aggregation  788        2           7
R15          600        2           15
D31          3100       2           31

4. Test and Analysis

4.1. Data Sources. Experiments on six synthetic datasets and four real-world datasets are carried out to test the performance of KST-DPC. The synthetic datasets are from http://cs.uef.fi/sipu/datasets/. These datasets are commonly used as benchmarks to test the performance of clustering algorithms. The real-world datasets used in the experiments are from the UCI Machine Learning Repository [31]. These datasets are chosen to test the ability of KST-DPC to identify clusters of arbitrary shapes without being affected by noise or by the size or dimensionality of the datasets. The numbers of features (dimensions), data points (instances), and clusters vary across the datasets. The details of the synthetic and real-world datasets are listed in Tables 1 and 2, respectively.

The performance of KST-DPC was compared with those of the well-known clustering algorithms SC [32], DBSCAN [33], and DPC-KNN [28, 34]. The codes for SC and DBSCAN were provided by their authors. The code of DPC was optimized by using matrix operations instead of iteration cycles, based on the original code provided by Rodriguez and Laio [3], to reduce running time.

The performances of the above clustering algorithms are measured by clustering quality, or Accuracy (Acc), and Normalized Mutual Information (NMI). These are very popular measures for testing the performance of clustering algorithms: the larger the values are, the better the results are, and the upper bound of both measures is 1.
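Both measures can be sketched compactly; the code below is an illustrative implementation, assuming arithmetic-mean normalization for NMI and, for Acc, that the prediction uses the same number of labels as the ground truth (Acc then scores the best one-to-one relabeling):

```python
import numpy as np
from itertools import permutations

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information with arithmetic-mean normalization."""
    t, p = np.asarray(labels_true), np.asarray(labels_pred)
    n = len(t)
    classes, clusters = np.unique(t), np.unique(p)
    cont = np.array([[np.sum((t == c) & (p == k)) for k in clusters]
                     for c in classes], dtype=float)
    pxy = cont / n                         # joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal over true classes
    py = pxy.sum(axis=0, keepdims=True)    # marginal over predicted clusters
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / ((hx + hy) / 2) if hx + hy > 0 else 1.0

def acc(labels_true, labels_pred):
    """Clustering accuracy under the best one-to-one relabeling of clusters."""
    t, p = np.asarray(labels_true), np.asarray(labels_pred)
    clusters = np.unique(p)
    best = 0.0
    for perm in permutations(np.unique(t)):
        mapping = dict(zip(clusters, perm))
        best = max(best, np.mean(np.array([mapping[c] for c in p]) == t))
    return best
```

Both measures are invariant to how cluster labels are numbered, so a prediction that is a pure relabeling of the ground truth scores 1.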

4.2. Experimental Results on the Synthetic Datasets. In this subsection, the performances of KST-DPC, DPC-KNN, DBSCAN, and SC on the six synthetic datasets are reported. The clustering results of the four algorithms on the six synthetic datasets are color-coded and displayed in two-dimensional spaces, as shown in Figures 5–10. The results of


Figure 3: The initial configuration of the tissue-like P system. Membrane $i$ contains the objects $\mathbf{x}_i, b_1, b_2, \ldots, b_n$ for $1 \le i \le n$; membrane $n+1$ is initially empty.

Inputs: dataset $X$, parameter $K$.
Output: clusters.
Step 1: The objects $\mathbf{x}_i, b_1, b_2, \ldots, b_n$ are in membrane $i$ for $1 \le i \le n$, and object $\lambda$ is in membrane $n+1$.
Step 2: Compute the Euclidean distance matrix $(w_{ij})$ by rule $r_1$.
Step 3: Compute the local densities of the data points by rule $r_2$ and normalize them using (10) and (11).
Step 4: Calculate $\rho_i$ and $\delta_i$ for data point $i$ using (12) and (4), respectively, in every membrane $i$.
Step 5: Calculate $\gamma_i = \rho_i \times \delta_i$ for all $1 \le i \le n$ in membrane $i$, sort the $\gamma_i$ in descending order, and select the data points with the top $k$ values as the cluster centers.
Step 6: Split membrane $n+1$ into $k$ membranes, numbered $n+1$ to $n+k$, by the division rules.
Step 7: Put the $k$ cluster centers in membranes $n+1$ to $n+k$, respectively.
Step 8: Assign each remaining point to the membrane with the nearest cluster center.
Step 9: Return the clustering result.

Algorithm 1: The main steps of KST-DPC.
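In the P system, the steps of Algorithm 1 execute in parallel inside the membranes. As a rough sequential sketch of the same pipeline, assuming $\delta_i$ is the usual DPC distance to the nearest higher-density point, with `kst_dpc` and the toy two-blob data being illustrative names rather than the authors' code:

```python
import numpy as np

def kst_dpc(X, K, n_clusters):
    """Sequential sketch of Algorithm 1: densities via eqs. (10)-(12),
    DPC's delta, gamma = rho * delta center selection, nearest-center assignment."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # Step 2 (rule r1)
    dd = d.copy()
    np.fill_diagonal(dd, np.inf)
    w = (1.0 / dd).sum(axis=1)                                   # Step 3, eq. (10)
    w = w / w.sum()                                              # eq. (11)
    knn = np.argsort(dd, axis=1)[:, :K]
    rho = -1.0 / (w[knn] * np.log(w[knn])).mean(axis=1)          # Step 4, eq. (12)
    delta = np.empty(n)                    # distance to nearest higher-density point
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if higher.size else d[i].max()
    rho_n, delta_n = rho / rho.sum(), delta / delta.sum()        # normalize both
    gamma = rho_n * delta_n                                      # Step 5
    centers = np.argsort(gamma)[::-1][:n_clusters]               # top-k gamma values
    labels = np.argmin(d[:, centers], axis=1)                    # Steps 6-8
    return labels, centers

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)), rng.normal(5.0, 0.2, (20, 2))])
labels, centers = kst_dpc(X, K=5, n_clusters=2)
```

The membrane version distributes the per-point work of Steps 2–5 across cells; this sketch simply runs the same per-point computations in a loop on one processor.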

Table 2: Real-world datasets.

Dataset        Instances  Dimensions  Clusters
Vertebral      310        7           2
Seeds          210        7           3
Breast cancer  699        10          2
Banknotes      1372       5           2

the four clustering algorithms on a dataset are shown as four parts of a single figure. The cluster centers found by the KST-DPC and DPC-KNN algorithms are marked in the figures with different colors. For DBSCAN, it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm was run multiple times on each dataset, and the best result of each algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm gives the parameter value(s) the user needs to set. KST-DPC and DPC-KNN have only one parameter, $K$, the number of nearest neighbors, which must be prespecified. In this paper, the value of $K$ is determined as a percentage of the data points, following the method in [34]. For each dataset, the percentage of data points used in the KNN was adjusted several times to find the percentage that makes the final clustering the best; because many experiments were performed, only the best results are listed in Tables 3 and 4. To be consistent with the other parameters in the tables, the percentages of data points are converted directly into specific values of $K$. DBSCAN has two input parameters, the maximum radius Eps and the minimum point count MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by each algorithm. The performance measures, including Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and obtain the correct clustering results. All the benchmark values are 1.00, reflecting that the four algorithms all perform perfectly on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6, it is obvious that KST-DPC can find the ideal clustering result, DBSCAN cannot find the right clusters, and DPC-KNN and SC cannot find the cluster centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters


Table 3: Results on the synthetic datasets.

Dataset      Algorithm  Par    C1  Acc     NMI
Spiral       KST-DPC    16     3   1.00    1.00
Spiral       DPC-KNN    20     3   1.00    1.00
Spiral       DBSCAN     1.23   3   1.00    1.00
Spiral       SC         3      3   1.00    1.00
Compound     KST-DPC    217    6   0.98    0.95
Compound     DPC-KNN    360    6   0.6466  0.7663
Compound     DBSCAN     1.53   5   0.8596  0.9429
Compound     SC         6      6   0.6015  0.7622
Jain         KST-DPC    4      2   1.00    1.00
Jain         DPC-KNN    8      2   0.9035  0.5972
Jain         DBSCAN     2.624  2   1.00    1.00
Jain         SC         2      2   1.00    1.00
Aggregation  KST-DPC    40     7   1.00    1.00
Aggregation  DPC-KNN    40     7   0.9987  0.9957
Aggregation  DBSCAN     1.593  5   0.8274  0.8894
Aggregation  SC         7      7   0.9937  0.9824
R15          KST-DPC    20     15  1.00    0.99
R15          DPC-KNN    20     15  1.00    0.99
R15          DBSCAN     0.45   13  0.78    0.9155
R15          SC         15     15  0.9967  0.9942
D31          KST-DPC    25     31  1.0000  1.0000
D31          DPC-KNN    25     31  0.9700  0.9500
D31          DBSCAN     0.463  27  0.6516  0.8444
D31          SC         31     31  0.9765  0.9670

Figure 4: The tissue-like membrane system in the calculation process. Membrane $i$ contains $\mathbf{x}_i$ and the distance objects $d_{i1}^{w_{i1}}, d_{i2}^{w_{i2}}, \ldots, d_{in}^{w_{in}}$ for $1 \le i \le n$; the clusters are formed in membranes $n+1$ to $n+k$.

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)–6(d). DBSCAN has some prespecified parameters that can heavily affect the clustering results; as shown in Figure 6(c), two clusters are merged into one on two occasions. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC obtain correct results, with both benchmark values equal to 1.00. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find

the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with different sizes and shapes, with two pairs of clusters connected to each other. Figure 8 shows that both KST-DPC and DPC-KNN can effectively find the cluster centers and the correct clusters, except that an individual data point is put into an incorrect cluster by DPC-KNN. Table 3 shows that the benchmark values of KST-DPC are all 1.00 and those of DPC-KNN are close to 1.00. SC can also recognize all clusters, but its values of Acc and NMI are lower than those of DPC-KNN. DBSCAN did not find all the clusters and could not partition the clusters connected to each other.

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the


Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good result, but DBSCAN did not find all the clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are all 1.00, showing that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all the clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful for testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, and the best results for each dataset are shown in italic. The symbol "--" indicates that there is no value for that entry.

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the Acc value obtained by KST-DPC is equal to that obtained by DPC-KNN, but the NMI value obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. As Table 4 shows, all algorithms could find the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. The results in Table 4 show that KST-DPC obtained the best, whereas DBSCAN obtained the worst, values of Acc and NMI. All four clustering algorithms found the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results in Table 4 show that all four clustering algorithms could find the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset among the algorithms that produced results; SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4, it is obvious that KST-DPC obtained the best


Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Table 4: Results on the real-world datasets.

Dataset        Algorithm  Par    C1  Acc     NMI
Vertebral      KST-DPC    9      2   0.6806  0.0313
Vertebral      DPC-KNN    9      2   0.6806  0.0821
Vertebral      DBSCAN     7.48   2   0.6742  --
Vertebral      SC         2      2   --      --
Seeds          KST-DPC    4      3   0.8429  0.6574
Seeds          DPC-KNN    6      3   0.8143  0.6252
Seeds          DBSCAN     0.927  3   0.5857  0.4835
Seeds          SC         3      3   0.6071  0.5987
Breast cancer  KST-DPC    70     2   0.8624  0.4106
Breast cancer  DPC-KNN    76     2   0.7954  0.3154
Breast cancer  DBSCAN     6.20   2   0.6552  0.0872
Breast cancer  SC         2      2   --      --
Banknotes      KST-DPC    68     2   0.8434  0.7236
Banknotes      DPC-KNN    82     2   0.7340  0.3311
Banknotes      DBSCAN     6.55   2   0.5554  6.7210e-16
Banknotes      SC         2      2   0.6152  0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. The larger values of these benchmarks indicate that the results obtained by KST-DPC are closer to the true clustering than those obtained by the other algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms, obtaining larger values of Acc and NMI than they do.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric, overcoming DPC's shortcoming of having to set the value of the cutoff distance $d_c$ in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


shorter than the time taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets were used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm obtains ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter $K$ in the K-nearest neighbors must be prespecified, and currently there is no technique available for setting this value; choosing a suitable value for $K$ is a future research direction. Moreover, other methods could be used to calculate the densities of the data points, and some optimization techniques could be employed to further improve the effectiveness of DPC.

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets/, and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong Province (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Projects (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), a China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences of the Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. G. B. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160–172, Springer, Berlin, Germany, 2013.

[3] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220–227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743–754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785–793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1–15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131–1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187–196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1–13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200–226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering & Technology, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, Portland, OR, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.


4 Mathematical Problems in Engineering

Figure 1: Three different function curves (the legend lists exp(−x²/2σ²) with σ = 1, exp(−x²/2σ²) with σ = √2, and exp(−t)).

Figure 2: Membrane structure of a tissue-like P system (membranes 1, 2, ..., n and membrane n + 1).

Let KNN(i) be the set of the K nearest neighbors of a point i. It can be expressed as

KNN(i) = \{ j \mid d_{ij} \le d_{i,kth(i)} \} \quad (8)

where d_{ij} = d(x_i, x_j) is the Euclidean distance between x_i and x_j and kth(i) is the k-th nearest neighbor of point i. Local regions measured by KNN are often termed K-nearest neighborhoods; each is in fact a circular or spherical area of radius R = d_{i,kth(i)}. Therefore, purely KNN-based methods cannot handle datasets whose clusters have nonspherical distributions, and they usually produce poor clustering results on datasets with clusters of different shapes.
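Equation (8) can be sketched in code as follows; this is an illustrative helper (the function name and data are ours, not the authors'), and points tied at the k-th distance are all included, matching the ≤ in (8):

```python
import numpy as np

def knn_sets(X, k):
    """For each point i, return KNN(i) = {j : d_ij <= d_i,kth(i)} as in equation (8)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))       # pairwise Euclidean distances d_ij
    knn = []
    for i in range(len(X)):
        d = np.delete(D[i], i)                 # distances from i, excluding i itself
        kth = np.sort(d)[k - 1]                # radius R = d_i,kth(i)
        knn.append({j for j in range(len(X)) if j != i and D[i, j] <= kth})
    return knn
```

Because the neighborhood is the ball of radius R = d_{i,kth(i)}, any point tied at that distance also enters KNN(i).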

Shannon entropy measures the degree of molecular activity: the more unstable the system is, the larger the value of the Shannon entropy is, and vice versa. The Shannon entropy, represented by H(X), is given by

H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i \quad (9)

where X is the set of objects and p_i is the probability of object i appearing in X. When H(X) is used to measure the distance between clusters, the smaller the value of H(X) is, the better the clustering result is. Therefore, the Shannon entropy is introduced into the K-nearest neighbor density calculation, so that the final density takes into account not only the distance metric but also the influence of each data point's position on its density.

The decision graph is calculated from the product of ρ_i and δ_i, and a larger value of ρ_i makes it easier to choose the best cluster centers; for this reason, the reciprocal form of the Shannon entropy is adopted. In addition, the metrics of ρ_i and δ_i may be inconsistent, which directly leads to ρ_i and δ_i playing different roles in the calculation of the decision graph. Hence it is necessary to normalize ρ_i and δ_i.

The specific calculation method is as follows. First, the local density of data point x_i is calculated:

w_i' = \sum_{j=1, j \ne i}^{n} \frac{1}{\| x_j - x_i \|} \quad (10)

where x_i and x_j are data points and w_i' is the (unnormalized) density of data point x_i. Next, the density of data point x_i is normalized, and the normalized density is denoted w_i:

w_i = \frac{w_i'}{\sum_{i=1}^{n} w_i'} \quad (11)


Finally, the density metric, which uses the idea of the K-nearest neighbor method, is defined as

\rho_i = -\frac{1}{(1/K) \sum_{j \in KNN(i)} w_j \log w_j} \quad (12)

To guarantee the consistency of the metrics of ρ_i and δ_i, δ_i also needs to be normalized.
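Equations (10)-(12) can be sketched as follows. This is an illustrative implementation, not the authors' code; the use of the natural logarithm in (12) is an assumption (the entropy in (9) uses base 2, which would only rescale ρ_i by a constant):

```python
import numpy as np

def entropy_knn_density(X, k):
    """Entropy-based density rho_i of equations (10)-(12).
    Assumes all data points are distinct (all pairwise distances positive)."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # Equation (10): w'_i = sum_{j != i} 1 / ||x_j - x_i||
    w_raw = np.array([np.sum(1.0 / D[i, np.arange(n) != i]) for i in range(n)])
    # Equation (11): normalize so that the w_i sum to 1
    w = w_raw / w_raw.sum()
    rho = np.empty(n)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]         # the K nearest neighbors of point i
        # Equation (12): reciprocal of the mean Shannon term over KNN(i)
        rho[i] = -1.0 / np.mean(w[nn] * np.log(w[nn]))
    return rho
```

Since each w_j lies in (0, 1), every term w_j log w_j is negative, so the reciprocal in (12) yields a positive density.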

3.2. Tissue-Like P System with Active Membranes for Improved Density Peak Clustering. In the following, a tissue-like P system with active membranes for density peak clustering, called KST-DPC, is proposed. As mentioned before, assume the dataset with n data points is represented by X = {x_1, x_2, ..., x_n}. Before performing any specific calculation of the DPC algorithm, the Euclidean distance between each pair of data points in the dataset is calculated, and the result is stored in the form of a matrix. The initial configuration of this P system is shown in Figure 3.

When the system is initialized, the objects x_i, b_1, b_2, ..., b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1, where λ means there is no object. First, the Euclidean distance w_ij between the data points x_i and x_j (represented by b_j for 1 ≤ j ≤ n) is calculated with the rule

r_1 = [\, x_i\, b_1 b_2 \cdots b_n \,]_i \to [\, d_{i1}^{w_{i1}}\, d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}} \,]_i, \quad 1 \le i \le n.

Note that the x_j for 1 ≤ j ≤ n are expressed as b_1, b_2, ..., b_n. The results are stored as the distance matrix, also called the dissimilarity matrix, D_{n×n}.

D_{n \times n} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \quad (13)
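Sequentially, the parallel work of rule r_1 across the n membranes amounts to computing the dissimilarity matrix of equation (13); a minimal vectorized sketch:

```python
import numpy as np

def distance_matrix(X):
    """Dissimilarity matrix of equation (13): entry (i, j) is ||x_i - x_j||.
    Row i corresponds to what rule r_1 produces inside membrane i; the n
    membranes' parallel computations are simply vectorized here."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```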

At the beginning there are n + 1 membranes in the P system. After the distances are calculated, the objects x_i, d_{i1}^{w_{i1}}, d_{i2}^{w_{i2}}, ..., d_{in}^{w_{in}} are placed in membrane i for 1 ≤ i ≤ n. In the next step, the densities of the data points are calculated by the rule

r_2 = [\, d_{i1}^{w_{i1}}\, d_{i2}^{w_{i2}} \cdots d_{in}^{w_{in}} \,]_i \to [\, w_i' \,]_i, \quad 1 \le i \le n.

Then the send-in and send-out communication rules are used to calculate the values of ρ_i, δ_i, and γ_i and to put them in membrane i for 1 ≤ i ≤ n. Next, according to the sorted results of γ_i for 1 ≤ i ≤ n, the number of clusters k can be determined. The rule of the active membranes is used to split membrane n + 1 into k membranes, as shown in Figure 4. The k cluster centers are put in membranes n + 1 to n + k, respectively. Finally, each remaining data point is put into the membrane whose cluster center is closest to it. At this point the clusters are obtained.

The main steps of KST-DPC are summarized in Algorithm 1.

3.3. Time Complexity Analysis of KST-DPC. As usual, computations in the cells of the tissue-like P system can be implemented in parallel. Because of the parallel implementation, the generation of the dissimilarity matrix uses n computation steps. The generation of the data point densities needs 1

Table 1: Synthetic datasets

Dataset      Instances  Dimensions  Clusters
Spiral       312        2           3
Compound     399        2           6
Jain         373        2           2
Aggregation  788        2           7
R15          600        2           15
D31          3100       2           31

computation step. The calculation of the final density ρ_i uses k computation steps, the calculation of δ_i needs n steps, the calculation of γ_i uses 1 step, and n log n steps are used to sort the γ_i for 1 ≤ i ≤ n. Finally, the final clustering needs 1 more computation step. Therefore, the total number of computation steps of KST-DPC is n + 1 + k + n + 1 + n log n + 1 = O(n log n). The time complexity of DPC-KNN is O(n²). Compared with DPC-KNN, KST-DPC reduces the time complexity by trading space (the n parallel membranes) for time. The above analysis demonstrates that the overall time complexity of KST-DPC is superior to that of DPC-KNN.

4. Test and Analysis

4.1. Data Sources. Experiments on six synthetic datasets and four real-world datasets are carried out to test the performance of KST-DPC. The synthetic datasets are from http://cs.uef.fi/sipu/datasets. These datasets are commonly used as benchmarks to test the performance of clustering algorithms. The real-world datasets used in the experiments are from the UCI Machine Learning Repository [31]. These datasets are chosen to test the ability of KST-DPC to identify clusters of arbitrary shapes without being affected by the noise, size, or dimensionality of the datasets. The numbers of features (dimensions), data points (instances), and clusters vary across the datasets. The details of the synthetic and real-world datasets are listed in Tables 1 and 2, respectively.

The performance of KST-DPC was compared with those of the well-known clustering algorithms SC [32], DBSCAN [33], and DPC-KNN [28, 34]. The codes for SC and DBSCAN are provided by their authors. The code of DPC is optimized by using matrix operations instead of iteration cycles, based on the original code provided by Rodriguez and Laio [3], to reduce the running time.

The performances of the above clustering algorithms are measured by clustering quality, or Accuracy (Acc), and Normalized Mutual Information (NMI). These are very popular measures for testing the performance of clustering algorithms. The larger the values are, the better the results are, and the upper bound of both measures is 1.
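NMI can be computed from the contingency counts of the two labelings; the following from-scratch sketch uses the common square-root normalization (whether the experiments use this exact variant is an assumption, and Acc additionally requires an optimal matching between predicted and true labels, which is omitted here):

```python
import numpy as np
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information: MI(U, V) / sqrt(H(U) * H(V))."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))   # contingency counts
    cu = Counter(labels_true)
    cv = Counter(labels_pred)
    mi = sum((c / n) * np.log((c / n) / ((cu[u] / n) * (cv[v] / n)))
             for (u, v), c in joint.items())
    hu = -sum((c / n) * np.log(c / n) for c in cu.values())
    hv = -sum((c / n) * np.log(c / n) for c in cv.values())
    if hu == 0.0 or hv == 0.0:                       # degenerate one-cluster case
        return 1.0 if hu == hv else 0.0
    return mi / np.sqrt(hu * hv)
```

NMI is invariant to label permutation, so a clustering that matches the ground truth up to renaming scores 1.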

4.2. Experimental Results on the Synthetic Datasets. In this subsection, the performances of KST-DPC, DPC-KNN, DBSCAN, and SC on the six synthetic datasets are reported. The clustering results of the four clustering algorithms on the six synthetic datasets are color coded and displayed in two-dimensional spaces, as shown in Figures 5-10. The results of


Figure 3: The initial configuration of the tissue-like P system (membrane i contains x_i, b_1, b_2, ..., b_n for 1 ≤ i ≤ n; membrane n + 1 is empty).

Inputs: dataset X, parameter K.
Output: clusters.
Step 1. The objects x_i, b_1, b_2, ..., b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1.
Step 2. Compute the Euclidean distance matrix (w_ij) by rule r_1.
Step 3. Compute the local densities of the data points by rule r_2 and normalize them using (10) and (11).
Step 4. Calculate ρ_i and δ_i for data point i using (12) and (4), respectively, in every membrane i.
Step 5. Calculate γ_i = ρ_i × δ_i for all 1 ≤ i ≤ n in membrane i, sort the γ_i in descending order, and select the top k values as the initial cluster centers, so as to determine the centers of the clusters.
Step 6. Split membrane n + 1 into k membranes by the division rules; the new membranes are numbered n + 1 to n + k.
Step 7. The k cluster centers are put in membranes n + 1 to n + k, respectively.
Step 8. Assign each remaining data point to the membrane with the nearest cluster center.
Step 9. Return the clustering result.

Algorithm 1
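The steps above can be sketched sequentially as follows. The tissue-like P system performs the per-membrane work in parallel, while this sketch uses plain loops; δ_i follows the standard DPC definition cited as equation (4) but not shown in this excerpt (the distance to the nearest point of higher density, or the maximum distance for the densest point), and both ρ_i and δ_i are max-normalized here to keep their metrics consistent, as the text requires:

```python
import numpy as np

def kst_dpc(X, k, n_clusters):
    """Sequential sketch of Algorithm 1; names and normalization details are ours."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))                 # Step 2: rule r_1
    w = np.array([np.sum(1.0 / D[i, np.arange(n) != i]) for i in range(n)])
    w = w / w.sum()                                      # Step 3: eqs (10)-(11)
    rho = np.empty(n)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]                   # K nearest neighbors of i
        rho[i] = -1.0 / np.mean(w[nn] * np.log(w[nn]))   # Step 4: eq (12)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]               # points denser than i
        delta[i] = D[i, higher].min() if len(higher) else D[i].max()
    gamma = (rho / rho.max()) * (delta / delta.max())    # Step 5
    centers = np.argsort(gamma)[::-1][:n_clusters]       # top gamma values
    labels = np.array([centers[np.argmin(D[i, centers])] for i in range(n)])
    return centers, labels                               # Steps 6-9
```

On well-separated clusters the top-γ points fall one per cluster, and the nearest-center assignment of Step 8 recovers the partition.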

Table 2: Real-world datasets

Dataset        Instances  Dimensions  Clusters
Vertebral      310        7           2
Seeds          210        7           3
Breast cancer  699        10          2
Banknotes      1372       5           2

the four clustering algorithms on a dataset are shown as four parts of a single figure. The cluster centers found by the KST-DPC and DPC-KNN algorithms are marked in the figures with different colors. For DBSCAN it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm was run multiple times on each dataset, and the best result of each clustering algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm shows the parameter value(s) the user needs to set. KST-DPC and DPC-KNN have only one parameter, K, the number of nearest neighbors, which must be prespecified. In this paper the value of K is determined as a percentage of the number of data points, following the method in [34]. For each dataset we adjust the percentage

of data points in the KNN multiple times and find the optimal percentage that makes the final clustering the best. Because many experiments were performed, only the best results are listed in Tables 3 and 4. To be consistent with the other parameters in the tables, the percentages of data points are converted directly into specific K values. DBSCAN has two input parameters, the maximum radius Eps and the minimum number of points MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by the algorithms. The performance measures, including Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.
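The conversion from a percentage of the data points to a concrete K can be sketched as follows (the exact rounding used in the experiments is an assumption):

```python
def k_from_percentage(n_points, percentage):
    """Turn a KNN percentage into the integer K reported in Tables 3 and 4."""
    return max(1, round(n_points * percentage / 100.0))
```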

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and obtain the correct clustering results. All the benchmark values are 1.00, reflecting that all four algorithms perform perfectly well on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6 it is obvious that KST-DPC can find the ideal clustering result, DBSCAN cannot find the right clusters, and DPC-KNN and SC cannot find the correct cluster centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters


Table 3: Results on the synthetic datasets

Dataset      Algorithm  Par    C1  Acc     NMI
Spiral       KST-DPC    16     3   1.00    1.00
             DPC-KNN    20     3   1.00    1.00
             DBSCAN     123    3   1.00    1.00
             SC         3      3   1.00    1.00
Compound     KST-DPC    217    6   0.98    0.95
             DPC-KNN    360    6   0.6466  0.7663
             DBSCAN     153    5   0.8596  0.9429
             SC         6      6   0.6015  0.7622
Jain         KST-DPC    4      2   1.00    1.00
             DPC-KNN    8      2   0.9035  0.5972
             DBSCAN     2624   2   1.00    1.00
             SC         2      2   1.00    1.00
Aggregation  KST-DPC    40     7   1.00    1.00
             DPC-KNN    40     7   0.9987  0.9957
             DBSCAN     1593   5   0.8274  0.8894
             SC         7      7   0.9937  0.9824
R15          KST-DPC    20     15  1.00    0.99
             DPC-KNN    20     15  1.00    0.99
             DBSCAN     0.45   13  0.78    0.9155
             SC         15     15  0.9967  0.9942
D31          KST-DPC    25     31  1.0000  1.0000
             DPC-KNN    25     31  0.9700  0.9500
             DBSCAN     0.463  27  0.6516  0.8444
             SC         31     31  0.9765  0.9670

Figure 4: The tissue-like membrane system in the calculation process (membrane i holds x_i and the objects d_{i1}^{w_{i1}}, ..., d_{in}^{w_{in}}; the new membranes are numbered n + 1 to n + k).

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)-6(d). DBSCAN has some prespecified parameters that can have heavy effects on the clustering results; as shown in Figure 6(c), two clusters are merged into one cluster on two occasions. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC obtain correct results, with both benchmark values at 1.00. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find

the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with differentsizes and shapes and two pairs of clusters connected to eachother Figure 8 shows that both the KST-DPC and DPC-KNN algorithms can effectively find the cluster centers andcorrect clusters except that an individual data point is putinto an incorrect cluster byDPC-KNN Table 3 shows that thebenchmark values of KST-DPC are all 100 and those of DPC-KNN are close to 100 SC also can recognize all clusters butthe values ofAcc andNMI are lower than those ofDPC-KNNDBSCAN did not find all clusters and could not partition theclusters connected to each other

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the


Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good experimental result, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and distributed randomly in a 2-dimensional space. The experimental results of the four algorithms are shown in Table 3, and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are all 1.00, showing that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are

reported in Table 4, and the best results for each dataset are shown in italics. The symbol "--" indicates that there is no value for that entry.

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC is equal to that obtained by DPC-KNN, but the value of NMI obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. As Table 4 shows, all algorithms found the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. The results in Table 4 show that KST-DPC obtained the best, whereas DBSCAN obtained the worst, values of Acc and NMI. All four clustering algorithms found the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results on this dataset in Table 4 show that all four clustering algorithms found the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset, except that SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4 it is obvious that KST-DPC obtained the best


Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Table 4: Results on the real-world datasets

Dataset        Algorithm  Par    C1  Acc     NMI
Vertebral      KST-DPC    9      2   0.6806  0.0313
               DPC-KNN    9      2   0.6806  0.0821
               DBSCAN     748    2   0.6742  --
               SC         2      2   --      --
Seeds          KST-DPC    4      3   0.8429  0.6574
               DPC-KNN    6      3   0.8143  0.6252
               DBSCAN     0.927  3   0.5857  0.4835
               SC         3      3   0.6071  0.5987
Breast cancer  KST-DPC    70     2   0.8624  0.4106
               DPC-KNN    76     2   0.7954  0.3154
               DBSCAN     620    2   0.6552  0.0872
               SC         2      2   --      --
Banknotes      KST-DPC    68     2   0.8434  0.7236
               DPC-KNN    82     2   0.7340  0.3311
               DBSCAN     655    2   0.5554  6.7210e-16
               SC         2      2   0.6152  0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms: it obtained larger values of Acc and NMI than the other clustering algorithms.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric, and thereby overcomes the shortcoming of DPC that the cutoff distance d_c must be set in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


shorter than those taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets were used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm obtains ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors is prespecified, and currently there is no technique available to set this value; choosing a suitable value for K is a future research direction. Moreover, other methods could be used to calculate the densities of the data points, and some optimization techniques could also be employed to improve the effectiveness of DPC.

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Projects (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), a China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences of the Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160–172, Springer, Berlin, Germany, 2013.

[3] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220–227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743–754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785–793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1–15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131–1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187–196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1–13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200–226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, AAAI Press, Menlo Park, Portland, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.


Finally, the density metric, which uses the idea of the K-nearest neighbor method, is defined as

$$\rho_i = -\frac{1}{K} \sum_{j \in \mathrm{KNN}(i)} w_{ij} \log\left(w_{ij}\right) \qquad (12)$$

To guarantee the consistency of the metrics ρ_i and δ_i, δ_i also needs to be normalized.
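To make the density computation concrete, here is a minimal Python sketch of Eq. (12). The normalization of the distances in Eqs. (10) and (11) is not reproduced in this excerpt, so a global min-max rescaling into (0, 1] is assumed here, and the helper name `knn_entropy_density` is ours, not the paper's.

```python
import numpy as np

def knn_entropy_density(X, K):
    """Density of Eq. (12): rho_i = -(1/K) * sum of w_ij * log(w_ij)
    over the K nearest neighbors of point i, where w_ij are pairwise
    distances rescaled into (0, 1] (our stand-in for Eqs. (10)-(11))."""
    n = X.shape[0]
    # pairwise Euclidean distances (the dissimilarity matrix D)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # global rescaling into (0, 1]; the epsilon keeps log() finite
    # for coincident points
    W = (D + 1e-12) / (D.max() + 1e-12)
    rho = np.empty(n)
    for i in range(n):
        w = np.sort(W[i])[1:K + 1]          # K nearest neighbors, self excluded
        rho[i] = -np.mean(w * np.log(w))    # -(1/K) * sum of w log w
    return rho
```

As the surrounding text notes, this entropy-like density is then normalized together with δ_i so that the two factors of γ_i are on the same scale.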

3.2. Tissue-Like P System with Active Membranes for Improved Density Peak Clustering. In the following, a tissue-like P system with active membranes for density peak clustering, called KST-DPC, is proposed. As mentioned before, assume the dataset with n data points is represented by X = {x_1, x_2, ..., x_n}. Before performing any specific calculation of the DPC algorithm, the Euclidean distance between each pair of data points in the dataset is calculated and the result is stored in the form of a matrix. The initial configuration of this P system is shown in Figure 3.

When the system is initialized, the objects x_i, b_1, b_2, ..., b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1, where λ means there is no object. First, the Euclidean distance w_ij between the data points x_i and x_j (represented by b_j for 1 ≤ j ≤ n) is calculated with the rule

$$r_1 = \left[\, x_i\, b_1 b_2 \cdots b_n \,\right]_i \longrightarrow \left[\, d_1^{w_{i1}}\, d_2^{w_{i2}} \cdots d_n^{w_{in}} \,\right]_i \;\middle|\; 1 \le i \le n.$$

Note that x_j for 1 ≤ j ≤ n are expressed as b_1, b_2, ..., b_n. The results are stored as the distance matrix, also called the dissimilarity matrix, D_{n×n}:

$$D_{n \times n} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \qquad (13)$$

At the beginning, there are n + 1 membranes in the P system. After the distances are calculated, the objects x_i, d_1^{w_i1}, d_2^{w_i2}, ..., d_n^{w_in} are placed in membrane i for 1 ≤ i ≤ n. In the next step, the densities of the data points are calculated by the rule

$$r_2 = \left[\, d_1^{w_{i1}}\, d_2^{w_{i2}} \cdots d_n^{w_{in}} \,\right]_i \longrightarrow \left[\, w'_i \,\right]_i \;\middle|\; 1 \le i \le n.$$

Then the send-in and send-out communication rules are used to calculate the values of ρ_i, δ_i, and γ_i and to put them in membrane i for 1 ≤ i ≤ n. Next, according to the sorted results of γ_i for 1 ≤ i ≤ n, the number of clusters k can be determined. The rule of the active membranes is used to split membrane n + 1 into k membranes, as shown in Figure 4. The k cluster centers are put in membranes n + 1 to n + k, respectively. Finally, the remaining data points are divided, and each is put into the membrane whose cluster center is closest to it. At this point, the clusters are obtained.

The main steps of KST-DPC are summarized in Algorithm 1.

Table 1: Synthetic datasets.

Dataset      Instances   Dimensions   Clusters
Spiral       312         2            3
Compound     399         2            6
Jain         373         2            2
Aggregation  788         2            7
R15          600         2            15
D31          3100        2            31

3.3. Time Complexity Analysis of KST-DPC. As usual, computations in the cells of the tissue-like P system can be implemented in parallel. Because of this parallel implementation, the generation of the dissimilarity matrix uses n computation steps. The generation of the data point densities needs 1 computation step. The calculation of the final density ρ_i uses k computation steps. The calculation of δ_i needs n steps. The calculation of γ_i uses 1 step, and n log n steps are used to sort the γ_i for 1 ≤ i ≤ n. Finally, the final clustering needs 1 more computation step. Therefore, the total time complexity of KST-DPC is n + 1 + k + n + 1 + n log n + 1 = O(n log n). The time complexity of DPC-KNN is O(n²). Compared to DPC-KNN, KST-DPC reduces the time complexity by trading space for time. The above analysis demonstrates that the overall time complexity of KST-DPC is superior to that of DPC-KNN.

4. Test and Analysis

4.1. Data Sources. Experiments on six synthetic datasets and four real-world datasets are carried out to test the performance of KST-DPC. The synthetic datasets are from http://cs.uef.fi/sipu/datasets. These datasets are commonly used as benchmarks to test the performance of clustering algorithms. The real-world datasets used in the experiments are from the UCI Machine Learning Repository [31]. These datasets are chosen to test the ability of KST-DPC to identify clusters of arbitrary shapes without being affected by the noise, size, or dimensions of the datasets. The numbers of features (dimensions), data points (instances), and clusters vary across the datasets. The details of the synthetic and real-world datasets are listed in Tables 1 and 2, respectively.

The performance of KST-DPC was compared with those of the well-known clustering algorithms SC [32], DBSCAN [33], and DPC-KNN [30, 34]. The codes for SC and DBSCAN are provided by their authors. The code of DPC was optimized by replacing iteration loops with matrix operations, based on the original code provided by Rodriguez and Laio [3], to reduce the running time.
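The "matrix operation instead of iteration" speed-up mentioned above can be illustrated as follows; this is a generic NumPy identity, not the authors' actual code.

```python
import numpy as np

def pairwise_dist(X):
    """All pairwise Euclidean distances at once via the identity
    ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, replacing a double
    Python loop over points with BLAS-backed matrix operations."""
    sq = np.sum(X * X, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(D2, 0.0))  # clamp tiny negatives from round-off
```

The result matches the element-by-element computation but runs orders of magnitude faster in Python for large n.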

The performances of the above clustering algorithms are measured by clustering quality, namely Accuracy (Acc) and Normalized Mutual Information (NMI). These are very popular measures for testing the performance of clustering algorithms. The larger the values, the better the results; the upper bound of both measures is 1.
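For reference, the two measures can be computed as below. This is a self-contained sketch (Acc via best one-to-one label matching, NMI with arithmetic-mean normalization); published implementations differ in the normalization they choose, and the function names are ours.

```python
import numpy as np
from itertools import permutations

def acc(y_true, y_pred):
    """Clustering Accuracy: best matching of predicted cluster ids to
    true class ids, brute-forced over permutations (fine for the small
    cluster counts used here; assumes no more predicted than true ids)."""
    ids_true, ids_pred = np.unique(y_true), np.unique(y_pred)
    best = 0.0
    for perm in permutations(ids_true, len(ids_pred)):
        mapping = dict(zip(ids_pred, perm))
        best = max(best, np.mean([mapping[p] == t
                                  for p, t in zip(y_pred, y_true)]))
    return best

def nmi(y_true, y_pred):
    """Normalized Mutual Information, MI / mean of the two entropies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mi = 0.0
    for t in np.unique(y_true):
        for p in np.unique(y_pred):
            joint = np.mean((y_true == t) & (y_pred == p))
            if joint > 0:
                mi += joint * np.log(joint / (np.mean(y_true == t)
                                              * np.mean(y_pred == p)))
    h = lambda y: -sum(np.mean(y == v) * np.log(np.mean(y == v))
                       for v in np.unique(y))
    return mi / (0.5 * (h(y_true) + h(y_pred)) + 1e-12)
```

Both measures equal 1 exactly when the predicted partition coincides with the ground truth up to a relabeling, which is the sense in which 1 is their upper bound.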

4.2. Experimental Results on the Synthetic Datasets. In this subsection, the performances of KST-DPC, DPC-KNN, DBSCAN, and SC are reported on the six synthetic datasets. The clustering results of the four clustering algorithms on the six synthetic datasets are color coded and displayed in two-dimensional spaces, as shown in Figures 5-10. The results of


Figure 3: The initial configuration of the tissue-like P system. Membrane i (1 ≤ i ≤ n) contains the objects x_i, b_1, b_2, ..., b_n; membrane n + 1 contains only λ.

Input: dataset X, parameter K
Output: clusters
Step 1: The objects x_i, b_1, b_2, ..., b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n + 1.
Step 2: Compute the Euclidean distance matrix (w_ij) by rule r_1.
Step 3: Compute the local densities of the data points by rule r_2 and normalize them using (10) and (11).
Step 4: Calculate ρ_i and δ_i for data point i using (12) and (4), respectively, in every membrane i.
Step 5: Calculate γ_i = ρ_i × δ_i for all 1 ≤ i ≤ n in membrane i, sort the γ_i in descending order, and select the data points with the top values as the cluster centers, thereby determining the number of clusters k.
Step 6: Split membrane n + 1 into k membranes, numbered n + 1 to n + k, by the division rule.
Step 7: Put the k cluster centers in membranes n + 1 to n + k, respectively.
Step 8: Assign each remaining data point to the membrane with the nearest cluster center.
Step 9: Return the clustering result.

Algorithm 1: The main steps of KST-DPC.
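Read sequentially, Steps 4-8 of Algorithm 1 amount to the standard DPC center-selection and assignment procedure. The sketch below (our code, with the density ρ from Eq. (12) supplied by the caller, and δ computed in the usual DPC way since Eq. (4) is not reproduced in this excerpt) mirrors those steps; the P system executes them through communication and membrane-division rules, with one membrane per data point working in parallel.

```python
import numpy as np

def dpc_assign(X, rho, k):
    """Steps 4-8 of Algorithm 1 as a sequential sketch: delta, gamma,
    top-k centers, then nearest-center assignment of remaining points."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    delta = np.empty(n)
    order = np.argsort(-rho)              # indices by density, high to low
    delta[order[0]] = D[order[0]].max()   # convention for the densest point
    for rank, i in enumerate(order[1:], 1):
        # distance to the nearest point of higher density
        delta[i] = D[i, order[:rank]].min()
    # normalize delta so it is consistent with the normalized rho
    delta = (delta - delta.min()) / (delta.max() - delta.min() + 1e-12)
    gamma = rho * delta
    centers = np.argsort(-gamma)[:k]          # top-k gamma values are centers
    labels = np.argmin(D[:, centers], axis=1) # membrane of nearest center
    return centers, labels
```

Each loop iteration here plays the role of one membrane's send-in/send-out step; in the P system these fire simultaneously, which is where the step-count savings of the next subsection come from.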

Table 2: Real-world datasets.

Dataset        Instances   Dimensions   Clusters
Vertebral      310         7            2
Seeds          210         7            3
Breast cancer  699         10           2
Banknotes      1372        5            2

the four clustering algorithms on a dataset are shown as four parts in a single figure. The cluster centers found by KST-DPC and DPC-KNN are marked in the figures with different colors. For DBSCAN, it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm was run multiple times on each dataset, and the best result of each algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm gives the parameters the user needs to set. KST-DPC and DPC-KNN have only one parameter, K, the number of nearest neighbors to be prespecified. In this paper, the value of K is determined as a percentage of the data points, following the method in [34]. For each dataset, we adjusted the percentage of data points used in the KNN multiple times and found the optimal percentage that makes the final clustering best; because many experiments were performed, only the best results are listed in Tables 3 and 4. To be consistent with the other parameters in the tables, the percentages of data points are converted directly into specific K values. DBSCAN has two input parameters, the maximum radius Eps and the minimum number of points MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by the algorithms. The performance measures, Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.
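As a concrete illustration of that conversion, a tuned neighbor percentage can be turned into a specific K as follows (a hypothetical helper written for this sketch; the paper reports only the final K values):

```python
def k_from_percent(n_points, percent):
    """Convert a neighbor percentage into a concrete K value;
    at least one neighbor is always kept."""
    return max(1, round(n_points * percent / 100.0))
```

For example, 5% of the 312 points of the Spiral dataset rounds to K = 16.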

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and obtain the correct clustering results. All the benchmark values are 1.00, reflecting that all four algorithms perform perfectly on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6, it is obvious that KST-DPC can find the ideal clustering result. DBSCAN cannot find the right clusters, whereas DPC-KNN and SC cannot find the correct cluster centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters


Table 3: Results on the synthetic datasets.

Dataset      Algorithm   Par    C1   Acc      NMI
Spiral       KST-DPC     16     3    1.00     1.00
             DPC-KNN     20     3    1.00     1.00
             DBSCAN      123    3    1.00     1.00
             SC          3      3    1.00     1.00
Compound     KST-DPC     217    6    0.98     0.95
             DPC-KNN     360    6    0.6466   0.7663
             DBSCAN      153    5    0.8596   0.9429
             SC          6      6    0.6015   0.7622
Jain         KST-DPC     4      2    1.00     1.00
             DPC-KNN     8      2    0.9035   0.5972
             DBSCAN      2624   2    1.00     1.00
             SC          2      2    1.00     1.00
Aggregation  KST-DPC     40     7    1.00     1.00
             DPC-KNN     40     7    0.9987   0.9957
             DBSCAN      1593   5    0.8274   0.8894
             SC          7      7    0.9937   0.9824
R15          KST-DPC     20     15   1.00     0.99
             DPC-KNN     20     15   1.00     0.99
             DBSCAN      045    13   0.78     0.9155
             SC          15     15   0.9967   0.9942
D31          KST-DPC     25     31   1.0000   1.0000
             DPC-KNN     25     31   0.9700   0.9500
             DBSCAN      0463   27   0.6516   0.8444
             SC          31     31   0.9765   0.9670

Figure 4: The tissue-like membrane system in the calculation process. Membrane i (1 ≤ i ≤ n) contains x_i and the distance objects d_1^{w_i1}, d_2^{w_i2}, ..., d_n^{w_in}; membranes n + 1 to n + k hold the resulting clusters.

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)-6(d). DBSCAN has prespecified parameters that can heavily affect the clustering results; as shown in Figure 6(c), two clusters are merged into one in two places. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC obtain correct results, with both benchmark values equal to 1.00. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with different sizes and shapes, and two pairs of its clusters are connected to each other. Figure 8 shows that both KST-DPC and DPC-KNN can effectively find the cluster centers and the correct clusters, except that an individual data point is put into an incorrect cluster by DPC-KNN. Table 3 shows that the benchmark values of KST-DPC are all 1.00 and those of DPC-KNN are close to 1.00. SC can also recognize all clusters, but its values of Acc and NMI are lower than those of DPC-KNN. DBSCAN did not find all clusters and could not partition the clusters connected to each other.

Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the center of the space and is closely surrounded by seven other clusters. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good experimental result, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and distributed randomly in a 2-dimensional space. The experimental results of the 4 algorithms are shown in Table 3 and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are both 1.00, showing that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, with the best results for each dataset shown in italics. The symbol "--" indicates that no value was obtained for that entry.

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC equals that obtained by DPC-KNN, but its value of NMI is lower than that of DPC-KNN. No values of Acc and NMI were obtained by SC. Nevertheless, all algorithms found the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. The results in Table 4 show that KST-DPC obtained the best values of Acc and NMI, whereas DBSCAN obtained the worst. All four clustering algorithms found the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results in Table 4 show that all four clustering algorithms found the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset among the algorithms that produced results; SC did not produce results on these benchmarks.

Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Table 4: Results on the real-world datasets.

Dataset        Algorithm   Par    C1   Acc      NMI
Vertebral      KST-DPC     9      2    0.6806   0.0313
               DPC-KNN     9      2    0.6806   0.0821
               DBSCAN      748    2    0.6742   --
               SC          2      2    --       --
Seeds          KST-DPC     4      3    0.8429   0.6574
               DPC-KNN     6      3    0.8143   0.6252
               DBSCAN      0927   3    0.5857   0.4835
               SC          3      3    0.6071   0.5987
Breast cancer  KST-DPC     70     2    0.8624   0.4106
               DPC-KNN     76     2    0.7954   0.3154
               DBSCAN      620    2    0.6552   0.0872
               SC          2      2    --       --
Banknotes      KST-DPC     68     2    0.8434   0.7236
               DPC-KNN     82     2    0.7340   0.3311
               DBSCAN      655    2    0.5554   6.7210e-16
               SC          2      2    0.6152   0.0598

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4, it is obvious that KST-DPC obtained the best values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms, obtaining larger values of Acc and NMI than the others.

Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric, which overcomes the shortcoming of DPC that the value of the cutoff distance d_c has to be set in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is shorter than that taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets were used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can obtain ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors is prespecified, and currently there is no technique available for setting this value. Choosing a suitable value for K is a future research direction. Moreover, other methods can be used to calculate the densities of the data points, and some optimization techniques can also be employed to improve the effectiveness of DPC.

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Projects (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), the China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences of the Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160–172, Springer, Berlin, Germany, 2013.

[3] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220–227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743–754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785–793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1–15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131–1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187–196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1–13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200–226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, Portland, Oregon, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.


Page 6: A Density Peak Clustering Algorithm Based on the K-Nearest ...downloads.hindawi.com/journals/mpe/2019/1713801.pdf · ResearchArticle A Density Peak Clustering Algorithm Based on the

6 Mathematical Problems in Engineering

1 2 n

n+1

1 2 n

n+1

x1 b1 b2 middot middot middot bn x2 b1 b2 middot middot middot bn xn b1 b2 middot middot middot bn

Figure 3 The initial configuration of the tissue-like P system

Inputs: dataset X, parameter K. Output: clusters.

Step 1. The objects x_i, b_1, b_2, ..., b_n are in membrane i for 1 ≤ i ≤ n, and object λ is in membrane n+1.
Step 2. Compute the Euclidean distance matrix w_ij by rule 1.
Step 3. Compute the local densities of the data points by rule 2 and normalize them using (10) and (11).
Step 4. Calculate ρ_i and δ_i for data point i using (12) and (4) in every membrane i, respectively.
Step 5. Calculate γ_i = ρ_i × δ_i for all 1 ≤ i ≤ n in membrane i, sort the γ_i in descending order, and select the data points with the top K values as the initial cluster centers.
Step 6. Split membrane n+1 into K membranes by the division rules; these membranes are numbered n+1 to n+K.
Step 7. Put the K cluster centers into membranes n+1 to n+K, respectively.
Step 8. Assign each remaining point to the membrane with the nearest cluster center.
Step 9. Return the clustering result.

Algorithm 1
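The center-selection and assignment steps of Algorithm 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the entropy-weighted density of equations (10)-(12) is replaced by a simple stand-in (the exponential of the mean KNN distance), the membrane rules are simulated with ordinary loops, and the helper names `kst_dpc_sketch` and `euclid` are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kst_dpc_sketch(points, k_nn, n_centers):
    """Density-peak center selection and assignment (Steps 2-8 of Algorithm 1).
    The KNN density below is a simple stand-in, not the paper's
    entropy-weighted equations (10)-(12)."""
    n = len(points)
    # Step 2: pairwise Euclidean distance matrix w_ij
    d = [[euclid(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Steps 3-4 (rho): high density for points whose k_nn nearest
    # neighbours are close (stand-in for eq. (12))
    rho = []
    for i in range(n):
        nearest = sorted(d[i][j] for j in range(n) if j != i)[:k_nn]
        rho.append(math.exp(-sum(nearest) / k_nn))
    # Step 4 (delta): distance to the nearest point of higher density (eq. (4))
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    # Step 5: gamma = rho * delta; the top n_centers points become centers
    gamma = [r * dl for r, dl in zip(rho, delta)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
    # Step 8: assign every remaining point to the nearest center
    labels = [min(centers, key=lambda c: d[i][c]) for i in range(n)]
    return centers, labels
```

On two well-separated blobs, the two points with the largest γ become the centers and all points are assigned to their own blob's center.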

Table 2: Real-world datasets.

Dataset        Instances  Dimensions  Clusters
Vertebral      310        7           2
Seeds          210        7           3
Breast cancer  699        10          2
Banknotes      1372       5           2

the four clustering algorithms on a dataset are shown as four parts in a single figure. The cluster centers of the KST-DPC and DPC-KNN algorithms are marked in the figures with different colors. For DBSCAN, it is not meaningful to mark the cluster centers because they are chosen randomly. Each clustering algorithm ran multiple times on each dataset, and the best result of each clustering algorithm is displayed.

The performance measures of the four clustering algorithms on the six synthetic datasets are reported in Table 3. In Table 3, the column "Par" for each algorithm gives the parameter values the users need to set. KST-DPC and DPC-KNN have only one parameter, K, which is the number of nearest neighbors to be prespecified. In this paper, the value of K is determined as a percentage of the number of data points, following the method in [34]. For each dataset, we adjusted this percentage over multiple runs and selected the one giving the best final clustering; only the best results are listed in Tables 3 and 4. For consistency with the other parameters in the tables, the percentages are converted directly into specific K values. DBSCAN has two input parameters, the maximum radius Eps and the minimum number of points MinPts. The SC algorithm needs the true number of clusters. C1 in Table 3 refers to the number of cluster centers found by the algorithms. The performance measures, including Acc and NMI, are presented in Table 3 for the four clustering algorithms on the six synthetic datasets.
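Acc and NMI are the benchmark measures used throughout Tables 3 and 4. For reference, NMI can be computed from two labelings with a few lines of standard-library Python; the sketch below implements the standard definition NMI = I(T;P) / sqrt(H(T)·H(P)) and is a generic illustration, not code from the paper.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings,
    NMI = I(T;P) / sqrt(H(T) * H(P)); returns a value in [0, 1]."""
    n = len(labels_true)
    ct, cp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # mutual information from the joint and marginal label distributions
    mi = sum((c / n) * math.log((c / n) / ((ct[t] / n) * (cp[p] / n)))
             for (t, p), c in joint.items())
    # Shannon entropy of a labeling
    h = lambda counts: -sum((c / n) * math.log(c / n) for c in counts.values())
    denom = math.sqrt(h(ct) * h(cp))
    return mi / denom if denom > 0 else 1.0
```

NMI is invariant to relabeling: swapping the cluster ids of a perfect partition still yields 1, while an uninformative partition yields 0.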

The Spiral dataset has 3 clusters with 312 data points embracing each other. Table 3 and Figure 5 show that KST-DPC, DPC-KNN, DBSCAN, and SC can all find the correct number of clusters and get the correct clustering results. All the benchmark values are 1.00, reflecting that all four algorithms perform perfectly well on the Spiral dataset.

The Compound dataset has 6 clusters with 399 data points. From Table 3 and Figure 6, it is obvious that KST-DPC can find the ideal clustering result, DBSCAN cannot find the right clusters, and DPC-KNN and SC cannot find the clustering centers. Because DPC has a special assignment strategy [3], it may assign data points erroneously to clusters


Table 3: Results on the synthetic datasets.

Spiral
Algorithm  Par    C1  Acc     NMI
KST-DPC    16     3   1.00    1.00
DPC-KNN    20     3   1.00    1.00
DBSCAN     123    3   1.00    1.00
SC         3      3   1.00    1.00

Compound
Algorithm  Par    C1  Acc     NMI
KST-DPC    217    6   0.98    0.95
DPC-KNN    360    6   0.6466  0.7663
DBSCAN     153    5   0.8596  0.9429
SC         6      6   0.6015  0.7622

Jain
Algorithm  Par    C1  Acc     NMI
KST-DPC    4      2   1.00    1.00
DPC-KNN    8      2   0.9035  0.5972
DBSCAN     2624   2   1.00    1.00
SC         2      2   1.00    1.00

Aggregation
Algorithm  Par    C1  Acc     NMI
KST-DPC    40     7   1.00    1.00
DPC-KNN    40     7   0.9987  0.9957
DBSCAN     1593   5   0.8274  0.8894
SC         7      7   0.9937  0.9824

R15
Algorithm  Par    C1  Acc     NMI
KST-DPC    20     15  1.00    0.99
DPC-KNN    20     15  1.00    0.99
DBSCAN     0.45   13  0.78    0.9155
SC         15     15  0.9967  0.9942

D31
Algorithm  Par    C1  Acc     NMI
KST-DPC    25     31  1.0000  1.0000
DPC-KNN    25     31  0.9700  0.9500
DBSCAN     0.463  27  0.6516  0.8444
SC         31     31  0.9765  0.9670

Figure 4: The tissue-like membrane system in the calculation process. Each membrane i (1 ≤ i ≤ n) holds x_i together with the distance objects d^{w_i1}, d^{w_i2}, ..., and membrane n+1 has been divided into membranes n+1, n+2, ..., n+k.

once a data point with a higher density is assigned to an incorrect cluster. For this reason, some data points belonging to cluster 1 are incorrectly assigned to cluster 2 or 3, as shown in Figures 6(b)-6(d). DBSCAN has some prespecified parameters that can have heavy effects on the clustering results. As shown in Figure 6(c), two clusters are merged into one cluster on two occasions. KST-DPC obtained Acc and NMI values higher than those obtained by the other algorithms.

The Jain dataset has two clusters with 373 data points in a 2-dimensional space. The clustering results show that KST-DPC, DBSCAN, and SC can get correct results, and both of their benchmark values are 1.00. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 7. DPC-KNN divides some points that should belong to the bottom cluster into the upper cluster. Although all four clustering algorithms can find the correct number of clusters, KST-DPC, DBSCAN, and SC are more effective because they can put all the data points into the correct clusters.

The Aggregation dataset has 7 clusters with different sizes and shapes and two pairs of clusters connected to each other. Figure 8 shows that both the KST-DPC and DPC-KNN algorithms can effectively find the cluster centers and correct clusters, except that an individual data point is put into an incorrect cluster by DPC-KNN. Table 3 shows that the benchmark values of KST-DPC are all 1.00 and those of DPC-KNN are close to 1.00. SC can also recognize all clusters, but its values of Acc and NMI are lower than those of DPC-KNN. DBSCAN did not find all clusters and could not partition the clusters connected to each other.

The R15 dataset has 15 clusters containing 600 data points. The clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. One cluster lies in the


Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained a good experimental result, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters are slightly overlapping and are distributed randomly in a 2-dimensional space. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are all 1.00, showing that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, and the best results on each dataset are shown in italics. The symbol "--" indicates that there is no value for that entry.

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC is equal to that obtained by DPC-KNN, but the value of NMI obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. As Table 4 shows, all algorithms could find the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. Results in Table 4 show that KST-DPC obtained the best, whereas DBSCAN obtained the worst, values of Acc and NMI. All four clustering algorithms could get the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results on this dataset in Table 4 show that all four clustering algorithms could find the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset among the algorithms that produced results; SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4, it is obvious that KST-DPC got the best


Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Table 4: Results on the real-world datasets.

Vertebral
Algorithm  Par    C1  Acc     NMI
KST-DPC    9      2   0.6806  0.0313
DPC-KNN    9      2   0.6806  0.0821
DBSCAN     748    2   0.6742  --
SC         2      2   --      --

Seeds
Algorithm  Par    C1  Acc     NMI
KST-DPC    4      3   0.8429  0.6574
DPC-KNN    6      3   0.8143  0.6252
DBSCAN     0.927  3   0.5857  0.4835
SC         3      3   0.6071  0.5987

Breast cancer
Algorithm  Par    C1  Acc     NMI
KST-DPC    70     2   0.8624  0.4106
DPC-KNN    76     2   0.7954  0.3154
DBSCAN     620    2   0.6552  0.0872
SC         2      2   --      --

Banknotes
Algorithm  Par    C1  Acc     NMI
KST-DPC    68     2   0.8434  0.7236
DPC-KNN    82     2   0.7340  0.3311
DBSCAN     655    2   0.5554  6.7210e-16
SC         2      2   0.6152  0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms: it obtained larger values of Acc and NMI than the other clustering algorithms.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric. This algorithm overcomes the shortcoming of DPC that the value of the cutoff distance d_c must be set in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.


shorter than those taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets were used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can get ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors is prespecified. Currently there is no technique available to set this value, and choosing a suitable value for K is a future research direction. Moreover, some other methods can be used to calculate the densities of the data points. In order to improve the effectiveness of DPC, some optimization techniques can also be employed.
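As one illustration of the entropy ingredient in such density measures, the Shannon entropy of a point's normalized K-nearest-neighbor distances distinguishes uniform neighborhoods (high entropy) from skewed ones (low entropy). The sketch below is a hypothetical stand-in in the spirit of the paper's equations (10) and (11), not their exact form, and the function name is an assumption.

```python
import math

def knn_shannon_entropy(dists, k):
    """Shannon entropy of a point's normalized k-nearest-neighbour
    distances; an illustrative stand-in for an entropy-based density
    ingredient, not the paper's exact equations."""
    nearest = sorted(dists)[:k]          # the k smallest distances
    total = sum(nearest)
    if total == 0:
        return 0.0
    # normalize the distances into a probability distribution
    probs = [d / total for d in nearest if d > 0]
    return -sum(p * math.log(p) for p in probs)
```

A point whose k neighbors are all equidistant attains the maximum entropy log(k), while a skewed neighborhood yields a smaller value, so the entropy can serve as a uniformity weight on the density.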

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), China Postdoctoral Science Foundation Funded Project (2017M612339, 2018M642695), Natural Science Foundation of Shandong Province (ZR2019QF007), China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences, Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160-172, Springer, Berlin, Germany, 2013.

[3] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492-1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220-227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743-754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785-793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1-15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131-1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187-196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1-13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200-226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285-309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291-294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering & Technology, vol. 24, no. 2, pp. 229-237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87-92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209-220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177-193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875-884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370-2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001-1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256-265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528-1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74-82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80-91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1-17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777-4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1-11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171-194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135-145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849-856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, Portland, OR, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208-220, 2017.



[13] G Paun ldquoA quick introduction to membrane computingrdquoJournal of Logic Algebraic Programming vol 79 no 6 pp 291ndash294 2010

[14] H Peng J Wang and P Shi ldquoA novel image thresholdingmethod based on membrane computing and fuzzy entropyrdquoJournal of Intelligent amp Fuzzy Systems Applications in Engineer-ing amp Technology vol 24 no 2 pp 229ndash237 2013

[15] M Tu J Wang H Peng and P Shi ldquoApplication of adaptivefuzzy spiking neural P systems in fault diagnosis of powersystemsrdquo Journal of Electronics vol 23 no 1 pp 87ndash92 2014

[16] J Wang P Shi H Peng M J Perez-Jimenez and T WangldquoWeighted fuzzy spiking neural P systemsrdquo IEEE Transactionson Fuzzy Systems vol 21 no 2 pp 209ndash220 2013

[17] B Song C Zhang and L Pan ldquoTissue-like P systems withevolutional symportantiport rulesrdquo Information Sciences vol378 pp 177ndash193 2017

[18] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoDynamic threshold neural P systemsrdquo Knowledge-Based Sys-tems vol 163 pp 875ndash884 2019

[19] LHuang IH Suh andAAbraham ldquoDynamicmulti-objectiveoptimization based on membrane computing for control oftime-varying unstable plantsrdquo Information Sciences vol 181 no11 pp 2370ndash2391 2011

[20] H Peng Y Jiang JWang andM J Perez-Jimenez ldquoMembraneclustering algorithm with hybrid evolutionary mechanismsrdquoJournal of Soware Ruanjian Xuebao vol 26 no 5 pp 1001ndash1012 2015

[21] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoThe framework of P systems applied to solve optimal water-marking problemrdquo Signal Processing vol 101 pp 256ndash265 2014

[22] G Zhang J Cheng M Gheorghe and Q Meng ldquoA hybridapproach based on different evolution and tissue membranesystems for solving constrained manufacturing parameter opti-mization problemsrdquo Applied So Computing vol 13 no 3 pp1528ndash1542 2013

[23] H Peng P Shi J Wang A Riscos-Nunez and M J Perez-Jimenez ldquoMultiobjective fuzzy clustering approach based ontissue-like membrane systemsrdquo Knowledge-Based Systems vol125 pp 74ndash82 2017

Mathematical Problems in Engineering 13

[24] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[25] H Peng J Wang P Shi M J Perez-Jimenez and A Riscos-Nunez ldquoAn extendedmembrane systemwith activemembranesto solve automatic fuzzy clustering problemsrdquo InternationalJournal of Neural Systems vol 26 no 3 pp 1ndash17 2016

[26] H Peng J Wang J Ming et al ldquoFault diagnosis of powersystems using intuitionistic fuzzy spiking neural P systemsrdquoIEEE Transactions on Smart Grid vol 9 no 5 pp 4777ndash47842018

[27] X Liu Y Zhao and M Sun ldquoAn improved apriori algorithmbased on an evolution-communication tissue-like P Systemwith promoters and inhibitorsrdquo Discrete Dynamics in Natureand Society vol 2017 pp 1ndash11 2017

[28] X Liu and J Xue ldquoA cluster splitting technique by hopfieldnetworks and P systems on simplicesrdquoNeural Processing Lettersvol 46 no 1 pp 171ndash194 2017

[29] Y Zhao X Liu and W Wang ldquoSpiking neural P systems withneuron division and dissolutionrdquo PLoS ONE vol 11 no 9Article ID e0162882 2016

[30] M Du S Ding and H Jia ldquoStudy on density peaks clusteringbased on k-nearest neighbors and principal component analy-sisrdquo Knowledge-Based Systems vol 99 no 1 pp 135ndash145 2016

[31] K Bache and M Lichman UCI machine learning repository2013 http archiveicsucieduml

[32] A NgM Jordan and YWeiss ldquoOn spectral clustering analysisand an algorithmrdquo inAdvances in Neural Information ProcessingSystems pp 849ndash856 Vancouver British Columbia Canada2001

[33] M Ester H Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the Second International Conferenceon Knowledge Discovery and Data Mining pp 226ndash231 MenloPark Portland USA 1996

[34] L Yaohui M Zhengming and Y Fang ldquoAdaptive densitypeak clustering based on K-nearest neighbors with aggregatingstrategyrdquo Knowledge-Based Systems vol 133 pp 208ndash220 2017

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 8: A Density Peak Clustering Algorithm Based on the K-Nearest ...downloads.hindawi.com/journals/mpe/2019/1713801.pdf · ResearchArticle A Density Peak Clustering Algorithm Based on the

8 Mathematical Problems in Engineering

Figure 5: Clustering results of the Spiral dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]

center of the 2-dimensional space and is closely surrounded by seven other clusters. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 9. KST-DPC and DPC-KNN can both find the correct cluster centers and assign almost all data points to their corresponding clusters. SC also obtained good experimental results, but DBSCAN did not find all clusters.

The D31 dataset has 31 clusters and contains 3100 data points. These clusters overlap slightly and are distributed randomly in a 2-dimensional space. The experimental results of the 4 algorithms are shown in Table 3, and the clustering results are displayed in Figure 10. The values of Acc and NMI obtained by KST-DPC are both 100%, which shows that KST-DPC obtained perfect clustering results on the D31 dataset. DPC-KNN and SC obtained results similar to those of KST-DPC on this dataset, but DBSCAN was not able to find all clusters.

4.3. Experimental Results on the Real-World Datasets. This subsection reports the performances of the clustering algorithms on the four real-world datasets. The varying sizes and dimensions of these datasets are useful in testing the performance of the algorithms under different conditions.

The number of clusters, Acc, and NMI are also used to measure the performances of the clustering algorithms on these real-world datasets. The experimental results are reported in Table 4, and the best results for each dataset are shown in italics. The symbol "--" indicates that there is no value for that entry.
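The paper does not give code for these metrics, so the following is a sketch under two common assumptions: Acc is best-map clustering accuracy (predicted cluster labels matched one-to-one to true labels via the Hungarian algorithm), and NMI is standard normalized mutual information, here taken from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Acc: fraction of points correctly labeled under the best
    one-to-one matching between predicted and true cluster labels
    (Hungarian algorithm on the contingency table)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_ids = np.unique(y_true)
    pred_ids = np.unique(y_pred)
    # Contingency table: rows = predicted clusters, cols = true clusters.
    w = np.zeros((len(pred_ids), len(true_ids)), dtype=int)
    for i, p in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            w[i, j] = np.sum((y_pred == p) & (y_true == t))
    row, col = linear_sum_assignment(-w)  # maximize matched points
    return w[row, col].sum() / len(y_true)

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]  # labels permuted, one point misassigned
acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
print(round(acc, 4))  # 0.8333 (5 of 6 points correct under the best map)
```

Note that Acc is invariant to a relabeling of the predicted clusters, which is why the Hungarian matching step is needed before counting correct points.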

The Vertebral dataset consists of 2 clusters and 310 data points. As Table 4 shows, the value of Acc obtained by KST-DPC is equal to that obtained by DPC-KNN, but the value of NMI obtained by KST-DPC is lower than that obtained by DPC-KNN. No values of Acc and NMI were obtained by SC. All four algorithms found the right number of clusters.

The Seeds dataset consists of 210 data points and 3 clusters. Results in Table 4 show that KST-DPC obtained the best, whereas DBSCAN obtained the worst, values of Acc and NMI. All four clustering algorithms found the right number of clusters.

The Breast Cancer dataset consists of 699 data points and 2 clusters. The results on this dataset in Table 4 show that all four clustering algorithms could find the right number of clusters. KST-DPC obtained Acc and NMI values of 0.8624 and 0.4106, respectively, which are higher than those obtained by the other clustering algorithms. The results also show that DBSCAN has the worst performance on this dataset, while SC did not produce results on these benchmarks.

The Banknotes dataset consists of 1372 data points and 2 clusters. From Table 4, it is obvious that KST-DPC obtained the best


Figure 6: Clustering results of the Compound dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]

Table 4: Results on the real-world datasets.

Dataset        Algorithm   Par     C1   Acc      NMI
Vertebral      KST-DPC     9       2    0.6806   0.0313
               DPC-KNN     9       2    0.6806   0.0821
               DBSCAN      7.48    2    0.6742   --
               SC          2       2    --       --
Seeds          KST-DPC     4       3    0.8429   0.6574
               DPC-KNN     6       3    0.8143   0.6252
               DBSCAN      0.927   3    0.5857   0.4835
               SC          3       3    0.6071   0.5987
Breast cancer  KST-DPC     70      2    0.8624   0.4106
               DPC-KNN     76      2    0.7954   0.3154
               DBSCAN      6.20    2    0.6552   0.0872
               SC          2       2    --       --
Banknotes      KST-DPC     68      2    0.8434   0.7236
               DPC-KNN     82      2    0.7340   0.3311
               DBSCAN      6.55    2    0.5554   6.7210e-16
               SC          2       2    0.6152   0.0598

values of Acc and NMI among all four clustering algorithms. The values of Acc and NMI obtained by KST-DPC are 0.8434 and 0.7236, respectively. Larger values of these benchmarks indicate that the experimental results obtained by KST-DPC are closer to the true results than those obtained by the other clustering algorithms.

All these experimental results show that KST-DPC outperforms the other clustering algorithms, obtaining larger values of Acc and NMI than they do.

5. Conclusion

This study proposed a density peak clustering algorithm based on the K-nearest neighbors, Shannon entropy, and tissue-like P systems. It uses the K-nearest neighbors and Shannon entropy to calculate the density metric. This algorithm overcomes a shortcoming of DPC, namely the need to set the value of the cutoff distance d_c in advance. The tissue-like P system is used to realize the clustering process. The analysis demonstrates that the overall time taken by KST-DPC is


Figure 7: Clustering results of the Jain dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]

Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]


Figure 9: Clustering results of the R15 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]

Figure 10: Clustering results of the D31 dataset by the four clustering algorithms: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC. [Scatter plots omitted.]


shorter than that taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets are used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can obtain ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.
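The density construction summarized in the conclusion (K-nearest neighbors combined with Shannon entropy) is not reproduced formula-by-formula in this section, so the sketch below is one plausible reading rather than the authors' exact definition: each point's K neighbor distances are normalized into a distribution whose Shannon entropy, divided by the mean neighbor distance, serves as the density, so that tight, evenly spread neighborhoods score high. The function name and the exact weighting are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_entropy_density(X, k=5, eps=1e-12):
    """Illustrative KNN + Shannon-entropy density (hedged sketch,
    not the paper's exact formula): high when a point's k nearest
    neighbors are both close and evenly spread."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    d = dist[:, 1:]                          # drop the zero self-distance
    p = d / (d.sum(axis=1, keepdims=True) + eps)
    H = -(p * np.log(p + eps)).sum(axis=1)   # Shannon entropy per point
    return H / (d.mean(axis=1) + eps)        # small distances => high density

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))   # tight cluster
sparse = rng.normal(5.0, 2.0, size=(50, 2))  # spread-out cluster
X = np.vstack([dense, sparse])
rho = knn_entropy_density(X, k=5)
print(rho[:50].mean() > rho[50:].mean())     # dense region scores higher: True
```

Unlike DPC's cutoff-distance density, a KNN-based score like this needs no d_c, which is the shortcoming the conclusion says KST-DPC avoids.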

However, the parameter K in the K-nearest neighbors is prespecified, and currently there is no technique available to set this value. Choosing a suitable value for K is a future research direction. Moreover, other methods can be used to calculate the densities of the data points, and some optimization techniques can also be employed to improve the effectiveness of DPC.
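One simple heuristic for the open problem of choosing K, offered here as an illustration rather than a method from the paper, is to sweep candidate values and keep the one that maximizes an internal validity index such as the silhouette score. In the sketch below, spectral clustering stands in for any method parameterized by a neighborhood size K; the candidate grid and dataset are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Toy data; in practice this would be the dataset being clustered.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.7, random_state=1)

# Sweep the neighborhood size K and keep the value whose clustering
# scores best on the silhouette index (an internal validity measure).
best_k, best_score = None, -1.0
for k in (5, 10, 15, 20):
    labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                                n_neighbors=k, random_state=1).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print("selected K:", best_k)
```

Because the index is computed without ground-truth labels, the same sweep applies to unlabeled real-world data, though it adds one clustering run per candidate K.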

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), the China Postdoctoral Science Foundation Funded Project (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), the China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences, Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160-172, Springer, Berlin, Germany, 2013.

[3] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492-1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220-227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743-754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785-793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1-15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131-1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187-196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1-13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200-226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285-309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291-294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering & Technology, vol. 24, no. 2, pp. 229-237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87-92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209-220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177-193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875-884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370-2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001-1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256-265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528-1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74-82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80-91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1-17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777-4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1-11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171-194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135-145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849-856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, Portland, OR, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208-220, 2017.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: A Density Peak Clustering Algorithm Based on the K-Nearest ...downloads.hindawi.com/journals/mpe/2019/1713801.pdf · ResearchArticle A Density Peak Clustering Algorithm Based on the

Mathematical Problems in Engineering 9

5 10 15 20 25 30 35 40 454

6

8

10

12

14

16

18

20

22

24

(a) KST-DPC5 10 15 20 25 30 35 40 454

6

8

10

12

14

16

18

20

22

24

(b) DPC-KNN

5 10 15 20 25 30 35 40 454

6

8

10

12

14

16

18

20

22

24

(c) DBSCAN5 10 15 20 25 30 35 40 454

6

8

10

12

14

16

18

20

22

24

(d) SC

Figure 6 Clustering results of the Compound dataset by the four clustering algorithms

Table 4 Results on the real-world datasets

Algorithm Par C1 Acc NMI Algorithm Par C1 Acc NMIVertebral SeedsKST-DPC 9 2 06806 00313 KST-DPC 4 3 08429 06574DPC-KNN 9 2 06806 00821 DPC-KNN 6 3 08143 06252DBSCAN 748 2 06742 -- DBSCAN 0927 3 05857 04835SC 2 2 -- -- SC 3 3 06071 05987Breast cancer BanknotesKST-DPC 70 2 08624 04106 KST-DPC 68 2 08434 07236DPC-KNN 76 2 07954 03154 DPC-KNN 82 2 07340 03311DBSCAN 620 2 06552 00872 DBSCAN 655 2 05554 67210e-16SC 2 2 -- -- SC 2 2 06152 00598

values of Acc and NMI among all four clustering algorithmsThe values of Acc and NMI obtained by KST-DPC are 08434and 07260 respectively Larger values of these benchmarksindicate that the experimental results obtained by KST-DPCare closer to the true results than those obtained by the otherclustering algorithms

All these experimental results show that KST-DPC out-perform the other clustering algorithms It obtained largervalues of Acc and NMI than the other clustering algorithms

5 Conclusion

This study proposed a density peak clustering algorithmbased on the K-nearest neighbors Shannon entropy andtissue-like P systems It uses the K-nearest neighbors andShannon entropy to calculate the density metric This algo-rithm overcomes the shortcoming that DPC has that is to setthe value of the cutoff distance 119889119888 in advance The tissue-likeP system is used to realize the clustering processThe analysisdemonstrates that the overall time taken by KST-DPC is

10 Mathematical Problems in Engineering

0 5 10 15 20 25 30 35 40 450

5

10

15

20

25

30

(a) KST-DPC0 5 10 15 20 25 30 35 40 45

0

5

10

15

20

25

30

(b) DPC-KNN0 5 10 15 20 25 30 35 40 45

0

5

10

15

20

25

30

(c) DBSCAN

0 5 10 15 20 25 30 35 40 450

5

10

15

20

25

30

(d) SC

Figure 7 Clustering results of the Jain dataset by the four clustering algorithms

0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

(a) KST-DPC0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

(b) DPC-KNN0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

(c) DBSCAN

0 5 10 15 20 25 30 35 400

5

10

15

20

25

30

(d) SC

Figure 8 Clustering results of the Aggregation dataset by the four clustering algorithms

Mathematical Problems in Engineering 11

2 4 6 8 10 12 14 16 1824

6

8

1012

14

16

18

(a) KST-DPC2 4 6 8 10 12 14 16 18

24

6

8

1012

14

16

18

(b) DPC-KNN2 4 6 8 10 12 14 16 18

24

6

8

1012

14

16

18

(c) DBSCAN

2 4 6 8 10 12 14 16 1824

6

8

1012

14

16

18

(d) SC

Figure 9 Clustering results of the R15 dataset by the four clustering algorithms

0 5 10 15 20 25 300

5

10

15

20

25

30

(a) KST-DPC0 5 10 15 20 25 30

0

5

10

15

20

25

30

(b) DPC-KNN0 5 10 15 20 25 30

0

5

10

15

20

25

30

(c) DBSCAN

0 5 10 15 20 25 300

5

10

15

20

25

30

(d) SC

Figure 10 Clustering results of the D31 dataset by the four clustering algorithms

12 Mathematical Problems in Engineering

shorter than those taken by DPC-KNN and the traditionalDPC Synthetic and real-world datasets are used to verifythe performance of the KST-DPC algorithm Experimentalresults show that the new algorithm can get ideal clusteringresults on most of the datasets and outperforms the threeother clustering algorithms referenced in this study

However the parameter 119870 in the K-nearest neighbors isprespecified Currently there is no technique available to setthis value Choosing a suitable value for119870 is a future researchdirection Moreover some other methods can be used tocalculate the densities of the data points In order to improvethe effectiveness of DPC some optimization techniques canalso be employed

Data Availability

The synthetic datasets are available at httpcsueffisipudatasets and the real-world datasets are available athttparchiveicsuciedumlindexphp

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was partially supported by the National Natu-ral Science Foundation of China (nos 61876101 61802234and 61806114) the Social Science Fund Project of Shan-dong (16BGLJ06 11CGLJ22) China Postdoctoral ScienceFoundation Funded Project (2017M612339 2018M642695)Natural Science Foundation of the Shandong Provincial(ZR2019QF007) China Postdoctoral Special Funding Project(2019T120607) and Youth Fund for Humanities and SocialSciences Ministry of Education (19YJCZH244)

References

[1] J Han J Pei and M Kamber Data Mining Concepts andTechniques San Francisco CA USA 3rd edition 2011

[2] R J Campello D Moulavi and J Sander ldquoDensity-based clus-tering based on hierarchical density estimatesrdquo in Advances inKnowledgeDiscovery andDataMining vol 7819 ofLectureNotesin Computer Science pp 160ndash172 Springer Berlin Germany2013

[3] A Laio and A Rodriguez ldquoClustering by fast search and find ofdensity peaksrdquo Science vol 344 no 6191 pp 1492ndash1496 2014

[4] J Cong X Xie and FHu ldquoAdensity peak clustermodel of high-dimensional datardquo in Proceedings of the Asia-Pacific ServicesComputing Conference pp 220ndash227 Zhangjiajie China 2016

[5] X Xu S Ding M Du and Y Xue ldquoDPCG an efficient densitypeaks clustering algorithm based on gridrdquo International Journalof Machine Learning and Cybernetics vol 9 no 5 pp 743ndash7542016


Mathematical Problems in Engineering



[Figure 7: Clustering results of the Jain dataset by the four clustering algorithms. Panels: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

[Figure 8: Clustering results of the Aggregation dataset by the four clustering algorithms. Panels: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]


[Figure 9: Clustering results of the R15 dataset by the four clustering algorithms. Panels: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]

[Figure 10: Clustering results of the D31 dataset by the four clustering algorithms. Panels: (a) KST-DPC, (b) DPC-KNN, (c) DBSCAN, (d) SC.]


shorter than those taken by DPC-KNN and the traditional DPC. Synthetic and real-world datasets are used to verify the performance of the KST-DPC algorithm. Experimental results show that the new algorithm can obtain ideal clustering results on most of the datasets and outperforms the three other clustering algorithms referenced in this study.

However, the parameter K in the K-nearest neighbors method must be prespecified, and currently no technique is available to set this value automatically. Choosing a suitable value for K is a direction for future research. Moreover, other methods could be used to calculate the densities of the data points, and some optimization techniques could also be employed to improve the effectiveness of DPC.
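The density estimate that this parameter feeds into combines the K nearest neighbors of each point with the Shannon entropy of its neighbor-distance distribution. The following is a minimal illustrative sketch of such an estimate, not the authors' exact published formula: the function name, the normalization of neighbor distances into a probability distribution, and the way the entropy is combined with the mean neighbor distance are all assumptions made for this example.

```python
# Illustrative sketch of a KNN + Shannon-entropy local density, in the
# spirit of KST-DPC. The exact combination used here is an assumption.
import numpy as np

def knn_entropy_density(X, k=5):
    """Return a local density estimate for each row of X (n x d array)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    density = np.empty(n)
    for i in range(n):
        # Indices of the k nearest neighbors of point i (excluding itself).
        nn = np.argsort(dist[i])[1:k + 1]
        d = dist[i][nn]
        # Normalize neighbor distances into a probability distribution.
        total = d.sum()
        p = d / total if total > 0 else np.full(k, 1.0 / k)
        # Shannon entropy of that distribution (with 0 * log 0 treated as 0).
        h = -np.sum(np.where(p > 0, p * np.log(p), 0.0))
        # Assumed combination: high entropy (evenly spread neighbors) and a
        # small mean neighbor distance both indicate a dense neighborhood.
        density[i] = h / (d.mean() + 1e-12)
    return density
```

On a toy dataset of five tightly grouped points and one far-away outlier, the outlier receives a much lower density than any cluster member, which is the behavior a DPC-style algorithm relies on when selecting cluster centers.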

Data Availability

The synthetic datasets are available at http://cs.uef.fi/sipu/datasets, and the real-world datasets are available at http://archive.ics.uci.edu/ml/index.php.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61876101, 61802234, and 61806114), the Social Science Fund Project of Shandong (16BGLJ06, 11CGLJ22), the China Postdoctoral Science Foundation Funded Project (2017M612339, 2018M642695), the Natural Science Foundation of Shandong Province (ZR2019QF007), the China Postdoctoral Special Funding Project (2019T120607), and the Youth Fund for Humanities and Social Sciences, Ministry of Education (19YJCZH244).

References

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, CA, USA, 3rd edition, 2011.

[2] R. J. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Advances in Knowledge Discovery and Data Mining, vol. 7819 of Lecture Notes in Computer Science, pp. 160–172, Springer, Berlin, Germany, 2013.

[3] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[4] J. Cong, X. Xie, and F. Hu, "A density peak cluster model of high-dimensional data," in Proceedings of the Asia-Pacific Services Computing Conference, pp. 220–227, Zhangjiajie, China, 2016.

[5] X. Xu, S. Ding, M. Du, and Y. Xue, "DPCG: an efficient density peaks clustering algorithm based on grid," International Journal of Machine Learning and Cybernetics, vol. 9, no. 5, pp. 743–754, 2016.

[6] R. Bie, R. Mehmood, S. Ruan, Y. Sun, and H. Dawood, "Adaptive fuzzy clustering by fast search and find of density peaks," Personal and Ubiquitous Computing, vol. 20, no. 5, pp. 785–793, 2016.

[7] M. Du, S. Ding, X. Xu, and X. Xue, "Density peaks clustering using geodesic distances," International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1–15, 2018.

[8] M. Du, S. Ding, and Y. Xue, "A robust density peaks clustering algorithm using fuzzy neighborhood," International Journal of Machine Learning and Cybernetics, vol. 9, no. 7, pp. 1131–1140, 2018.

[9] J. Hou and H. Cui, "Density normalization in density peak based clustering," in Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, pp. 187–196, Anacapri, Italy, 2017.

[10] X. Xu, S. Ding, H. Xu, H. Liao, and Y. Xue, "A feasible density peaks clustering algorithm with a merging strategy," Soft Computing, vol. 2018, pp. 1–13, 2018.

[11] R. Liu, H. Wang, and X. Yu, "Shared-nearest-neighbor-based clustering by fast search and find of density peaks," Information Sciences, vol. 450, pp. 200–226, 2018.

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, Portland, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: A Density Peak Clustering Algorithm Based on the K-Nearest ...downloads.hindawi.com/journals/mpe/2019/1713801.pdf · ResearchArticle A Density Peak Clustering Algorithm Based on the

Mathematical Problems in Engineering 11

2 4 6 8 10 12 14 16 1824

6

8

1012

14

16

18

(a) KST-DPC2 4 6 8 10 12 14 16 18

24

6

8

1012

14

16

18

(b) DPC-KNN2 4 6 8 10 12 14 16 18

24

6

8

1012

14

16

18

(c) DBSCAN

2 4 6 8 10 12 14 16 1824

6

8

1012

14

16

18

(d) SC

Figure 9 Clustering results of the R15 dataset by the four clustering algorithms

0 5 10 15 20 25 300

5

10

15

20

25

30

(a) KST-DPC0 5 10 15 20 25 30

0

5

10

15

20

25

30

(b) DPC-KNN0 5 10 15 20 25 30

0

5

10

15

20

25

30

(c) DBSCAN

0 5 10 15 20 25 300

5

10

15

20

25

30

(d) SC

Figure 10 Clustering results of the D31 dataset by the four clustering algorithms

12 Mathematical Problems in Engineering

shorter than those taken by DPC-KNN and the traditionalDPC Synthetic and real-world datasets are used to verifythe performance of the KST-DPC algorithm Experimentalresults show that the new algorithm can get ideal clusteringresults on most of the datasets and outperforms the threeother clustering algorithms referenced in this study

However the parameter 119870 in the K-nearest neighbors isprespecified Currently there is no technique available to setthis value Choosing a suitable value for119870 is a future researchdirection Moreover some other methods can be used tocalculate the densities of the data points In order to improvethe effectiveness of DPC some optimization techniques canalso be employed

Data Availability

The synthetic datasets are available at httpcsueffisipudatasets and the real-world datasets are available athttparchiveicsuciedumlindexphp

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was partially supported by the National Natu-ral Science Foundation of China (nos 61876101 61802234and 61806114) the Social Science Fund Project of Shan-dong (16BGLJ06 11CGLJ22) China Postdoctoral ScienceFoundation Funded Project (2017M612339 2018M642695)Natural Science Foundation of the Shandong Provincial(ZR2019QF007) China Postdoctoral Special Funding Project(2019T120607) and Youth Fund for Humanities and SocialSciences Ministry of Education (19YJCZH244)

References

[1] J Han J Pei and M Kamber Data Mining Concepts andTechniques San Francisco CA USA 3rd edition 2011

[2] R J Campello D Moulavi and J Sander ldquoDensity-based clus-tering based on hierarchical density estimatesrdquo in Advances inKnowledgeDiscovery andDataMining vol 7819 ofLectureNotesin Computer Science pp 160ndash172 Springer Berlin Germany2013

[3] A Laio and A Rodriguez ldquoClustering by fast search and find ofdensity peaksrdquo Science vol 344 no 6191 pp 1492ndash1496 2014

[4] J Cong X Xie and FHu ldquoAdensity peak clustermodel of high-dimensional datardquo in Proceedings of the Asia-Pacific ServicesComputing Conference pp 220ndash227 Zhangjiajie China 2016

[5] X Xu S Ding M Du and Y Xue ldquoDPCG an efficient densitypeaks clustering algorithm based on gridrdquo International Journalof Machine Learning and Cybernetics vol 9 no 5 pp 743ndash7542016

[6] R Bie RMehmood S Ruan Y Sun andHDawood ldquoAdaptivefuzzy clustering by fast search and find of density peaksrdquoPersonal and Ubiquitous Computing vol 20 no 5 pp 785ndash7932016

[7] M Du S Ding X Xu and X Xue ldquoDensity peaks clusteringusing geodesic distancesrdquo International Journal of MachineLearning and Cybernetics vol 9 no 8 pp 1ndash15 2018

[8] M Du S Ding and Y Xue ldquoA robust density peaks clusteringalgorithm using fuzzy neighborhoodrdquo International Journal ofMachine Learning and Cybernetics vol 9 no 7 pp 1131ndash11402018

[9] J Hou and H Cui ldquoDensity normalization in density peakbased clusteringrdquo in Proceedings of the International Workshopon Graph-Based Representations in Pattern Recognition pp 187ndash196 Anacapri Italy 2017

[10] X Xu S Ding H Xu H Liao and Y Xue ldquoA feasibledensity peaks clustering algorithmwith amerging strategyrdquo SoComputing vol 2018 pp 1ndash13 2018

[11] R Liu H Wang and X Yu ldquoShared-nearest-neighbor-basedclustering by fast search and find of density peaksrdquo InformationSciences vol 450 pp 200ndash226 2018

[12] M Du S Ding Y Xue and Z Shi ldquoA novel density peaksclustering with sensitivity of local density and density-adaptivemetricrdquo Knowledge and Information Systems vol 59 no 2 pp285ndash309 2019

[13] G Paun ldquoA quick introduction to membrane computingrdquoJournal of Logic Algebraic Programming vol 79 no 6 pp 291ndash294 2010

[14] H Peng J Wang and P Shi ldquoA novel image thresholdingmethod based on membrane computing and fuzzy entropyrdquoJournal of Intelligent amp Fuzzy Systems Applications in Engineer-ing amp Technology vol 24 no 2 pp 229ndash237 2013

[15] M Tu J Wang H Peng and P Shi ldquoApplication of adaptivefuzzy spiking neural P systems in fault diagnosis of powersystemsrdquo Journal of Electronics vol 23 no 1 pp 87ndash92 2014

[16] J Wang P Shi H Peng M J Perez-Jimenez and T WangldquoWeighted fuzzy spiking neural P systemsrdquo IEEE Transactionson Fuzzy Systems vol 21 no 2 pp 209ndash220 2013

[17] B Song C Zhang and L Pan ldquoTissue-like P systems withevolutional symportantiport rulesrdquo Information Sciences vol378 pp 177ndash193 2017

[18] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoDynamic threshold neural P systemsrdquo Knowledge-Based Sys-tems vol 163 pp 875ndash884 2019

[19] LHuang IH Suh andAAbraham ldquoDynamicmulti-objectiveoptimization based on membrane computing for control oftime-varying unstable plantsrdquo Information Sciences vol 181 no11 pp 2370ndash2391 2011

[20] H Peng Y Jiang JWang andM J Perez-Jimenez ldquoMembraneclustering algorithm with hybrid evolutionary mechanismsrdquoJournal of Soware Ruanjian Xuebao vol 26 no 5 pp 1001ndash1012 2015

[21] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoThe framework of P systems applied to solve optimal water-marking problemrdquo Signal Processing vol 101 pp 256ndash265 2014

[22] G Zhang J Cheng M Gheorghe and Q Meng ldquoA hybridapproach based on different evolution and tissue membranesystems for solving constrained manufacturing parameter opti-mization problemsrdquo Applied So Computing vol 13 no 3 pp1528ndash1542 2013

[23] H Peng P Shi J Wang A Riscos-Nunez and M J Perez-Jimenez ldquoMultiobjective fuzzy clustering approach based ontissue-like membrane systemsrdquo Knowledge-Based Systems vol125 pp 74ndash82 2017

Mathematical Problems in Engineering 13

[24] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[25] H Peng J Wang P Shi M J Perez-Jimenez and A Riscos-Nunez ldquoAn extendedmembrane systemwith activemembranesto solve automatic fuzzy clustering problemsrdquo InternationalJournal of Neural Systems vol 26 no 3 pp 1ndash17 2016

[26] H Peng J Wang J Ming et al ldquoFault diagnosis of powersystems using intuitionistic fuzzy spiking neural P systemsrdquoIEEE Transactions on Smart Grid vol 9 no 5 pp 4777ndash47842018

[27] X Liu Y Zhao and M Sun ldquoAn improved apriori algorithmbased on an evolution-communication tissue-like P Systemwith promoters and inhibitorsrdquo Discrete Dynamics in Natureand Society vol 2017 pp 1ndash11 2017

[28] X Liu and J Xue ldquoA cluster splitting technique by hopfieldnetworks and P systems on simplicesrdquoNeural Processing Lettersvol 46 no 1 pp 171ndash194 2017

[29] Y Zhao X Liu and W Wang ldquoSpiking neural P systems withneuron division and dissolutionrdquo PLoS ONE vol 11 no 9Article ID e0162882 2016

[30] M Du S Ding and H Jia ldquoStudy on density peaks clusteringbased on k-nearest neighbors and principal component analy-sisrdquo Knowledge-Based Systems vol 99 no 1 pp 135ndash145 2016

[31] K Bache and M Lichman UCI machine learning repository2013 http archiveicsucieduml

[32] A NgM Jordan and YWeiss ldquoOn spectral clustering analysisand an algorithmrdquo inAdvances in Neural Information ProcessingSystems pp 849ndash856 Vancouver British Columbia Canada2001

[33] M Ester H Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the Second International Conferenceon Knowledge Discovery and Data Mining pp 226ndash231 MenloPark Portland USA 1996

[34] L Yaohui M Zhengming and Y Fang ldquoAdaptive densitypeak clustering based on K-nearest neighbors with aggregatingstrategyrdquo Knowledge-Based Systems vol 133 pp 208ndash220 2017

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: A Density Peak Clustering Algorithm Based on the K-Nearest ...downloads.hindawi.com/journals/mpe/2019/1713801.pdf · ResearchArticle A Density Peak Clustering Algorithm Based on the

12 Mathematical Problems in Engineering

shorter than those taken by DPC-KNN and the traditionalDPC Synthetic and real-world datasets are used to verifythe performance of the KST-DPC algorithm Experimentalresults show that the new algorithm can get ideal clusteringresults on most of the datasets and outperforms the threeother clustering algorithms referenced in this study

However the parameter 119870 in the K-nearest neighbors isprespecified Currently there is no technique available to setthis value Choosing a suitable value for119870 is a future researchdirection Moreover some other methods can be used tocalculate the densities of the data points In order to improvethe effectiveness of DPC some optimization techniques canalso be employed

Data Availability

The synthetic datasets are available at httpcsueffisipudatasets and the real-world datasets are available athttparchiveicsuciedumlindexphp

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work was partially supported by the National Natu-ral Science Foundation of China (nos 61876101 61802234and 61806114) the Social Science Fund Project of Shan-dong (16BGLJ06 11CGLJ22) China Postdoctoral ScienceFoundation Funded Project (2017M612339 2018M642695)Natural Science Foundation of the Shandong Provincial(ZR2019QF007) China Postdoctoral Special Funding Project(2019T120607) and Youth Fund for Humanities and SocialSciences Ministry of Education (19YJCZH244)

References

[1] J Han J Pei and M Kamber Data Mining Concepts andTechniques San Francisco CA USA 3rd edition 2011

[2] R J Campello D Moulavi and J Sander ldquoDensity-based clus-tering based on hierarchical density estimatesrdquo in Advances inKnowledgeDiscovery andDataMining vol 7819 ofLectureNotesin Computer Science pp 160ndash172 Springer Berlin Germany2013

[3] A Laio and A Rodriguez ldquoClustering by fast search and find ofdensity peaksrdquo Science vol 344 no 6191 pp 1492ndash1496 2014

[4] J Cong X Xie and FHu ldquoAdensity peak clustermodel of high-dimensional datardquo in Proceedings of the Asia-Pacific ServicesComputing Conference pp 220ndash227 Zhangjiajie China 2016

[5] X Xu S Ding M Du and Y Xue ldquoDPCG an efficient densitypeaks clustering algorithm based on gridrdquo International Journalof Machine Learning and Cybernetics vol 9 no 5 pp 743ndash7542016

[6] R Bie RMehmood S Ruan Y Sun andHDawood ldquoAdaptivefuzzy clustering by fast search and find of density peaksrdquoPersonal and Ubiquitous Computing vol 20 no 5 pp 785ndash7932016

[7] M Du S Ding X Xu and X Xue ldquoDensity peaks clusteringusing geodesic distancesrdquo International Journal of MachineLearning and Cybernetics vol 9 no 8 pp 1ndash15 2018

[8] M Du S Ding and Y Xue ldquoA robust density peaks clusteringalgorithm using fuzzy neighborhoodrdquo International Journal ofMachine Learning and Cybernetics vol 9 no 7 pp 1131ndash11402018

[9] J Hou and H Cui ldquoDensity normalization in density peakbased clusteringrdquo in Proceedings of the International Workshopon Graph-Based Representations in Pattern Recognition pp 187ndash196 Anacapri Italy 2017

[10] X Xu S Ding H Xu H Liao and Y Xue ldquoA feasibledensity peaks clustering algorithmwith amerging strategyrdquo SoComputing vol 2018 pp 1ndash13 2018

[11] R Liu H Wang and X Yu ldquoShared-nearest-neighbor-basedclustering by fast search and find of density peaksrdquo InformationSciences vol 450 pp 200ndash226 2018

[12] M. Du, S. Ding, Y. Xue, and Z. Shi, "A novel density peaks clustering with sensitivity of local density and density-adaptive metric," Knowledge and Information Systems, vol. 59, no. 2, pp. 285–309, 2019.

[13] G. Paun, "A quick introduction to membrane computing," Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 291–294, 2010.

[14] H. Peng, J. Wang, and P. Shi, "A novel image thresholding method based on membrane computing and fuzzy entropy," Journal of Intelligent & Fuzzy Systems: Applications in Engineering & Technology, vol. 24, no. 2, pp. 229–237, 2013.

[15] M. Tu, J. Wang, H. Peng, and P. Shi, "Application of adaptive fuzzy spiking neural P systems in fault diagnosis of power systems," Journal of Electronics, vol. 23, no. 1, pp. 87–92, 2014.

[16] J. Wang, P. Shi, H. Peng, M. J. Perez-Jimenez, and T. Wang, "Weighted fuzzy spiking neural P systems," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 209–220, 2013.

[17] B. Song, C. Zhang, and L. Pan, "Tissue-like P systems with evolutional symport/antiport rules," Information Sciences, vol. 378, pp. 177–193, 2017.

[18] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "Dynamic threshold neural P systems," Knowledge-Based Systems, vol. 163, pp. 875–884, 2019.

[19] L. Huang, I. H. Suh, and A. Abraham, "Dynamic multi-objective optimization based on membrane computing for control of time-varying unstable plants," Information Sciences, vol. 181, no. 11, pp. 2370–2391, 2011.

[20] H. Peng, Y. Jiang, J. Wang, and M. J. Perez-Jimenez, "Membrane clustering algorithm with hybrid evolutionary mechanisms," Journal of Software (Ruanjian Xuebao), vol. 26, no. 5, pp. 1001–1012, 2015.

[21] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "The framework of P systems applied to solve optimal watermarking problem," Signal Processing, vol. 101, pp. 256–265, 2014.

[22] G. Zhang, J. Cheng, M. Gheorghe, and Q. Meng, "A hybrid approach based on differential evolution and tissue membrane systems for solving constrained manufacturing parameter optimization problems," Applied Soft Computing, vol. 13, no. 3, pp. 1528–1542, 2013.

[23] H. Peng, P. Shi, J. Wang, A. Riscos-Nunez, and M. J. Perez-Jimenez, "Multiobjective fuzzy clustering approach based on tissue-like membrane systems," Knowledge-Based Systems, vol. 125, pp. 74–82, 2017.

Mathematical Problems in Engineering 13

[24] H. Peng, J. Wang, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An unsupervised learning algorithm for membrane computing," Information Sciences, vol. 304, pp. 80–91, 2015.

[25] H. Peng, J. Wang, P. Shi, M. J. Perez-Jimenez, and A. Riscos-Nunez, "An extended membrane system with active membranes to solve automatic fuzzy clustering problems," International Journal of Neural Systems, vol. 26, no. 3, pp. 1–17, 2016.

[26] H. Peng, J. Wang, J. Ming et al., "Fault diagnosis of power systems using intuitionistic fuzzy spiking neural P systems," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4777–4784, 2018.

[27] X. Liu, Y. Zhao, and M. Sun, "An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors," Discrete Dynamics in Nature and Society, vol. 2017, pp. 1–11, 2017.

[28] X. Liu and J. Xue, "A cluster splitting technique by Hopfield networks and P systems on simplices," Neural Processing Letters, vol. 46, no. 1, pp. 171–194, 2017.

[29] Y. Zhao, X. Liu, and W. Wang, "Spiking neural P systems with neuron division and dissolution," PLoS ONE, vol. 11, no. 9, Article ID e0162882, 2016.

[30] M. Du, S. Ding, and H. Jia, "Study on density peaks clustering based on k-nearest neighbors and principal component analysis," Knowledge-Based Systems, vol. 99, no. 1, pp. 135–145, 2016.

[31] K. Bache and M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.

[32] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems, pp. 849–856, Vancouver, British Columbia, Canada, 2001.

[33] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, AAAI Press, Menlo Park, Portland, USA, 1996.

[34] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowledge-Based Systems, vol. 133, pp. 208–220, 2017.

