

Expert Systems with Applications 42 (2015) 51–66

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets

http://dx.doi.org/10.1016/j.eswa.2014.07.026
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

* Official address: 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam. Tel.: +84 904171284; fax: +84 0438623938.

E-mail addresses: [email protected], [email protected]

Le Hoang Son *, VNU University of Science, Vietnam National University, Viet Nam


Article history: Available online 26 July 2014

Keywords: Clustering quality; Distributed clustering; Facilitator model; Fuzzy clustering; Picture fuzzy sets

Abstract: Fuzzy clustering is an important tool for pattern recognition and knowledge discovery in databases, and has been applied broadly to various practical problems. Recent advances in data organization and processing, such as the cloud computing technology, which suit the management, privacy and storage of big datasets, have brought a significant breakthrough to information sciences and to the enhancement of the efficiency of fuzzy clustering. Distributed fuzzy clustering is an efficient mining technique that adapts the traditional fuzzy clustering to a new storage behavior in which parts of the dataset are stored at different sites instead of a centralized main site. Several distributed fuzzy clustering algorithms have been presented, including the most effective one, the CDFCM of Zhou et al. (2013). Based upon the observation that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of a distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, called DPFCM. Experimental results on various datasets show that the clustering quality of DPFCM is better than those of CDFCM and relevant algorithms.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Fuzzy clustering is an important tool for pattern recognition and knowledge discovery in databases, and has been applied broadly to various practical problems. The first fuzzy clustering algorithm is Fuzzy C-Means (FCM), proposed by Bezdek (1984). It is an iterative algorithm that modifies the cluster centers and the partition matrix in each step in order to optimize a given objective function. Bezdek proved that FCM converges to the saddle points of the objective function. Even though FCM was proposed a long time ago, it is still a popular fuzzy clustering algorithm and has been applied to many practical problems for rule extraction and the discovery of implicit patterns wherein fuzziness exists, such as:

- Image segmentation (Ahmed, Yamany, Mohamed, Farag, & Moriarty, 2002; Cao, Deng, & Wang, 2012; Chen, Chen, & Lu, 2011; Chuang, Tzeng, Chen, Wu, & Chen, 2006; Krinidis & Chatzis, 2010; Li, Chui, Chang, & Ong, 2011; Ma & Staunton, 2007; Pham, Xu, & Prince, 2000; Siang Tan & Mat Isa, 2011; Zhang & Chen, 2004);
- Face recognition (Agarwal, Agrawal, Jain, & Kumar, 2010; Chen & Huang, 2003; Haddadnia, Faez, & Ahmadi, 2003; Lu, Yuan, & Yahagi, 2006, 2007);
- Gesture recognition (Li, 2003; Wachs, Stern, & Edan, 2003);
- Intrusion detection (Chimphlee, Abdullah, Noor Md Sap, Chimphlee, & Srinoy, 2005; Chimphlee, Abdullah, Noor Md Sap, Srinoy, & Chimphlee, 2006; Shah, Undercoffer, & Joshi, 2003; Wang, Hao, Ma, & Huang, 2010);
- Hot-spot spatial analysis (Di Martino, Loia, & Sessa, 2008);
- Risk analysis (Li, Li, & Kang, 2011);
- Bankruptcy prediction (Martin, Gayathri, Saranya, Gayathri, & Venkatesan, 2011);
- Geo-demographic analysis (Cuong, Son, & Chau, 2010; Son, 2014a, 2014b; Son, Cuong, Lanzi, & Thong, 2012, 2013, 2014; Son, Lanzi, Cuong, & Hung, 2012);
- Fuzzy time series forecasting and commercial systems (Bai, Dhavale, & Sarkis, 2014; Chu, Liau, Lin, & Su, 2012; Egrioglu, Aladag, & Yolcu, 2013; Egrioglu, 2011; Hadavandi, Shavandi, & Ghanbari, 2011; Izakian & Abraham, 2011; Roh, Pedrycz, & Ahn, 2014; Wang, Ma, Lao, & Wang, 2014; Zhang, Huang, Ji, & Xie, 2011).

Recent advances in data organization and processing, such as the cloud computing technology, which are suitable for the


management, privacy and storing of big datasets, have made a significant breakthrough to information sciences in general and to the enhancement of the efficiency of FCM in particular. For example, cloud computing is an Internet-based storage solution where ubiquitous computer resources are set up with the same configuration in order to develop and run applications as if they were constructed in a single centralized system. Users do not need to know where and how the computer resources operate, so the maintenance and running costs could be reduced, thus guaranteeing the stable expansion of applications. In the cloud computing paradigm, data mining techniques, especially fuzzy clustering, are very much needed in order to retrieve meaningful information from a virtually integrated data warehouse. Petre (2012) and Geng and Yang (2013) stated that using data mining through cloud computing reduces the barriers that keep users from benefiting from data mining instruments, so that they could pay only for the data mining tools without handling complex hardware and data infrastructures. Examples of deploying data mining and clustering algorithms in some typical cloud computing service providers such as Amazon cloud, Google Apps, Microsoft, Salesforce and IBM could be found in energy aware consolidation (Srikantaiah, Kansal, & Zhao, 2008), education (Ercan, 2010), scheduling workflow (Pandey, Wu, Guru, & Buyya, 2010) and others (Surcel & Alecu, 2008). Such algorithms are called distributed mining techniques.

Distributed fuzzy clustering is a distributed mining technique that adapts the traditional fuzzy clustering to a new storage behavior where parts of the dataset are stored in different sites instead of the centralized main site. Distributed fuzzy clustering is extended from distributed hard clustering algorithms. A few of the many efforts on distributed hard/fuzzy clustering are named below. Lu, Gu, and Grossman (2010) presented a micro-cluster distributed clustering algorithm called dSimpleGraph, based on the relation between two micro-clusters, to classify data on the local machines and generate a determined global view from local views. Xie, Bai, and Lang (2010) aimed to accelerate the Support Vector Machine clustering method for large-scale datasets and presented a distributed clustering method inspired by the Multi-Agent framework, in which data are divided among different agents and the global clustering result can be generalized from the agents. Gehweiler and Meyerhenke (2010) presented a distributed heuristic using only limited local knowledge for clustering static and dynamic graphs. Kwon et al. (2010) proposed a scalable, parallel algorithm for data clustering based on the MapReduce framework. Karjee and Jamadagni (2011) constructed a distributed clustering algorithm based upon spatial data correlation among sensor nodes and performed data accuracy for each distributed cluster at its respective cluster head node. Le Khac and Kechadi (2011) proposed a distributed density-based clustering that both reduces the communication overheads and improves the quality of the global models by considering the shapes of local clusters.
Ghanem, Kechadi, and Tari (2011) introduced a distributed clustering algorithm based on the aggregation of models produced locally, meaning that datasets were processed locally on each node and the results were integrated to construct global clusters hierarchically. The aim of this approach is to minimize the communications, maximize the parallelism, load balance the work among different nodes of the system, and reduce the overhead due to extra processing while executing the hierarchical clustering. Gilhotra and Trikha (2012) presented a cohesive framework for cluster identification and outlier detection for distributed data, based on the idea of generating independent local models and combining the local models at a central server to obtain global clusters with the support of feedback loops. Bui, Kudireti, and Sohier (2012) presented a distributed random-walk-based clustering algorithm that builds a bounded-size core through a random-walk-based procedure. Branting (2013) presented a Distributed Pivot

Clustering algorithm that takes only the distance function, which satisfies the triangle inequality and is of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size based on distance to reference elements or pivots. Balcan, Ehrlich, and Liang (2013) provided two distributed clustering algorithms based on k-means and k-median. The basic idea of these algorithms is to reduce the problem of finding a clustering with low cost to the problem of finding a core-set of small size, then construct a global core-set. Hai, Zhang, Zhu, and Wang (2012), Jain and Maheswari (2013) and Singh and Gosain (2013) surveyed distributed clustering methods including partitioning, hierarchical, density-based, soft-computing, neural network and fuzzy clustering methods. They argued that datasets in real-world applications often contain inconsistencies or outliers, where it is difficult to obtain homogeneous and meaningful global clusters, so that distributed hard clustering should incorporate the fuzzy set theory in order to handle the hesitancy originating from imperfect and imprecise information. A parallel version of the FCM algorithm called PFCM, aiming for distributed fuzzy clustering, was proposed by Rahimi, Zargham, Thakre, and Chhillar (2004). Vendramin, Campello, Coletta, and Hruschka (2011) modified the PFCM algorithm with a pre-processing procedure to estimate the number of clusters and also presented a consensus-based algorithm for distributed fuzzy clustering. Coletta, Vendramin, Hruschka, Campello, and Pedrycz (2012) gave a distributed version of PFCM, known as the PFCM–c* algorithm, which automatically calculates the number of clusters. Visalakshi, Thangavel, and Parvathi (2010) introduced an intuitionistic fuzzy based distributed clustering algorithm including two different levels: the local level and the global level.
In the local level, numerical datasets are converted into intuitionistic fuzzy data and are clustered independently from each other using a modified FCM algorithm. In the global level, the global center is computed by clustering all local cluster centers. The global center is then transmitted to the local sites to update the local cluster models. The communication model used in Visalakshi et al. (2010) is the facilitator or Master–Slave model. A distributed fuzzy clustering algorithm named CDFCM, working on the peer-to-peer (P2P) model, was proposed by Zhou, Chen, Chen, and Li (2013). In this algorithm, the cluster centers and attribute-weights are calculated at each peer and then updated by neighboring results through local communications. The process is repeated until a pre-defined stopping criterion holds, and the status quo of clusters in all peers accurately reflects the results as in the centralized clustering. CDFCM was experimentally validated and had better clustering quality than other relevant algorithms such as FCM (Bezdek, 1984), PFCM (Rahimi et al., 2004), Soft-DKM (Forero, Cano, & Giannakis, 2011) and WEFCM (Zhou & Philip Chen, 2011). It is considered one of the most effective distributed fuzzy clustering algorithms available in the literature.

The motivation of this paper is described as follows. In the activities of CDFCM, the algorithm solely updates the cluster centers and attribute-weights of each peer by those of neighboring peers. This requires large communication costs, approximately P × NB communications per iteration, with P being the number of peers and NB being the average number of neighbors of a given peer. Additionally, the quality of results in each peer could not be high since only local updates with neighboring results are conducted. Based upon the idea that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of a distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, called DPFCM. The proposed algorithm utilizes the facilitator model, meaning that all peers transfer their results to a special, unique peer called the Master peer, so that it takes only P communications to complete the update process. Employing the Master peer in the


facilitator model also helps increase the number of neighboring results that can be used in updates, thus advancing the quality of results. In order to enhance the clustering quality as much as possible, we also deploy the distributed fuzzy clustering algorithm on picture fuzzy sets (PFS) (Cuong & Kreinovich, 2013), which in essence are a generalization of the traditional fuzzy sets (FS) (Zadeh, 1965) and intuitionistic fuzzy sets (IFS) (Atanassov, 1986) used for the development of the existing CDFCM algorithm. PFS-based models can be applied to situations requiring human opinions involving more answer types, namely yes, abstain, no and refusal, which cannot be accurately expressed in traditional FS. Therefore, deploying the distributed clustering algorithm on PFS could give higher clustering quality than on FS and IFS. Our contribution in this paper is a novel distributed picture fuzzy clustering method (DPFCM) that utilizes the ideas of both the facilitator model and the deployment of clustering algorithms on PFS in order to ameliorate the clustering quality. The proposed algorithm will be implemented and validated in comparison with CDFCM and other relevant algorithms in terms of clustering quality. The significance of this research is not only the enhancement of the clustering quality of distributed fuzzy clustering algorithms but also the enrichment of the know-how of integrating picture fuzzy sets into clustering algorithms and deploying them in practical applications. Indeed, the contribution of this paper is meaningful to both the theoretical and the practical side.
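To make the communication saving concrete, the per-iteration message counts of the two models can be compared with a quick sketch (the peer counts below are made-up illustrative values, not figures from the paper's experiments):

```python
# Rough per-iteration message counts for the two communication models
# (P and NB below are hypothetical values for illustration).
def cdfcm_messages(P, NB):
    # P2P model: each peer exchanges results with each of its neighbors.
    return P * NB

def dpfcm_messages(P):
    # Facilitator model: each Slave peer sends its results to the Master once.
    return P

P, NB = 10, 4
print(cdfcm_messages(P, NB))  # 40 communications per iteration
print(dpfcm_messages(P))      # 10 communications per iteration
```

Even for this modest configuration, the facilitator model cuts the number of communications per iteration by the factor NB, at the cost of routing everything through the Master peer.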

The rest of the paper is organized as follows. Section 2 gives the preliminaries about the PFS set. The formulation of clustering algorithms on PFS in association with the facilitator model is described in Section 3. Section 4 validates the proposed approach through a set of experiments involving benchmark data. Finally, Section 5 draws the conclusions and delineates future research directions.

2. Preliminary

In this section, we take a brief overview of some basic terms and notations in PFS, which will be used throughout the paper.

Definition 1. A picture fuzzy set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set X is

$$\dot{A} = \{\langle x, \mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x)\rangle \mid x \in X\}, \quad (1)$$

where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x \in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints

$$\mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \in [0,1], \quad \forall x \in X, \quad (2)$$

$$0 \le \mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x) \le 1, \quad \forall x \in X. \quad (3)$$

The refusal degree of an element is calculated as $\xi_{\dot{A}}(x) = 1 - (\mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x))$, $\forall x \in X$. In case $\xi_{\dot{A}}(x) = 0$, the PFS reduces to an intuitionistic fuzzy set (IFS) (Atanassov, 1986), and when both $\eta_{\dot{A}}(x) = \xi_{\dot{A}}(x) = 0$, the PFS reduces to a fuzzy set (FS) (Zadeh, 1965). In order to illustrate the applications of PFS, let us consider some examples below.
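The constraints (2)-(3) and the refusal degree are easy to check mechanically; a minimal sketch on a single membership triple (function name and values are illustrative, not from the paper):

```python
# A minimal sketch of a picture fuzzy triple (Definition 1).
def refusal_degree(mu, eta, gamma):
    """Return xi(x) = 1 - (mu(x) + eta(x) + gamma(x)) after checking (2)-(3)."""
    assert all(0.0 <= d <= 1.0 for d in (mu, eta, gamma))  # Eq. (2)
    assert mu + eta + gamma <= 1.0 + 1e-12                 # Eq. (3), float-safe
    return 1.0 - (mu + eta + gamma)

print(refusal_degree(0.5, 0.2, 0.2))  # ~0.1: a genuine PFS element
# When xi = 0 the element is intuitionistic (IFS); when additionally
# eta = 0 it is an ordinary fuzzy (FS) element.
```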

Example 1. In a democratic election station, the council issues 500 voting papers for a candidate. The voting results are divided into four groups accompanied by the number of papers: "vote for" (300), "abstain" (64), "vote against" (115) and "refusal of voting" (21). Group "abstain" means that the voting paper is a white paper rejecting both "agree" and "disagree" for the candidate but still counts as a vote taken. Group "refusal of voting" comprises either invalid voting papers or voters who did not take the vote. This example happened in reality, and IFS could not handle it since the refusal degree (group "refusal of voting") does not exist in IFS.

Example 2. A patient was given first emergency aid and diagnosed by four states after examining possible symptoms: "heart attack", "uncertain", "not heart attack", "appendicitis". In this case, we also have a PFS set.

Now, we briefly present some basic picture fuzzy operations, picture distance metrics and picture fuzzy relations. Let PFS(X) denote the set of all PFS sets on universe X.

Definition 2. For $A, B \in PFS(X)$, the union, intersection and complement operations are defined as follows:

$$A \cup B = \{\langle x, \max\{\mu_A(x), \mu_B(x)\}, \min\{\eta_A(x), \eta_B(x)\}, \min\{\gamma_A(x), \gamma_B(x)\}\rangle \mid x \in X\}, \quad (4)$$

$$A \cap B = \{\langle x, \min\{\mu_A(x), \mu_B(x)\}, \min\{\eta_A(x), \eta_B(x)\}, \max\{\gamma_A(x), \gamma_B(x)\}\rangle \mid x \in X\}, \quad (5)$$

$$\bar{A} = \{\langle x, \gamma_A(x), \eta_A(x), \mu_A(x)\rangle \mid x \in X\}. \quad (6)$$
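Applied element-wise, Eqs. (4)-(6) amount to simple max/min operations on the degree triples; a sketch on two sample triples (the tuples are illustrative):

```python
# Element-wise PFS union, intersection and complement (Eqs. (4)-(6)),
# sketched on triples (mu, eta, gamma).
def pfs_union(a, b):
    return (max(a[0], b[0]), min(a[1], b[1]), min(a[2], b[2]))

def pfs_intersection(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]))

def pfs_complement(a):
    return (a[2], a[1], a[0])   # swap the positive and negative degrees

a, b = (0.5, 0.2, 0.2), (0.3, 0.3, 0.3)
print(pfs_union(a, b))         # (0.5, 0.2, 0.2)
print(pfs_intersection(a, b))  # (0.3, 0.2, 0.3)
print(pfs_complement(a))       # (0.2, 0.2, 0.5)
```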

Definition 3. For $A, B \in PFS(X)$, the Cartesian products of these PFS sets are

$$A \times_1 B = \{\langle (x,y), \mu_A(x)\cdot\mu_B(y), \eta_A(x)\cdot\eta_B(y), \gamma_A(x)\cdot\gamma_B(y)\rangle \mid x \in A,\ y \in B\}, \quad (7)$$

$$A \times_2 B = \{\langle (x,y), \mu_A(x)\wedge\mu_B(y), \eta_A(x)\wedge\eta_B(y), \gamma_A(x)\vee\gamma_B(y)\rangle \mid x \in A,\ y \in B\}. \quad (8)$$

Definition 4. The distances between $A, B \in PFS(X)$ are the normalized Hamming distance and the normalized Euclidean distance in Eqs. (9) and (10), respectively:

$$d_p(A,B) = \frac{1}{N}\sum_{i=1}^{N}\left(|\mu_A(x_i)-\mu_B(x_i)| + |\eta_A(x_i)-\eta_B(x_i)| + |\gamma_A(x_i)-\gamma_B(x_i)|\right), \quad (9)$$

$$e_p(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left((\mu_A(x_i)-\mu_B(x_i))^2 + (\eta_A(x_i)-\eta_B(x_i))^2 + (\gamma_A(x_i)-\gamma_B(x_i))^2\right)}. \quad (10)$$
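Both distances can be sketched directly from Eqs. (9) and (10), representing each PFS as a list of (mu, eta, gamma) triples (the two sample sets below are made up):

```python
import math

# Normalized Hamming (Eq. (9)) and Euclidean (Eq. (10)) picture distances
# between two PFS sets given as lists of (mu, eta, gamma) triples.
def hamming_pfs(A, B):
    N = len(A)
    return sum(sum(abs(x - y) for x, y in zip(a, b)) for a, b in zip(A, B)) / N

def euclid_pfs(A, B):
    N = len(A)
    return math.sqrt(sum(sum((x - y) ** 2 for x, y in zip(a, b))
                         for a, b in zip(A, B)) / N)

A = [(0.5, 0.2, 0.2), (0.4, 0.3, 0.2)]
B = [(0.3, 0.3, 0.3), (0.4, 0.3, 0.2)]
print(hamming_pfs(A, B))  # (0.2 + 0.1 + 0.1 + 0) / 2 = 0.2
print(euclid_pfs(A, B))   # sqrt((0.04 + 0.01 + 0.01) / 2) ~ 0.173
```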

Definition 5. A picture fuzzy relation R is a picture fuzzy subset of $A \times B$, given by

$$R = \{\langle (x,y), \mu_R(x,y), \eta_R(x,y), \gamma_R(x,y)\rangle \mid x \in A,\ y \in B\}, \quad (11)$$

$$\mu_R, \eta_R, \gamma_R : A \times B \to [0,1], \quad (12)$$

$$\mu_R(x,y) + \eta_R(x,y) + \gamma_R(x,y) \le 1, \quad \forall (x,y) \in A \times B. \quad (13)$$

PFR(A × B) is the set of all picture fuzzy subsets on A × B. Some properties of PFS operations, the convex combination of PFS, etc., accompanied by proofs, can be found in Cuong and Kreinovich (2013).

3. The proposed method

3.1. The proposed distributed picture fuzzy clustering model

In this section, we propose a distributed picture fuzzy clustering model. The communication model is the facilitator or Master–Slave model, having one Master peer and P Slave peers, where each Slave peer is allowed to communicate with the Master only. Each Slave peer holds a subset of the original dataset X, which consists of N data points in r dimensions. We call the subsets $Y_j$ $(j = \overline{1,P})$, with $\bigcup_{j=1}^{P} Y_j = X$ and $\sum_{j=1}^{P} |Y_j| = N$. The number of dimensions in a subset is exactly the same as that in the original dataset. Let us divide the dataset X into C groups satisfying the objective function below.
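The storage assumption can be sketched as a disjoint split of X across the Slave peers, preserving the dimensionality r (the function and the round-robin scheme below are illustrative, not the paper's partitioning):

```python
import random

# Sketch of the storage assumption: X (N points in r dimensions) is split
# into P disjoint subsets Y_1, ..., Y_P, one per Slave peer.
def partition(X, P):
    idx = list(range(len(X)))
    random.shuffle(idx)
    # Deal the shuffled indices out round-robin to the P Slave peers.
    return [[X[i] for i in idx[j::P]] for j in range(P)]

N, r, P = 12, 3, 4
X = [[random.random() for _ in range(r)] for _ in range(N)]
Y = partition(X, P)
assert sum(len(Yj) for Yj in Y) == N                 # sum of |Y_j| equals N
assert all(len(x) == r for Yj in Y for x in Yj)      # dimensionality preserved
```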


$$J = \sum_{l=1}^{P}\sum_{k=1}^{|Y_l|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2} + \gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r} w_{ljh}\log w_{ljh} \to \min, \quad (14)$$

where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of the kth data point to the jth cluster in the lth Slave peer. This reflects the clustering in the PFS set expressed through Definition 1. $w_{ljh}$ is the attribute-weight of the hth attribute for the jth cluster in the lth Slave peer. $V_{ljh}$ is the center of the jth cluster in the lth Slave peer according to the hth attribute. $X_{lkh}$ is the kth data point of the lth Slave peer according to the hth attribute. m and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (14) are shown below.

$$u_{lkj}, \eta_{lkj}, \xi_{lkj} \in [0,1], \quad (15)$$

$$u_{lkj} + \eta_{lkj} + \xi_{lkj} \le 1, \quad (16)$$

$$\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}} = 1, \quad (17)$$

$$\sum_{j=1}^{C}\left(\eta_{lkj} + \frac{\xi_{lkj}}{C}\right) = 1, \quad (18)$$

$$\sum_{h=1}^{r} w_{ljh} = 1, \quad (19)$$

$$V_{ljh} = V_{ijh}, \quad (\forall i \ne l;\ i,l = \overline{1,P}), \quad (20)$$

$$w_{ljh} = w_{ijh}, \quad (\forall i \ne l;\ i,l = \overline{1,P}). \quad (21)$$
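To fix ideas, the objective (14) can be evaluated directly on a toy single-peer configuration; the values below are made-up, satisfy (15)-(17) and (19), and use the degenerate case with zero neutral and refusal degrees (a sketch only, not the paper's implementation):

```python
import math

# Direct evaluation of the objective (14) for one peer (P = 1); a sketch only.
def objective(X, V, u, eta, xi, w, m, gamma):
    # X: |Y| x r data points, V: C x r centers, u/eta/xi: |Y| x C degrees,
    # w: C x r attribute-weights.
    J = 0.0
    for k, x in enumerate(X):
        for j, v in enumerate(V):
            memb = (u[k][j] / (1.0 - eta[k][j] - xi[k][j])) ** m
            J += memb * sum(w[j][h] * (x[h] - v[h]) ** 2 for h in range(len(x)))
    J += gamma * sum(w[j][h] * math.log(w[j][h])
                     for j in range(len(V)) for h in range(len(V[0])))
    return J

X = [[0.0, 0.0], [1.0, 1.0]]   # two points, r = 2
V = [[0.0, 0.0], [1.0, 1.0]]   # two centers placed exactly on the points
u = [[1.0, 0.0], [0.0, 1.0]]   # crisp memberships satisfying (17)
eta = [[0.0, 0.0], [0.0, 0.0]]
xi = [[0.0, 0.0], [0.0, 0.0]]
w = [[0.5, 0.5], [0.5, 0.5]]   # uniform attribute-weights satisfying (19)
print(objective(X, V, u, eta, xi, w, m=2.0, gamma=1.0))
# The dispersion term is 0, so J reduces to the entropy term 4 * 0.5 * ln(0.5)
```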

The proposed model in Eqs. (14)–(21) relies on the principles of the PFS set and the facilitator model. The differences between this model and the CDFCM model of Zhou et al. (2013) are expressed below.

- The proposed model is a generalization of the CDFCM model: when $\eta_{lkj} = \xi_{lkj} = 0$, which means the PFS set degrades to the FS set, it returns to the CDFCM model in both the objective function and the constraints. In other words, a new membership-like function $u_{lkj}/(1-\eta_{lkj}-\xi_{lkj})$ is used in the objective function instead of $u_{lkj}$ as in CDFCM. Moreover, the constraints (15)–(18), which describe the relations of the degrees in the PFS set, are integrated into the optimization problem. By doing so, the new distributed picture fuzzy clustering model is completely set up according to the PFS set.
- The proposed model utilizes the facilitator model to increase the number of neighboring results used to update those of a given peer, thus giving high accuracy of the final results. This is reflected in the constraints (20) and (21), where the cluster centers and attribute-weights of any two peers coincide, so that these local centers and attribute-weights converge to the global ones.

Additional remarks on the distributed picture fuzzy clustering model in (14)–(21) are:

- The objective function in (14) both minimizes the dispersion within clusters and maximizes the entropy of attribute-weights, so that important attributes can contribute greatly to the identification of clusters.
- The constraints (15) and (16) originate from the definition of PFS.
- Constraint (17) describes that the sum of memberships of a data point to all clusters in a Slave peer equals one. Analogously, constraint (18) states that the sum of hesitant memberships of a data point to all clusters in a Slave peer, expressed through the neutral and refusal degrees, also equals one.
- Constraint (19) forces the sum of attribute-weights for a given cluster in a peer to equal one. Thus, all attributes are normalized for the clustering.
- Outputs of the distributed picture fuzzy clustering model (14)–(21) are the optimal cluster centers $\{V_{ljh} \mid l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}\}$, the picture degrees $\{(u_{lkj}, \eta_{lkj}, \xi_{lkj}) \mid l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C}\}$ in all peers, showing which cluster a data point belongs to, and the attribute-weights $\{w_{ljh} \mid l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}\}$. Based upon these results, the state of clusters in a given peer is determined, and the global results can be retrieved from the local ones according to a specific cluster.

3.2. The solutions

In this section, we use the Lagrangian method and Picard iteration to determine the optimal solutions of the model (14)–(21) as follows.

Theorem 1. The optimal solutions of the system (14)–(21) are:

$$u_{lkj} = \frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}}{\sum_{h=1}^{r} w_{lih}\|X_{lkh}-V_{lih}\|^{2}}\right)^{\frac{1}{m-1}}}, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C}), \quad (22)$$

$$\theta_{lijh} = \theta_{lijh} + \alpha_{1}(V_{ljh}-V_{ijh}), \quad (\forall i \ne l;\ i,l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (23)$$

$$V_{ljh} = \frac{\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m} w_{ljh} X_{lkh} - \sum_{\substack{i=1\\ i \ne l}}^{P}\theta_{lijh}}{\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m} w_{ljh}}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (24)$$

$$\Delta_{lijh} = \Delta_{lijh} + \alpha_{2}(w_{ljh}-w_{ijh}), \quad (\forall i \ne l;\ i,l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (25)$$

$$w_{ljh} = \frac{\exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh}-V_{ljh}\|^{2} + \gamma + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh}\right]\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh'}-V_{ljh'}\|^{2} + \gamma + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh'}\right]\right)}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (26)$$

$$\eta_{lkj} = 1 - \xi_{lkj} + \frac{C-1}{C}\cdot\frac{\sum_{i=1}^{C}\xi_{lki}}{\sum_{i=1}^{C}\frac{u_{lkj}}{u_{lki}}\left(\sum_{h=1}^{r}\frac{w_{lih}\|X_{lkh}-V_{lih}\|^{2}}{w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}}\right)^{\frac{1}{m+1}}}, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C}), \quad (27)$$

$$\xi_{lkj} = 1 - (u_{lkj}+\eta_{lkj}) - \left(1 - (u_{lkj}+\eta_{lkj})^{\alpha}\right)^{1/\alpha}, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C}). \quad (28)$$
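The closed form (22) can be sanity-checked numerically. In the sketch below the inputs are made-up, and `wd[j]` abbreviates the weighted squared distance $\sum_{h} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}$ of one data point to center j:

```python
# Numeric check of the positive-degree update (22) for a single data point
# in one peer with C = 2 clusters (made-up inputs; a sketch only).
def update_u(wd, eta, xi, j, m):
    s = sum((wd[j] / wd[i]) ** (1.0 / (m - 1.0)) for i in range(len(wd)))
    return (1.0 - eta[j] - xi[j]) / s

wd = [1.0, 4.0]            # weighted squared distances to the two centers
eta, xi = [0.1, 0.1], [0.05, 0.05]
u = [update_u(wd, eta, xi, j, 2.0) for j in range(2)]
print(u)  # the closer center receives the larger positive degree
# Constraint (17): the transformed memberships sum to one.
assert abs(sum(u[j] / (1 - eta[j] - xi[j]) for j in range(2)) - 1.0) < 1e-9
```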

Proof. (A) Fixing W, V, $\eta$, $\xi$, the Lagrangian function with respect to U is

$$L(U) = \sum_{l=1}^{P}\sum_{k=1}^{|Y_l|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2} + \gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r} w_{ljh}\log w_{ljh} - \sum_{l=1}^{P}\sum_{k=1}^{|Y_l|}\lambda_{lk}\left(\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}} - 1\right), \quad (29)$$


$$\frac{\partial L(U)}{\partial u_{lkj}} = \frac{m}{1-\eta_{lkj}-\xi_{lkj}}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m-1}\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2} - \frac{\lambda_{lk}}{1-\eta_{lkj}-\xi_{lkj}} = 0, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (30)$$

$$u_{lkj} = (1-\eta_{lkj}-\xi_{lkj})\left(\frac{\lambda_{lk}}{m\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}}\right)^{\frac{1}{m-1}}. \quad (31)$$

From constraint (17), we have

$$\lambda_{lk} = m\left(\frac{1}{\sum_{j=1}^{C}\left(\frac{1}{\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}}\right)^{\frac{1}{m-1}}}\right)^{m-1}. \quad (32)$$

Substituting (32) into (31), we obtain the optimal solution for $u_{lkj}$:

$$u_{lkj} = \frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2}}{\sum_{h=1}^{r} w_{lih}\|X_{lkh}-V_{lih}\|^{2}}\right)^{\frac{1}{m-1}}}, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C}). \quad (33)$$

(B) We fix all degrees and the attribute-weights to calculate the cluster centers using the Lagrangian function below:

$$L(V) = \sum_{l=1}^{P}\sum_{k=1}^{|Y_l|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2} + \gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r} w_{ljh}\log w_{ljh} - \sum_{l=1}^{P}\sum_{\substack{i=1\\ i \ne l}}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}\theta_{lijh}(V_{ljh}-V_{ijh}), \quad (34)$$

where $\theta_{lijh}$ is a Lagrangian multiplier matrix. Taking the derivative of L(V) with respect to $V_{ljh}$, we have

$$\frac{\partial L(V)}{\partial V_{ljh}} = -2\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m} w_{ljh}(X_{lkh}-V_{ljh}) + \sum_{\substack{i=1\\ i \ne l}}^{P}\theta_{lijh} - \sum_{\substack{i=1\\ i \ne l}}^{P}\theta_{iljh} = 0, \quad (35)$$

$$V_{ljh} = \frac{\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m} w_{ljh} X_{lkh} - \sum_{\substack{i=1\\ i \ne l}}^{P}\theta_{lijh}}{\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m} w_{ljh}}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}). \quad (36)$$

$\theta_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{1}$ being a positive scalar:

$$\theta_{lijh} = \theta_{lijh} + \alpha_{1}(V_{ljh}-V_{ijh}), \quad (\forall i \ne l;\ i,l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}). \quad (37)$$
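The role of the Picard update (37) is to penalize disagreement between peers' centers until the consensus constraint (20) holds. A toy scalar dynamic (not the full algorithm; the step size and local means below are made-up) illustrates the mechanism:

```python
# Toy illustration of the multiplier dynamics behind (36)-(37): each peer's
# scalar center is its local mean shifted by an accumulated multiplier, and
# the multiplier grows with the remaining center disagreement.
def consensus(local_means, alpha1=0.5, iters=200):
    V = list(local_means)
    theta = [0.0] * len(V)              # aggregated multipliers, one per peer
    for _ in range(iters):
        V = [local_means[l] - theta[l] for l in range(len(V))]
        mean = sum(V) / len(V)
        theta = [theta[l] + alpha1 * (V[l] - mean) for l in range(len(V))]
    return V

print(consensus([1.0, 3.0]))  # both centers approach the consensus value 2.0
```

In the toy dynamic, the deviation of each center from the average shrinks geometrically, mirroring how the multipliers in (37) drive the local centers toward the common global centers.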

(C) By a similar calculation to (B), we take the Lagrangian function with respect to W:

$$L(W) = \sum_{l=1}^{P}\sum_{k=1}^{|Y_l|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r} w_{ljh}\|X_{lkh}-V_{ljh}\|^{2} + \gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r} w_{ljh}\log w_{ljh} - \sum_{l=1}^{P}\sum_{j=1}^{C}\lambda_{lj}\left(\sum_{h=1}^{r} w_{ljh} - 1\right) + \sum_{l=1}^{P}\sum_{\substack{i=1\\ i \ne l}}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}\Delta_{lijh}(w_{ljh}-w_{ijh}). \quad (38)$$

$$\frac{\partial L(W)}{\partial w_{ljh}} = \sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh}-V_{ljh}\|^{2} + \gamma(\log w_{ljh} + 1) - \lambda_{lj} + \sum_{\substack{i=1\\ i \ne l}}^{P}(\Delta_{lijh}-\Delta_{iljh}) = 0, \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C};\ h = \overline{1,r}), \quad (39)$$

$$w_{ljh} = \exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh}-V_{ljh}\|^{2} + \gamma - \lambda_{lj} + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh}\right]\right), \quad (\forall l = \overline{1,P};\ k = \overline{1,|Y_l|};\ j = \overline{1,C};\ h = \overline{1,r}). \quad (40)$$

Applying constraint (19) to (40), we have

$$\exp\left(\frac{\lambda_{lj}}{\gamma}\right) = \frac{1}{\sum_{h=1}^{r}\exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh}-V_{ljh}\|^{2} + \gamma + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh}\right]\right)}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C}), \quad (41)$$

$$w_{ljh} = \frac{\exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh}-V_{ljh}\|^{2} + \gamma + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh}\right]\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left[\sum_{k=1}^{|Y_l|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\|X_{lkh'}-V_{ljh'}\|^{2} + \gamma + 2\sum_{\substack{i=1\\ i \ne l}}^{P}\Delta_{lijh'}\right]\right)}, \quad (\forall l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}). \quad (42)$$

$\Delta_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{2}$ being a positive scalar:

$$\Delta_{lijh} = \Delta_{lijh} + \alpha_{2}(w_{ljh}-w_{ijh}), \quad (\forall i \ne l;\ i,l = \overline{1,P};\ j = \overline{1,C};\ h = \overline{1,r}). \quad (43)$$

(D) Fix W,V,u,n the Lagranian function with respect to g is:

$L(g)=\sum_{l=1}^{P}\sum_{k=1}^{Y_l}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-g_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{k=1}^{Y_l}\lambda_{lk}\left(\sum_{j=1}^{C}\left(g_{lkj}+\frac{\xi_{lkj}}{C}\right)-1\right),\qquad(44)$

$\frac{\partial L(g)}{\partial g_{lkj}}=\sum_{h=1}^{r}\left(\frac{u_{lkj}}{1-g_{lkj}-\xi_{lkj}}\right)^{m}\frac{m}{1-g_{lkj}-\xi_{lkj}}\,w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}-\lambda_{lk}=0,\quad\left(\forall l=\overline{1,P};\ k=\overline{1,Y_l};\ j=\overline{1,C}\right),\qquad(45)$

$g_{lkj}=1-\xi_{lkj}-\left(\frac{m\,u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\lambda_{lk}}\right)^{\frac{1}{m+1}}.\qquad(46)$

Applying constraint (18) to (46), we have

$g_{lkj}=1-\xi_{lkj}-\frac{(C-1)\left(1-\frac{1}{C}\sum_{i=1}^{C}\xi_{lki}\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m+1}}},\quad\left(\forall l=\overline{1,P};\ k=\overline{1,Y_l};\ j=\overline{1,C}\right).\qquad(47)$

(E) Once we have $u_{lkj}$ and $g_{lkj}$, from constraint (16) we can use the Yager generating operator to determine the value of $\xi_{lkj}$ as follows:

$\xi_{lkj}=1-\left(u_{lkj}+g_{lkj}\right)-\left(1-\left(u_{lkj}+g_{lkj}\right)^{\alpha}\right)^{1/\alpha},\quad\left(\forall l=\overline{1,P};\ k=\overline{1,Y_l};\ j=\overline{1,C}\right).\qquad(48)$

Notice that $\alpha>0$ is an exponent coefficient used to control the refusal degree in PFS sets. The proof is complete. □
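As a quick numeric check of (48), the Yager step can be computed directly. Note that $\alpha=1$ reduces the refusal degree to zero, while with the paper's setting $\alpha=0.5$ a small positive refusal remains:

```python
def refusal_degree(u, g, alpha=0.5):
    """Refusal degree xi via the Yager generating operator, Eq. (48):
    xi = 1 - (u + g) - (1 - (u + g)**alpha)**(1/alpha)."""
    s = u + g
    return 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)

xi = refusal_degree(0.6, 0.2)   # u + g = 0.8  ->  xi ~ 0.1889
```

The value stays within the slack left by the positive and neutral degrees, so the PFS condition $u+g+\xi\le 1$ is respected here.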

Page 6: DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets

56 L.H. Son / Expert Systems with Applications 42 (2015) 51–66

3.3. The DPFCM algorithm

In this section, we present the DPFCM algorithm in detail.

Distributed Picture Fuzzy Clustering Method (DPFCM)

Input:
- Dataset X with N elements in r dimensions
- Number of clusters: C
- Number of peers: P + 1
- Fuzzifier: m
- Threshold: ε > 0
- Parameters: γ, α₁, α₂, α, maxIter

Output: $\{V_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$; $\{(u_{lkj},g_{lkj},\xi_{lkj})\mid l=\overline{1,P};\ k=\overline{1,Y_l};\ j=\overline{1,C}\}$; $\{w_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$.

DPFCM:
1S: Initialization:
- Set the number of iterations: t = 0
- Set $\Delta_{lijh}(t)=\theta_{lijh}(t)=0$ $(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r})$
- Randomize $\{(u_{lkj}(t),g_{lkj}(t),\xi_{lkj}(t))\mid l=\overline{1,P};\ k=\overline{1,Y_l};\ j=\overline{1,C}\}$ satisfying (16)
- Set $w_{ljh}(t)=1/r$ $(l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r})$
2S: Calculate the cluster centers $V_{ljh}(t)$ from $(u_{lkj}(t),g_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t)$ and $\theta_{lijh}(t)$ by (24)
3S: Calculate the attribute-weights $w_{ljh}(t+1)$ from $(u_{lkj}(t),g_{lkj}(t),\xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (26)
4S: Send $\{\Delta_{lijh}(t),\theta_{lijh}(t),V_{ljh}(t),w_{ljh}(t+1)\}$ to the Master
5M: Calculate $\{\Delta_{lijh}(t+1),\theta_{lijh}(t+1)\}$ by (23) and (25) and send them to the Slave peers
6S: Calculate the cluster centers $V_{ljh}(t+1)$ from $(u_{lkj}(t),g_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $\theta_{lijh}(t+1)$ by (24)
7S: Calculate the positive degrees $u_{lkj}(t+1)$ from $(g_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (22)
8S: Calculate the neutral degrees $g_{lkj}(t+1)$ from $(u_{lkj}(t+1),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (27)
9S: Calculate the refusal degrees $\xi_{lkj}(t+1)$ from $(u_{lkj}(t+1),g_{lkj}(t+1))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (28)
10S: If $\max_{l}\{\max\{\|u_{lkj}(t+1)-u_{lkj}(t)\|,\|g_{lkj}(t+1)-g_{lkj}(t)\|,\|\xi_{lkj}(t+1)-\xi_{lkj}(t)\|\}\}<\varepsilon$ or t > maxIter, then stop the algorithm; otherwise set t = t + 1 and return to Step 3S.

(S: operations in the Slave peers; M: operations in the Master peer.)
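The Slave/Master loop above can be sketched in a single process as follows. This is only a control-flow sketch under simplifying assumptions: the picture fuzzy updates (22)–(28) are replaced by ordinary inverse-distance memberships and fuzzy means, and the Master's multiplier exchange is mimicked by blending each peer's centers toward the cross-peer average.

```python
import numpy as np

rng = np.random.default_rng(0)

P, C, r, eps, max_iter = 3, 2, 4, 1e-3, 100
peers = [rng.random((20, r)) for _ in range(P)]   # local datasets Y_l (Slaves)
V = rng.random((P, C, r))                         # per-peer cluster centers

def memberships(X, Vl):
    # Placeholder positive degrees: inverse squared distance, rows sum to 1.
    d = np.linalg.norm(X[:, None, :] - Vl[None, :, :], axis=2) + 1e-9
    u = 1.0 / d ** 2
    return u / u.sum(axis=1, keepdims=True)

for t in range(max_iter):
    U = [memberships(X, V[l]) for l, X in enumerate(peers)]      # 7S (simplified)
    V_new = np.stack([u.T @ X / u.sum(axis=0)[:, None]
                      for u, X in zip(U, peers)])                # 2S/6S (simplified)
    V_new = 0.5 * V_new + 0.5 * V_new.mean(axis=0)               # 5M: consensus role
    if np.max(np.abs(V_new - V)) < eps:                          # 10S: stopping test
        V = V_new
        break
    V = V_new
```

In the real DPFCM, step 5M exchanges only the multiplier and parameter tuples $\{\Delta,\theta,V,w\}$, so the raw data never leave a peer.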

3.4. The theoretical analyses of DPFCM

In this section, we analyze the DPFCM algorithm, including the meaning of Theorem 1 and the advantages and disadvantages of the proposed work. As stated in the proposed model (Section 3.1), problem (14)–(21) is an optimization problem that derives the cluster centers, together with the attribute-weights and the positive, neutral and refusal memberships of data points, from a given dataset and a facilitator system. By using the Lagrangian method and the Picard iteration, the optimal solutions of the problem are determined as in Eqs. (22)–(28). We clearly see that the cluster centers (24), the attribute-weights (26) and the positive (22), neutral (27) and refusal memberships (28) are affected by the facilitator model through the two Lagrangian multipliers expressed in Eqs. (23) and (25). Specifically, $\theta_{lijh}$ directly changes the cluster centers in (24) and is then updated in the Master peer by Eq. (23); the newly updated multipliers are used again in the next computation of the cluster centers in (24). Similarly, $\Delta_{lijh}$ contributes greatly to the changes of the attribute-weights in (26), and these weights are used in the calculation of all memberships and the cluster centers. Like $\theta_{lijh}$, $\Delta_{lijh}$ is updated in the Master peer from the values of the other peers. The facilitator model, expressed in this case through the activities of the two Lagrangian multipliers $\theta_{lijh}$ and $\Delta_{lijh}$, lets the local results of a peer be updated with those of the other peers so that the local clustering outputs can reach the global optimum. Besides the facilitator model, the use of the various memberships in (22), (27) and (28) both reflects the principle of the PFS set and improves the clustering quality of the algorithm. That is, the final cluster centers in (24) are affected by the membership-like term $u_{lkj}/(1-g_{lkj}-\xi_{lkj})$, whose components are calculated from the dataset, the previous cluster centers and the previous memberships, thus regulating the next results according to the previous ones in a principled manner. The meaning of Theorem 1 is therefore not only a reflection of the ideas stated in Section 1 but also an expression of the calculation process, which can easily be translated into the algorithm in Section 3.3.

The advantages of the proposed algorithm are threefold. Firstly, it can be applied to practical problems requiring fast processing of huge datasets: since the activities of the algorithm are performed simultaneously in all peers, the total operating time is reduced. The clustering quality of the outputs is also better than those of the relevant distributed clustering algorithms, according to our theoretical analyses in Section 1. Secondly, the algorithm is easy to implement and can be adapted to many parallel processing models such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), Local Area Multicomputer (LAM/MPI), etc. Thirdly, the design of the DPFCM algorithm in this article could serve as a know-how tutorial for the development of fuzzy clustering algorithms on advanced fuzzy sets like the PFS set.

Besides these advantages, the proposed work still has some limitations. Firstly, the DPFCM algorithm has a larger computational time than relevant algorithms such as FCM, PFCM, Soft-DKM, WEFCM and CDFCM, due to the extra computation of the membership degrees


Table 1
The descriptions of experimental datasets.

Dataset       No. elements   No. attributes   No. classes   Elements in each class
IRIS          150            4                3             (50, 50, 50)
GLASS         214            9                6             (70, 17, 76, 13, 9, 29)
IONOSPHERE    351            34               2             (126, 225)
HABERMAN      306            3                2             (225, 81)
HEART         270            13               2             (150, 120)


and the results of all peers. Secondly, the number of peers can affect the clustering quality of the outputs: a large number of peers may enhance the clustering quality but also increases the computational time of the algorithm. How many peers are enough to balance clustering quality against computational time? In the experimental section, we validate these remarks and seek answers to these questions.

4. Evaluation

4.1. Experimental environment

In this part, we describe the experimental environment as follows.

• Experimental tools: we implemented the proposed DPFCM algorithm in MPI/C and executed it on a PC with an Intel Pentium 4 CPU at 3.4 GHz, 4 GB RAM and a 160 GB HDD. The experimental results are taken as the average values over 100 runs and are compared with those of FCM (Bezdek, 1984), PFCM (Rahimi et al., 2004), Soft-DKM (Forero et al., 2011), WEFCM (Zhou & Philip Chen, 2011) and CDFCM (Zhou et al., 2013).
• Experimental datasets: the benchmark datasets of the UCI Machine Learning Repository (Bache & Lichman, 2013): IRIS, GLASS, IONOSPHERE, HABERMAN and HEART. IRIS is a standard dataset consisting of 150 instances with three classes and four attributes, in which each class contains 50 instances. GLASS contains 214 instances, 6 classes and 9 attributes: refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium and iron. IONOSPHERE contains 351 instances of radar data, 34 attributes and 2 classes, where "Good" radar returns show evidence of some type of structure in the ionosphere and "Bad" returns do not. HABERMAN contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast

Fig. 1. The initiation of peer 1.

cancer; it contains 306 instances, 3 attributes and 2 classes. HEART contains information on heart disease, with 270 instances, 13 attributes and 2 classes. Table 1 gives an overview of these datasets.

These datasets are normalized by the simple normalization method used in Zhou et al. (2013), expressed as follows:

$X_{i}^{*}=\frac{X_{i}-\min_{i}\{X_{i}\}}{\max_{i}\{X_{i}\}-\min_{i}\{X_{i}\}},\qquad(49)$

where $X_{i}^{*}$ ($X_{i}$) is the new (old) $i$th data point. Small subsets $Y_{j}$ $(j=\overline{1,P})$ are generated by random selection from the original dataset, satisfying the condition $\bigcup_{j=1}^{P}Y_{j}=X$.
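Eq. (49) and the random partition into peer subsets can be sketched as follows; the function names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def minmax_normalize(X):
    """Per-attribute min-max normalization as in Eq. (49)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant attributes (max == min) to avoid division by zero.
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def split_into_peers(X, P):
    """Randomly partition the rows of X into P disjoint subsets Y_1..Y_P
    whose union is X."""
    idx = rng.permutation(len(X))
    return [X[part] for part in np.array_split(idx, P)]

X = minmax_normalize(rng.random((150, 4)) * 10.0)   # stand-in for a dataset
parts = split_into_peers(X, 3)
```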

• Parameter setting: for an accurate comparison with the relevant methods, we set the parameters as in Zhou et al. (2013): m = 2, ε = 0.01, γ = α₁ = α₂ = 1, α = 0.5, maxIter = 1000.
• Cluster validity measurement: we use the Average Iteration Number (AIN), the Average Classification Rate (ACR) (Eq. (50)) and the Average Normalized Mutual Information (ANMI) (Eq. (51)) (Huang, Chuang, & Chen, 2012). ACR and ANMI are the-larger-the-better validity indices, whilst AIN is the-smaller-the-better.

$CR=\frac{\sum_{k=1}^{K}d_{k}}{N},\qquad(50)$

where $d_{k}$ is the number of objects correctly identified in the $k$th cluster and $N$ is the total number of objects in the dataset.
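Eq. (50) leaves the cluster-to-class matching implicit. One common reading, assumed here, is that each cluster is matched to its majority true class, so that $d_k$ is the majority-class count within cluster $k$:

```python
import numpy as np

def classification_rate(labels_true, labels_pred):
    """CR of Eq. (50): d_k is taken as the majority-class count of cluster k
    (an assumed matching; the paper does not spell it out)."""
    N = len(labels_true)
    correct = 0
    for k in np.unique(labels_pred):
        members = labels_true[labels_pred == k]
        correct += np.bincount(members).max()   # d_k
    return correct / N

y = np.array([0, 0, 0, 1, 1, 1])
yhat = np.array([1, 1, 0, 0, 0, 0])   # cluster ids permuted relative to classes
cr = classification_rate(y, yhat)     # 5 of 6 objects correctly grouped
```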

$NMI(R,Q)=\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}P(i,j)\log\frac{P(i,j)}{P(i)P(j)}}{\sqrt{H(R)H(Q)}},\qquad(51)$

where $R$ and $Q$ are two partitions of the dataset having $I$ and $J$ clusters, respectively. $P(i)$ is the probability that a randomly selected object from the dataset falls into cluster $R_i$ in the partition $R$. $P(i,j)$ is the



Fig. 2. The initiation of peer 2.

Fig. 3. The initiation of peer 3.

Fig. 4. The communication in each iteration step.


Fig. 5. The distribution of clusters of Peer 1 in the second iteration.

Fig. 6. The distribution of clusters of Peer 2 in the second iteration.


probability that an object belongs to cluster $R_i$ in $R$ and cluster $Q_j$ in $Q$. $H(R)$ is the entropy associated with the probabilities $P(i)$ $(1\le i\le I)$ in partition $R$. AIN, ACR and ANMI are the average results after 100 runs.
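Eq. (51) can be computed directly from two flat label vectors. A minimal sketch follows (natural logarithm assumed; the base cancels between the numerator and the entropies):

```python
import numpy as np

def nmi(labels_r, labels_q):
    """NMI of Eq. (51) for two partitions R and Q given as label vectors."""
    _, r_idx = np.unique(labels_r, return_inverse=True)
    _, q_idx = np.unique(labels_q, return_inverse=True)
    n = len(labels_r)
    joint = np.zeros((r_idx.max() + 1, q_idx.max() + 1))
    for a, b in zip(r_idx, q_idx):
        joint[a, b] += 1.0
    P = joint / n                                  # P(i, j)
    Pi, Pj = P.sum(axis=1), P.sum(axis=0)          # marginals P(i), P(j)
    mask = P > 0
    mi = (P[mask] * np.log(P[mask] / (Pi[:, None] * Pj[None, :])[mask])).sum()
    h = lambda p: -(p[p > 0] * np.log(p[p > 0])).sum()   # entropies H(R), H(Q)
    return mi / np.sqrt(h(Pi) * h(Pj))

score = nmi(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]))
```

A partition that matches the reference up to a relabeling scores 1, while an independent partition scores 0.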

• Objective: (a) to illustrate the activities of DPFCM in classifying a specific benchmark dataset of the UCI Machine Learning Repository; (b) to evaluate the clustering quality of the algorithms through validity indices; (c) to measure the effect of the number of peers on the clustering quality; (d) to investigate the computational time of all algorithms.

4.2. An illustration of DPFCM

Firstly, we illustrate the activities of the proposed algorithm, DPFCM, in classifying the IRIS dataset. In this case, N = 150, r = 4, C = 3 and the number of peers is P = 3. The cardinalities of the first, second and third peers are 38, 39 and 73, respectively. The initial positive, neutral and refusal matrices of the first peer are initialized in (52)–(54), respectively.

0.082100  0.836100  0.011500
0.722100  0.002400  0.930900
0.365000  0.983200  0.578800
   …         …         …
0.002900  0.199000  0.608400
0.116700  0.462500  0.932100                              (52)

0.052229  0.123007  0.143827
0.131697  0.878686  0.036471
0.466915  0.002841  0.415851
   …         …         …
0.034799  0.450723  0.213030
0.747537  0.094331  0.017593                              (53)


Fig. 7. The distribution of clusters of Peer 3 in the second iteration.


0.477245  0.020119  0.244364
0.076669  0.050943  0.026051
0.065771  0.000965  0.001082
   …         …         …
0.560059  0.075800  0.085142
0.113362  0.394597  0.014931                              (54)

From this initialization, the distribution of clusters of the first peer in the first iteration is depicted in Fig. 1.

Similarly, the distributions of clusters of the second and third peers in the first iteration are depicted in Figs. 2 and 3, respectively. Now, we illustrate the activities of the first peer. The cluster centers $V_{ljh}(0)$ calculated by Eq. (24) are expressed in Eq. (55).

0.568  0.523  0.603  0.466
0.563  0.564  0.608  0.470
0.546  0.542  0.619  0.494                                (55)

The attribute-weights $w_{ljh}(1)$ are computed from (26) and shown in (56).

0.137  0.225  0.291  0.346
0.181  0.268  0.253  0.298
0.185  0.234  0.273  0.309                                (56)

Now all Slave peers synchronize their tuples $\{\Delta_{lijh}(0),\theta_{lijh}(0),V_{ljh}(0),w_{ljh}(1)\}$ with the Master. The communication model is depicted in Fig. 4.

[Matrices (57) and (58), giving the multipliers $\Delta_{lijh}(1)$ and $\theta_{lijh}(1)$ for Peers 1–3: the entries of Peer 1 are all zeros, while those of Peers 2 and 3 are small positive and negative values.]

The values of the Lagrangian multipliers $\Delta_{lijh}(1)$ and $\theta_{lijh}(1)$ in all peers after updating from the Master peer are described in (57) and (58), respectively. The cluster centers $V_{ljh}(1)$ are then updated accordingly, as shown in (59).

0.505968  0.482254  0.573009  0.467232
0.496019  0.529548  0.542298  0.452127
0.495326  0.510318  0.562189  0.474282                    (59)

Based upon $V_{ljh}(1)$, $w_{ljh}(1)$ and the updated Lagrangian multipliers, the new positive, neutral and refusal matrices of the first peer are calculated in (60)–(62), respectively.

0.146003  0.306919  0.202825
0.244076  0.025787  0.304908
0.146620  0.359057  0.189975
   …         …         …
0.129143  0.158800  0.242726
0.045708  0.161369  0.344092                              (60)

0.293576  0.638569  0.594259
0.740159  0.863683  0.634079
0.821898  0.585084  0.796155
   …         …         …
0.232520  0.758352  0.726713
0.640528  0.534759  0.625413                              (61)



Fig. 8. The distribution of clusters of Peer 1 after 1000 iterations.

Fig. 9. The distribution of clusters of Peer 2 after 1000 iterations.


0.446857  0.053748  0.191423
0.015702  0.107295  0.060053
0.031231  0.055057  0.013822
   …         …         …
0.479442  0.081057  0.030324
0.284316  0.276429  0.030259                              (62)

The distributions of clusters of the first, second and third peers in the second iteration are depicted in Figs. 5–7, respectively.

By a similar process, we also calculate the new positive, neutral and refusal matrices of the other peers. These values are used to validate the stopping condition in Step 10S of the DPFCM algorithm. In this case, the value of the left side of the stopping condition is 0.539, which is larger than ε = 0.01, so further iteration steps are performed. The final positive, neutral and refusal matrices, the cluster centers and the attribute-weights of the first peer after 1000 iterations are shown in Eqs. (63)–(67), respectively.

0.060858  0.055820  0.027631
0.008723  0.035924  0.013326
0.012103  0.043641  0.029784
   …         …         …
0.052452  0.009393  0.040776
0.005385  0.023104  0.036309                              (63)


Table 2
The comparison of clustering quality of algorithms.

ACR (%)
Dataset       DPFCM   FCM     WEFCM   PFCM    Soft-DKM   CDFCM
IRIS          96.04   89.33   96.66   89.33   87.38      95.90
GLASS         53.33   42.08   54.39   42.08   40.50      52.96
IONOSPHERE    75.26   70.94   76.58   70.94   67.77      75.26
HABERMAN      76.50   51.96   77.12   51.96   51.42      74.68
HEART         71.89   51.31   72.88   51.31   50.24      71.95

ANMI
Dataset       DPFCM   FCM     WEFCM   PFCM    Soft-DKM   CDFCM
IRIS          0.8785  0.7433  0.8801  0.7433  0.7294     0.8705
GLASS         0.4175  0.2974  0.4263  0.2974  0.2848     0.4170
IONOSPHERE    0.1961  0.1299  0.2026  0.1299  0.1028     0.1961
HABERMAN      0.0826  0.0024  0.0992  0.0024  0.0018     0.0610
HEART         0.0395  0.0052  0.0445  0.0052  0.0028     0.0408

Bold values emphasize the results of the proposed method.

Fig. 10. The distribution of clusters of Peer 3 after 1000 iterations.


0.546285  0.674677  0.210664
0.038853  0.796979  0.262033
0.010000  0.775321  0.328807
   …         …         …
0.715918  0.010000  0.404484
0.080086  0.416115  0.868126                              (64)

0.344102  0.248388  0.499719
0.341086  0.159465  0.498775
0.253134  0.172007  0.480467
   …         …         …
0.216394  0.239729  0.444036
0.413766  0.447034  0.093166                              (65)

1.000000  1.000000  1.000000  1.000000
0.253521  0.562804  1.000000  0.118987
0.586810  0.385400  0.702494  0.680071                    (66)

0.356  0.423  0.191  0.030
0.336  0.504  0.128  0.032
0.372  0.462  0.125  0.040                                (67)

The distributions of clusters of the first, second and third peers after 1000 iterations are depicted in Figs. 8–10, respectively.

Using the iteration scheme in DPFCM, the results of a Slave peer are balanced with those of the other peers and converge to the optimal solutions.

4.3. The comparison of clustering quality

Secondly, we compare the clustering quality of the algorithms through the ACR and ANMI indices. The number of peers used in this section is 3.

The results in Table 2 show that the clustering quality of DPFCM is mostly better than those of the three distributed clustering algorithms CDFCM, Soft-DKM and PFCM. It is also better than that of the traditional centralized clustering algorithm FCM, and only slightly smaller than that of the centralized weighted clustering algorithm WEFCM. For example, the ACR value of DPFCM for the IRIS dataset is 96.04%, whilst those of CDFCM, Soft-DKM, PFCM and FCM are 95.90%, 87.38%, 89.33% and 89.33%, respectively. It is smaller than that of WEFCM (96.66%), but the difference between DPFCM and WEFCM is quite small. Looking at the ANMI results of all algorithms, we recognize that the ANMI value of DPFCM is also larger than those of CDFCM, Soft-DKM, PFCM and FCM, and smaller than that of WEFCM, the numbers being 0.8785, 0.8705, 0.7294, 0.7433, 0.7433 and 0.8801, respectively. Similar observations for both the ACR and ANMI indices are found on the GLASS and HABERMAN data. Nevertheless, there are some cases where DPFCM yields lower clustering quality than CDFCM. For example, the ACR value of DPFCM for the IONOSPHERE dataset is 75.26%, which is equal to that of CDFCM, and both are smaller than that of WEFCM. For the HEART dataset, the ACR value of DPFCM is 71.89%, smaller than those of CDFCM (71.95%) and WEFCM (72.88%). Analogous remarks hold for the ANMI index. This means that using the updates of all peers in the facilitator model does not always result in better clustering quality than using the updates of some neighboring peers in the mechanism of CDFCM, since some peers could be in bad initialization states, so that the final results would be affected by the balancing mechanism between peers. Nonetheless, such cases are few, and most of the time DPFCM has better clustering quality than CDFCM and the other relevant algorithms. The clustering qualities of the algorithms also vary depending on the dataset. For instance, the


Fig. 11. The ACR values of algorithms.

Fig. 12. The ANMI values of algorithms.

Fig. 13. The ACR values of DPFCM by number of peers.


Table 3
The results of DPFCM by various numbers of peers.

ACR (%)
Dataset       P = 2    P = 3    P = 4    P = 5    P = 6    P = 7
IRIS          93.55    96.04    96.08    98.21*   97.48    90.54
GLASS         49.23    53.33    57.62    76.20*   56.03    30.50
IONOSPHERE    65.75    75.26*   64.30    63.54    61.32    59.72
HABERMAN      70.95    76.50    78.26    82.12    91.32*   70.95
HEART         69.70    71.89*   66.83    56.80    55.40    52.57

ANMI
IRIS          0.8725   0.8785   0.8923   0.9217*  0.9186   0.8492
GLASS         0.4160   0.4175   0.4525   0.4862*  0.4480   0.3475
IONOSPHERE    0.1893   0.1961*  0.1924   0.1853   0.1727   0.1695
HABERMAN      0.0756   0.0826   0.0954   0.1084   0.1456*  0.0756
HEART         0.0325   0.0395   0.0683   0.0866*  0.0432   0.0311

* Indicates the maximum value for this dataset.

Table 4
The comparison of AIN values of algorithms.

Dataset       DPFCM   FCM    WEFCM   PFCM    Soft-DKM   CDFCM
IRIS          42.4    21.2   26.2    38.6    23.8       30.2
GLASS         113.6   56.2   63.8    107.6   73.4       86.2
IONOSPHERE    83.4    13.9   44.8    52.6    36.8       51.2
HABERMAN      46.3    17.1   18.2    29.4    21.6       26.8
HEART         81.2    41.0   48.4    67.4    46.2       56.8

Bold values emphasize the results of the proposed method.

Fig. 14. The ANMI values of DPFCM by number of peers.


ACR values of the algorithms for the IRIS dataset are quite high, ranging from 87.38% to 96.66%. However, the results for the GLASS dataset are mediocre, the best clustering quality, obtained by the WEFCM algorithm, being only 54.39%. In other words, roughly one of every two checked data points is wrongly labeled. The ranges of the clustering qualities of the algorithms for the IONOSPHERE, HABERMAN and HEART data are (67.77–76.58%), (51.42–77.12%) and (50.24–72.88%), respectively. The ranges for IRIS and IONOSPHERE are quite narrow, which means that (i) the algorithms do not differ remarkably in terms of clustering quality and tend to converge to the optimal results achieved by the WEFCM algorithm; and (ii) some datasets such as IONOSPHERE contain outliers, so that the achievable clustering quality is limited. Nevertheless, this range stays above 65% accuracy, which is acceptable. For the other datasets, the ranges of the clustering qualities are broad, and the efficiency of the proposed DPFCM algorithm shows more obviously than in the cases of narrow ranges. For example, the clustering quality of DPFCM for the HABERMAN dataset is nearly equal to that of WEFCM and much larger than those of FCM, PFCM and Soft-DKM. Thus, the efficiency of DPFCM is demonstrated even on noisy and narrow-range datasets.

In Figs. 11 and 12, we illustrate the ACR and ANMI values of the algorithms on the various datasets. Obviously, the line of DPFCM is higher than those of the other algorithms except WEFCM. This affirms our remarks above about the efficiency of DPFCM.

4.4. The impact of the number of peers

Thirdly, we measure the effect of the number of peers on the clustering quality of DPFCM. In Section 3.4, we raised a question about the optimal number of peers to balance clustering quality and computational time. To answer it, we ran the DPFCM algorithm with various numbers of peers on the experimental datasets and measured the ACR and ANMI index values for each case. The results are demonstrated in Table 3. From this table, we depict the ACR and ANMI values of the DPFCM algorithm by number of peers in Figs. 13 and 14.

The results in Table 3 and Figs. 13 and 14 clearly indicate that the optimal range for the number of peers is [3, 5]. The ACR values of DPFCM on the IRIS and GLASS datasets are maximal with P = 5, while the maximal ACR values on the IONOSPHERE and HEART datasets are achieved with P = 3. The last result, on the HABERMAN dataset, shows the maximal ACR value with P = 6. By a simple count of the maximal ANMI values of DPFCM in Table 3, we also recognize that P = 5 is the most suitable number of peers, since it yields three of the maximal ANMI values. Thus, our recommendation is to choose P in the range [3, 5].

4.5. The comparison of computational time

Lastly, we investigate the computational time of all algorithms through the AIN index (see Table 4). The results clearly show that the proposed DPFCM requires more iterations than


the other algorithms. However, the differences between the algorithms are modest, and this limitation of DPFCM is acceptable.

5. Conclusions

In this paper, we concentrated on fuzzy clustering in distributed environments and presented a novel distributed picture fuzzy clustering method on picture fuzzy sets, namely DPFCM. This algorithm employs the facilitator model both to ameliorate the clustering quality, through large numbers of updated neighboring results, and to reduce the communication costs. In all Slave peers, the clustering is oriented by the principle of picture fuzzy sets, which are a generalization of traditional fuzzy sets and intuitionistic fuzzy sets. By combining the ideas of the facilitator model and picture fuzzy sets, DPFCM advances the clustering quality of the relevant algorithms for this problem. Theoretical analyses of the proposed algorithm, including the meanings of the theorems proposed in this article and the advantages and disadvantages of the algorithm, were also discussed. The theoretical contribution of this paper could be useful for the later development and application of distributed fuzzy clustering to practical problems.

Experiments were conducted on the benchmark datasets of the UCI Machine Learning Repository and divided into several scenarios. A numerical example on the IRIS dataset showed the activities of the proposed algorithm step by step. The impact of the number of peers and the computational time of the algorithms were also investigated. The findings extracted from the experiments can be summarized as follows: (i) the clustering quality of DPFCM is better than those of the other relevant distributed clustering algorithms; (ii) the average ACR value of DPFCM over the various datasets is 74.6%; (iii) the number of peers used in the DPFCM algorithm should be chosen within the range [3, 5]; (iv) DPFCM takes a longer computational time than the other algorithms, yet the differences are modest and acceptable.

The insightful and practical implications of this research can be interpreted as follows. Firstly, since many applications nowadays require fast processing of large and very large datasets, the DPFCM algorithm could be used to process those data simultaneously without remarkably reducing the quality of the outputted results. Each local site is kept up-to-date with the others and with the main site, so the proposed mechanism could be efficient for worldwide management. Secondly, the theoretical contribution of this paper could expand a minor research direction on distributed fuzzy clustering with advanced fuzzy sets such as the picture fuzzy sets used in this article.

From these implications, further work on this theme could proceed in several directions: (i) extending DPFCM to the context of semi-supervised clustering; (ii) adapting DPFCM to other parallel processing models such as OpenMP and LAM/MPI; (iii) integrating DPFCM into recommender systems for the extraction of distributed fuzzy rules; (iv) considering DPFCM as a part of a fuzzy time series forecasting system such as the ANFIS network, ANN, etc.; (v) applying the algorithm to group decision making problems.

Acknowledgement

The author is greatly indebted to the editor-in-chief, Prof. B. Lin, and the anonymous reviewers for their comments and valuable suggestions, which improved the quality and clarity of the paper. Another thank-you is sent to MSc. Pham Huy Thong for the calculation work. This work is sponsored by a VNU project under contract No. QG.14.60.

References

Agarwal, M., Agrawal, H., Jain, N., & Kumar, M. (2010). Face recognition using principle component analysis, eigenface and neural network. In Proceeding of IEEE international conference on signal acquisition and processing (ICSAP'10) (pp. 310–314).

Ahmed, M. N., Yamany, S. M., Mohamed, N., Farag, A. A., & Moriarty, T. (2002). A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data. IEEE Transactions on Medical Imaging, 21(3), 193–199.

Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87–96.

Bache, K., & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science [Online]. Available from: <http://archive.ics.uci.edu/ml/>.

Bai, C., Dhavale, D., & Sarkis, J. (2014). Integrating fuzzy C-means and TOPSIS for performance evaluation: An application and comparative analysis. Expert Systems with Applications, 41(9), 4186–4196.

Balcan, M. F., Ehrlich, S., & Liang, Y. (2013). Distributed k-means and k-median clustering on general topologies. arXiv preprint arXiv:1306.0604v3.

Bezdek, J. C. et al. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10, 191–203.

Branting, L. K. (2013). Distributed pivot clustering with arbitrary distance. In Proceeding of 2013 IEEE international conference on big data (pp. 21–27).

Bui, A., Kudireti, A., & Sohier, D. (2012). An adaptive random walk based distributed clustering algorithm. International Journal of Foundations of Computer Science, 23(04), 803–830.

Cao, H., Deng, H. W., & Wang, Y. P. (2012). Segmentation of M-FISH images for improved classification of chromosomes with an adaptive fuzzy C-means clustering algorithm. IEEE Transactions on Fuzzy Systems, 20(1), 1–8.

Chen, L., Chen, C. P., & Lu, M. (2011). A multiple-kernel fuzzy c-means algorithm for image segmentation. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(5), 1263–1274.

Chen, X. W., & Huang, T. (2003). Facial expression recognition: A clustering-based approach. Pattern Recognition Letters, 24(9), 1295–1302.

Chimphlee, W., Abdullah, A. H., Noor Md Sap, M., Chimphlee, S., & Srinoy, S. (2005). Integrating genetic algorithms and fuzzy c-means for anomaly detection. In Proceeding of IEEE Indicon (pp. 575–579).

Chimphlee, W., Abdullah, A. H., Noor Md Sap, M., Srinoy, S., & Chimphlee, S. (2006). Anomaly-based intrusion detection using fuzzy rough clustering. In Proceeding of IEEE international conference on hybrid information technology (ICHIT'06) (Vol. 1, pp. 329–334).

Chuang, K. S., Tzeng, H. L., Chen, S., Wu, J., & Chen, T. J. (2006). Fuzzy c-means clustering with spatial information for image segmentation. Computerized Medical Imaging and Graphics, 30(1), 9–15.

Chu, H. J., Liau, C. J., Lin, C. H., & Su, B. S. (2012). Integration of fuzzy cluster analysis and kernel density estimation for tracking typhoon trajectories in the Taiwan region. Expert Systems with Applications, 39(10), 9451–9457.

Coletta, L. F., Vendramin, L., Hruschka, E. R., Campello, R. J., & Pedrycz, W. (2012). Collaborative fuzzy clustering algorithms: Some refinements and design guidelines. IEEE Transactions on Fuzzy Systems, 20(3), 444–462.

Cuong, B. C., & Kreinovich, V. (2013). Picture fuzzy sets – a new concept for computational intelligence problems. In Proceeding of 2013 third world congress on information and communication technologies (WICT 2013) (pp. 1–6).

Cuong, B. C., Son, L. H., & Chau, H. T. M. (2010). Some context fuzzy clustering methods for classification problems. In Proceedings of the 2010 ACM symposium on information and communication technology (pp. 34–40).

Di Martino, F., Loia, V., & Sessa, S. (2008). Extended fuzzy C-means clustering algorithm for hotspot events in spatial analysis. International Journal of Hybrid Intelligent Systems, 5(1), 31–44.

Egrioglu, E. et al. (2011). Fuzzy time series forecasting method based on Gustafson–Kessel fuzzy clustering. Expert Systems with Applications, 38(8), 10355–10357.

Egrioglu, E., Aladag, C. H., & Yolcu, U. (2013). Fuzzy time series forecasting with a novel hybrid approach combining fuzzy c-means and neural networks. Expert Systems with Applications, 40(3), 854–857.

Ercan, T. (2010). Effective use of cloud computing in educational institutions. Procedia – Social and Behavioral Sciences, 2(2), 938–942.

Forero, P. A., Cano, A., & Giannakis, G. B. (2011). Distributed clustering using wireless sensor networks. IEEE Journal of Selected Topics in Signal Processing, 5(4), 707–724.

Gehweiler, J., & Meyerhenke, H. (2010). A distributed diffusive heuristic for clustering a virtual P2P supercomputer. In Proceeding of 2010 IEEE international symposium on parallel & distributed processing, workshops and PhD forum (IPDPSW) (pp. 1–8).

Geng, X., & Yang, Z. (2013). Data mining in cloud computing. In Proceeding of 2013 international conference on information science and computer applications (ISCA 2013).

Ghanem, S., Kechadi, T., & Tari, A. (2011). New approach for distributed clustering. In Proceeding of 2011 IEEE international conference on spatial data mining and geographical knowledge services (ICSDM) (pp. 60–65).

Gilhotra, E., & Trikha, P. (2012). Modification in "KNN" clustering algorithm for distributed data. International Journal of Computer Applications in Engineering Sciences, 22(3).

Hadavandi, E., Shavandi, H., & Ghanbari, A. (2011). An improved sales forecasting approach by the integration of genetic fuzzy systems and data clustering: Case study of printed circuit board. Expert Systems with Applications, 38(8), 9392–9399.

66 L.H. Son / Expert Systems with Applications 42 (2015) 51–66

Haddadnia, J., Faez, K., & Ahmadi, M. (2003). A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition. Pattern Recognition, 36(5), 1187–1202.

Hai, M., Zhang, S., Zhu, L., & Wang, Y. (2012). A survey of distributed clustering algorithms. In Proceedings of 2012 IEEE international conference on industrial control and electronics engineering (ICICEE) (pp. 1142–1145).

Huang, H. C., Chuang, Y. Y., & Chen, C. S. (2012). Multiple kernel fuzzy clustering. IEEE Transactions on Fuzzy Systems, 20(1), 120–134.

Izakian, H., & Abraham, A. (2011). Fuzzy C-means and fuzzy swarm for fuzzy clustering problem. Expert Systems with Applications, 38(3), 1835–1838.

Jain, A. K., & Maheswari, S. (2013). Survey of recent clustering techniques in data mining. Journal of Current Computer Science and Technology, 3(1).

Karjee, J., & Jamadagni, H. S. (2011). Data accuracy model for distributed clustering algorithm based on spatial data correlation in wireless sensor networks. arXiv preprint arXiv:1108.2644.

Le Khac, N. A., & Kechadi, M. (2011). On a distributed approach for density-based clustering. In Proceedings of 2011 10th international conference on machine learning and applications and workshops (ICMLA) (Vol. 1, pp. 283–286).

Krinidis, S., & Chatzis, V. (2010). A robust fuzzy local information C-means clustering algorithm. IEEE Transactions on Image Processing, 19(5), 1328–1337.

Kwon, Y., Nunley, D., Gardner, J. P., Balazinska, M., Howe, B., & Loebman, S. (2010). Scalable clustering algorithm for N-body simulations in a shared-nothing cluster. Berlin, Heidelberg: Springer, pp. 132–150.

Li, X. (2003). Gesture recognition based on fuzzy C-Means clustering algorithm. University of Tennessee, Knoxville: Department of Computer Science.

Li, B. N., Chui, C. K., Chang, S., & Ong, S. H. (2011). Integrating spatial fuzzy clustering with level set methods for automated medical image segmentation. Computers in Biology and Medicine, 41(1), 1–10.

Li, H., Li, J., & Kang, F. (2011). Risk analysis of dam based on artificial bee colony algorithm with fuzzy c-means clustering. Canadian Journal of Civil Engineering, 38(5), 483–492.

Lu, L., Gu, Y., & Grossman, R. (2010). dSimpleGraph: A novel distributed clustering algorithm for exploring very large scale unknown data sets. In Proceedings of 2010 IEEE international conference on data mining workshops (ICDMW) (pp. 162–169).

Lu, J., Yuan, X., & Yahagi, T. (2006). A method of face recognition based on fuzzy clustering and parallel neural networks. Signal Processing, 86(8), 2026–2039.

Lu, J., Yuan, X., & Yahagi, T. (2007). A method of face recognition based on fuzzy c-means clustering and associated sub-NNs. IEEE Transactions on Neural Networks, 18(1), 150–160.

Martin, A., Gayathri, V., Saranya, G., Gayathri, P., & Venkatesan, P. (2011). A hybrid model for bankruptcy prediction using genetic algorithm, fuzzy c-means and MARS. arXiv preprint arXiv:1103.2110.

Ma, L., & Staunton, R. C. (2007). A modified fuzzy C-means image segmentation algorithm for use with uneven illumination patterns. Pattern Recognition, 40(11), 3005–3011.

Pandey, S., Wu, L., Guru, S. M., & Buyya, R. (2010). A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In Proceedings of 2010 24th IEEE international conference on advanced information networking and applications (AINA) (pp. 400–407).

Petre, R. Ş. (2012). Data mining in cloud computing. Database Systems Journal, 3(3), 67–71.

Pham, D. L., Xu, C., & Prince, J. L. (2000). Current methods in medical image segmentation. Annual Review of Biomedical Engineering, 2(1), 315–337.

Rahimi, S., Zargham, M., Thakre, A., & Chhillar, D. (2004). A parallel fuzzy c-mean algorithm for image segmentation. In Proceedings of IEEE annual meeting of the fuzzy information processing (NAFIPS’04) (Vol. 1, pp. 234–237).

Roh, S. B., Pedrycz, W., & Ahn, T. C. (2014). A design of granular fuzzy classifier. Expert Systems with Applications, 41(15), 6786–6795.

Shah, H., Undercoffer, J., & Joshi, A. (2003). Fuzzy clustering for intrusion detection. In Proceedings of 12th IEEE international conference on fuzzy systems (FUZZ’03) (Vol. 2, pp. 1274–1278).

Siang Tan, K., & Mat Isa, N. A. (2011). Color image segmentation using histogram thresholding–Fuzzy C-means hybrid approach. Pattern Recognition, 44(1), 1–15.

Singh, D., & Gosain, A. (2013). A comparative analysis of distributed clustering algorithms: A survey. In Proceedings of 2013 IEEE international symposium on computational and business intelligence (ISCBI) (pp. 165–169).

Son, L. H. (2014a). Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization. Applied Soft Computing, 22, 566–584.

Son, L. H. (2014b). HU-FCF: A hybrid user-based fuzzy collaborative filtering method in recommender systems. Expert Systems with Applications, 41(15), 6861–6870.

Son, L. H., Cuong, B. C., Lanzi, P. L., & Thong, N. T. (2012). A novel intuitionistic fuzzy clustering method for geo-demographic analysis. Expert Systems with Applications, 39(10), 9848–9859.

Son, L. H., Cuong, B. C., & Long, H. V. (2013). Spatial interaction–modification model and applications to geo-demographic analysis. Knowledge-Based Systems, 49, 152–170.

Son, L. H., Lanzi, P. L., Cuong, B. C., & Hung, H. A. (2012). Data mining in GIS: A novel context-based fuzzy geographically weighted clustering algorithm. International Journal of Machine Learning and Computing, 2(3), 235–238.

Son, L. H., Linh, N. D., & Long, H. V. (2014). A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network. Engineering Applications of Artificial Intelligence, 29, 33–42.

Srikantaiah, S., Kansal, A., & Zhao, F. (2008). Energy aware consolidation for cloud computing. In Proceedings of the 2008 conference on power aware computing and systems (Vol. 10).

Surcel, T., & Alecu, F. (2008). Applications of cloud computing. In Proceedings of the international conference of science and technology in the context of the sustainable development (pp. 177–180).

Vendramin, L., Campello, R. J. G. B., Coletta, L. F., & Hruschka, E. R. (2011). Distributed fuzzy clustering with automatic detection of the number of clusters. In Proceedings of international symposium on distributed computing and artificial intelligence (pp. 133–140).

Visalakshi, N. K., Thangavel, K., & Parvathi, R. (2010). An intuitionistic fuzzy approach to distributed fuzzy clustering. International Journal of Computer Theory and Engineering, 2(2), 1793–8201.

Wachs, J., Stern, H., & Edan, Y. (2003). Parameter search for an image processing fuzzy C-means hand gesture recognition system. In Proceedings of 2003 IEEE international conference on image processing (ICIP 2003) (Vol. 3, pp. III-341).

Wang, G., Hao, J., Ma, J., & Huang, L. (2010). A new approach to intrusion detection using artificial neural networks and fuzzy clustering. Expert Systems with Applications, 37(9), 6225–6232.

Wang, Y., Ma, X., Lao, Y., & Wang, Y. (2014). A fuzzy-based customer clustering approach with hierarchical structure for logistics network optimization. Expert Systems with Applications, 41(2), 521–534.

Xie, T., Bai, G., & Lang, H. (2010). A novel distributed clustering algorithm based on OCSVM. In Proceedings of 2010 IEEE international conference on intelligent computing and intelligent systems (ICIS) (Vol. 1, pp. 661–665).

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.

Zhang, D. Q., & Chen, S. C. (2004). A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artificial Intelligence in Medicine, 32(1), 37–50.

Zhang, Y., Huang, D., Ji, M., & Xie, F. (2011). Image segmentation using PSO and PCM with Mahalanobis distance. Expert Systems with Applications, 38(7), 9036–9040.

Zhou, J., & Philip Chen, C. L. (2011). Attribute weighted entropy regularization in fuzzy c-means algorithm for feature selection. In Proceedings of IEEE international conference on system science and engineering (pp. 59–64).

Zhou, J., Chen, C., Chen, L., & Li, H. (2013). A collaborative fuzzy clustering algorithm in distributed network environments. IEEE Transactions on Fuzzy Systems. http://dx.doi.org/10.1109/TFUZZ.2013.2294205.