
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2013, Article ID 323750, 8 pages, http://dx.doi.org/10.1155/2013/323750

Research Article

A Multidimensional and Multimembership Clustering Method for Social Networks and Its Application in Customer Relationship Management

Peixin Zhao,1 Cun-Quan Zhang,2 Di Wan,3 and Xin Zhang4

1 School of Management, Shandong University, Jinan, Shandong 250100, China
2 Department of Mathematics, West Virginia University, Morgantown, WV 26506, USA
3 Department of Physics and Astronomy, University of Victoria, Victoria, BC, Canada V8W 2Y2
4 Foundation Department, Shandong College of Electronic Technology, Jinan, Shandong 250200, China

Correspondence should be addressed to Peixin Zhao; pxzhao@126.com

Received 15 July 2013; Accepted 7 August 2013

Academic Editor: Yoshinori Hayafuji

Copyright © 2013 Peixin Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have been proven inadequate for high-dimensional or mixed type datasets due to data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguishing features: a nonbinary hierarchical tree and multimembership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multimembership feature may provide more useful service strategies. Experimental results on customer relationship management confirm the effectiveness of the new method.

1. Introduction

A social network is a set of people or groups, each of which has connections of some kind to some or all of the others. Although the general concept of social networks seems simple, the underlying structure of a network implies a set of characteristics which are typical of all complex systems. Social networks play an extremely important role in many systems and processes and have been intensively studied over the past few years in order to understand both local phenomena, such as clique formation and their dynamics, and network-wide processes, for example, flow of data in computer networks [1], energy flow in food webs [2], customer relationship management [3–6], and so forth. Modern information and communication technology has offered new interaction modes between individuals, like mobile phone communications and online interactions. Such new social exchanges can be accurately monitored for very large systems, including millions of individuals, representing a huge opportunity for the study of social science.

Clustering analysis is a data mining technique developed for the purpose of identifying groups of entities that are similar to each other with respect to certain similarity measures. Many different clustering methods have been proposed and used in a variety of fields. Jain [7] broadly divided these methods into two groups: hierarchical clustering and partitional clustering. Hierarchical clustering is the grouping of objects of interest according to their similarity into a hierarchy, with different levels reflecting the degree of inter-object resemblance. The most well-known hierarchical methods are single-link and complete-link. In single-link hierarchical methods, the two clusters whose two closest members have the smallest distance are merged in each step; in complete-link methods, the two clusters whose merger has the smallest diameter are merged in each step. Compared to hierarchical clustering methods, partitional clustering methods find all the clusters simultaneously as a partition of the data. A typical example is K-means, which is widely used for its ease of implementation, simplicity, and efficiency, and in which a data point cannot be simultaneously included in more than one cluster [8]. Based on the differences in their capabilities, applicability, and computational requirements, clustering methods can be categorized into several different approaches: partitioning, hierarchical, density-based, grid-based, and model-based. No particular clustering method has been shown to be superior to all its competitors in all aspects [9].

In recent years, community detection based on clustering has become a growing research field, partly as a result of the increasing availability of a huge number of networks in the real world. The most intuitive and common definition of community structure is that such networks seem to have communities in them: subsets of vertices within which vertex-vertex connections are dense, but between which connections are relatively sparse. Yang and Luo [10] show that community structure has a close relationship with some functionality such as robustness and fast diffusion. It is an important network property and is able to reveal many hidden features of the given network [11]. The detection and analysis of communities in social networks have played an important role in the mining of different kinds of networks, including the World Wide Web [12, 13], communication networks [14], and biological networks [15].

Most traditional community detection algorithms based on clustering are limited to handling one-dimensional datasets [16, 17]. However, the datasets to be mined in real life often contain millions of objects described by many different types of attributes or variables. For example, in customer relationship management, a customer can be depicted by multidimensional or mixed type data, such as gender, age, income, education level, and so forth. In such cases, data mining operations and methods are required to be scalable as well as capable of dealing with the datasets' complex structures and dimensions. Previous research mainly focused on the representation of a set of items with a single attribute, which is apparently unsuitable for the scenarios described above: (i) a single attribute cannot accurately represent all the dimensions of items; (ii) clustering according to a single attribute often fails to capture the inherent dependency among multiple attributes and leads to meaningless clusters.

Under such considerations, in this paper we first introduce two pretreatment methods for multidimensional and mixed type data, followed by a new clustering approach for community detection in social networks. In this approach, individuals and their relationships are denoted by weighted graphs, and the graph density we define gives a better quantitative description of the overall correlation among individuals in a community, so that a reasonable clustering output can be presented. In particular, our method produces "trees" of simple hierarchy and allows for fuzzy (overlapping) clusters, which distinguishes it from other methods. To verify the effectiveness of our method, we carried out a preliminary evaluation on a mobile customer segmentation use case; the numerical results provide supporting evidence for further improvement and application.

The rest of the paper is organized as follows. In Section 2, we summarize the related works on community detection in social networks. In Section 3, we introduce the details of the novel clustering approach for multiattribute datasets. As an application in customer relationship management, this approach is used to analyze a mobile customer segmentation problem in Section 4. Finally, a summary and conclusions are given in Section 5.

2. Related Works

The detection of communities has brought about significant advances in the understanding of many real-world complex networks. Plenty of detection algorithms and techniques have been proposed, drawing on methods and principles from many different areas, including physics, artificial intelligence, graph theory, and even electrical circuits [11]. The spectral bisection method [18] and the Kernighan-Lin algorithm [19] are early solutions to this problem from computer science. The spectral approach bisects the graph iteratively, which is unsuitable for general networks. The Kernighan-Lin algorithm requires a priori knowledge about the sizes of the initial divisions. In 2002, Girvan and Newman [20] proposed a divisive hierarchical clustering algorithm, referred to as GN, which optimizes the division of a network by iteratively cutting the edge with the greatest betweenness value. However, a disadvantage of GN is that its time complexity is $O(m^2 n)$ on a network of $n$ nodes and $m$ edges, or $O(m^3)$ on a sparse network. Newman [21] then proposed a faster algorithm, referred to as NM, with time complexity $O((m + n)n)$, or $O(n^2)$ on a sparse network. A lot of work has been done to improve GN and NM; for example, Radicchi et al. [22] proposed an algorithm similar to GN that uses the edge-clustering coefficient as a new metric, with a smaller time complexity of $O(m^2)$. Clauset et al. [23] have also proposed a fast clustering algorithm with $O(n \log^2 n)$ time complexity on sparse graphs. In particular, in 2007 Ou and Zhang [24] proposed a new clustering method featuring a hierarchical tree and overlapping clusters; the complexity of this method is $O(h n^2 \log n)$, where $h$ denotes the height of the hierarchical structure. This method was used to cluster extremist web pages [25] and some classic social networks [26] with single-weighted edges.

Random walks have also been successfully used in finding network communities [27, 28]. The idea of this method is that the walk tends to be trapped in dense parts of a network, corresponding to communities. Pons and Latapy [27] proposed a measure of similarity between vertices based on random walks, which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. The algorithm, called Walktrap, runs in time $O(mn^2)$ and space $O(n^2)$ in the worst case, and in time $O(n^2 \log n)$ and space $O(n^2)$ in most real-world cases. Hu et al. [29] proposed a method for the identification of community structure based on a signaling process of complex networks. Each node is taken as the initial signal source to excite the whole network one time, and the source node is associated with an $n$-dimensional vector which records the effects of the signaling process. By this process, the topological relationship of nodes on the network can be transferred into a geometrical structure of vectors in $n$-dimensional Euclidean space. Then the best partition of groups is determined by F statistics, and the final community structure is given by the K-means clustering method.

Spectral clustering techniques have seen an explosive development and proliferation over the past few years [30–32]. Previous work indicated that a robust approach to community detection is the maximization of the benefit function known as "modularity" over possible divisions of a network. Newman and Girvan [30] showed that the maximization process can be written in terms of the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations; the time complexity of this algorithm is $O(n^2)$. They also proposed an objective function for graph clustering called the $Q$ function, which allows for automatic selection of the number of clusters; higher values of the $Q$ function were shown to correlate well with good graph clustering. White and Smyth [31] showed how the $Q$ function can be reformulated as a spectral relaxation problem and proposed two new spectral clustering algorithms that seek to maximize $Q$. Capocci et al. [32] developed a spectral-based algorithm to reveal the structure of a complex network, which could be blurred by the bias artificially imposed by the iterative bisection constraint. Such a method should be able to combine the power of spectral analysis with the caution needed to reveal an underlying structure when there is no clear-cut partitioning, as is often the case in real networks.

Many other community detection algorithms have also been proposed in the recent literature. For example, Wu and Huberman [33] proposed a method which partitions a network into two communities, where the network is viewed as an electric circuit and a battery is attached to two random nodes that are supposed to be within two communities. Shi et al. [11] proposed a new genetic algorithm for community detection using the fundamental measure criterion, modularity $Q$, as the fitness function. A special locus-based adjacency encoding scheme is applied to represent the community partition. Shi et al. [34] proposed a novel method based on particle swarm optimization to detect community structures by optimizing network modularity.

3. Multidimensional and Multimembership Clustering Method for Social Networks

3.1. Similarity of Multidimensional Data. Traditional distance functions include Euclidean distance, Chebyshev distance, Manhattan distance, Mahalanobis distance, weighted Minkowski distance, and cosine distance. Among these distance functions, Mahalanobis distance is based on correlations between variables, by which different patterns can be identified and analyzed. It gauges the similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the dataset and is scale-invariant. In other words, it is a multivariate effect size.

All these distance functions have their own advantages and disadvantages in practical applications. Some research results show that Euclidean distance performs better in vector models, while other numerical examples in high-dimensional spaces show that the farthest and nearest distances are almost equal when Euclidean distance is used to measure the similarity between data points. That is, in high-dimensional data, traditional similarity measures as used in conventional clustering algorithms are usually not meaningful. This problem and related phenomena require adaptations of clustering approaches to the nature of high-dimensional data. This area of research has been a highly active one in recent years. Common approaches are known as, for example, subspace clustering, projected clustering, pattern-based clustering, or correlation clustering. Subspace clustering is the task of detecting all clusters in all subspaces, which means that a point might be a member of multiple clusters, each existing in a different subspace. Subspaces can be either axis-parallel or affine. Projected clustering seeks to assign each point to a unique cluster, but clusters may exist in different subspaces. The general approach is to use a special distance function together with a regular clustering algorithm. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

In 2011, a new function "Close()" was presented, which improves on the traditional distance functions to compensate for their inadequacy in high-dimensional space [35]. Let

\[
X = (x_1, x_2, \ldots, x_n), \qquad Y = (y_1, y_2, \ldots, y_n) \tag{1}
\]

denote two points in $n$-dimensional space. The function "Close()" is defined as

\[
\operatorname{Close}(X, Y) = \frac{\sum_{i=1}^{n} e^{-|x_i - y_i|}}{n}. \tag{2}
\]

It depicts the similarity degree between two data points and has the following properties.

(a) The minimum value of the function is 0, which means that the similarity degree between $X$ and $Y$ is smallest, since the difference approaches infinity in each dimension.

(b) The maximum value of the function is 1, which means that the similarity degree between $X$ and $Y$ is largest, since they come closest to coinciding in each dimension.

Similar to the weighted operator in traditional distance functions, the Close function can be corrected as

\[
\operatorname{Close}(X, Y) = \frac{\sum_{i=1}^{n} \omega_i e^{-|x_i - y_i|}}{n}, \tag{3}
\]

where $\omega_i \in [0, 1]$ denotes the importance degree of the data in the $i$th dimension. According to the comparison in [35], the advantages of the new function are obvious in high-dimensional similarity measurement. Quantitative analysis also showed that this function can avoid the effects of noise and the curse of dimensionality.
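As an illustration of Equations (2) and (3), the following minimal Python sketch evaluates the Close function for two points given as plain lists. The function name `close` and the optional `weights` argument are illustrative choices, not part of [35]; the unweighted form is treated as the weighted form with every weight equal to 1.

```python
import math

def close(x, y, weights=None):
    """Close(X, Y): mean over dimensions of w_i * exp(-|x_i - y_i|)."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    n = len(x)
    if weights is None:
        # Eq. (2) is Eq. (3) with all importance degrees set to 1.
        weights = [1.0] * n
    return sum(w * math.exp(-abs(a - b)) for w, a, b in zip(weights, x, y)) / n

same = close([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])      # coinciding points
far = close([0.0, 0.0], [50.0, 80.0])               # large gap in every dimension
half = close([0.0, 0.0], [0.0, 0.0], [0.5, 1.0])    # weighted, Eq. (3)
```

Coinciding points give the maximum value 1, and points that are far apart in every dimension give a value close to 0, in line with properties (a) and (b).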


3.2. Similarity of Mixed Type Data. For clustering multiattribute datasets, we first introduce a method for measuring the similarity between items, as follows [36]. The multiattribute dataset can be separated into two parts: the pure numeric dataset and the pure categorical dataset. Some existing efficient clustering methods designed for these two types of datasets are employed to produce corresponding clusters. For the similarity matrix, we define $S(i, j)$ from the number of times the given sample pair $x_i$ and $x_j$ has co-occurred in a cluster [37]. Consider

\[
S(i, j) = S(x_i, x_j) = \frac{1}{H} \sum_{k=1}^{H} \delta\bigl(\pi_k(x_i), \pi_k(x_j)\bigr), \qquad
\delta(a, b) \equiv
\begin{cases}
1, & a = b, \\
0, & a \neq b,
\end{cases} \tag{4}
\]

where $H$ denotes the number of clusterings and $\pi_k(x_i)$ and $\pi_k(x_j)$ denote the cluster labels of items $x_i$ and $x_j$ in the $k$th clustering, respectively.

Then, for the pure numerical datasets, the similarity can be defined as

\[
S_1(i, j) = \frac{n}{N} = \frac{\sum_{k=1}^{N} C_k(i, j)}{N}, \tag{5}
\]

where $N$ is the number of clusterings and $n$ is the number of times the pattern pair $(x_i, x_j)$ is assigned to the same cluster among the $N$ clusterings. If $(x_i, x_j)$ is assigned to the same cluster in the $k$th clustering, $C_k(i, j) = 1$; otherwise $C_k(i, j) = 0$.

For the pure categorical datasets, the similarity can be defined as

\[
S_2(i, j) = \frac{n}{m} = \frac{\sum_{k=1}^{m} C_k(i, j)}{m}, \tag{6}
\]

where $m$ denotes the number of attributes. Then the similarity of multiattribute datasets can be denoted by

\[
S = S_1 + \alpha S_2, \tag{7}
\]

where $\alpha$ is a user-defined parameter. If $\alpha > 1$, the categorical data are more important than the numerical data; if $\alpha < 1$, the numerical data are more important. $S_1(i, j)$ and $S_2(i, j)$ can also be used as two-dimensional (or multidimensional) data to represent the similarities between items $x_i$ and $x_j$.
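The co-occurrence similarity of Equation (4) and the combination in Equation (7) can be sketched in Python as follows. This is only an illustration under stated assumptions: each clustering is represented as a list of cluster labels (one label per item), the names `co_association` and `combined_similarity` are hypothetical, and in the toy example the categorical matrix is also built by co-occurrence rather than attribute by attribute.

```python
def co_association(labelings):
    """S[i][j] = fraction of the H clusterings that place items i and j
    in the same cluster, as in Eq. (4)."""
    h = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:   # delta(pi_k(x_i), pi_k(x_j)) = 1
                    counts[i][j] += 1
    return [[c / h for c in row] for row in counts]

def combined_similarity(s1, s2, alpha):
    """Element-wise S = S1 + alpha * S2, as in Eq. (7)."""
    return [[a + alpha * b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]

# Toy data: three items, two clusterings of the numeric part and two of the
# categorical part (labels are arbitrary integers).
s1 = co_association([[0, 0, 1], [0, 1, 1]])
s2 = co_association([[0, 0, 1], [0, 0, 1]])
s = combined_similarity(s1, s2, alpha=2.0)
```

With $\alpha = 2$ the categorical agreement between items 0 and 1 dominates the combined similarity, which is the effect described for $\alpha > 1$.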

3.3. Multidimensional and Multimembership Clustering Method for Social Networks. A graph or network is one of the most commonly used models to represent real-valued relationships among a set of input items. Since many traditional techniques for one-dimensional problems have been proven inadequate for high-dimensional or mixed type datasets due to data sparseness and attribute redundancy, the graph-based clustering method for single-dimensional datasets proposed in [24–26] can be extended as follows to directly cluster multidimensional datasets.

Let $G = (V, E)$ be a graph with the vertex set $V$ and associated with $r$ weights

\[
\omega_k : E(G) \longmapsto [0, 1], \qquad k = 1, 2, \ldots, r. \tag{8}
\]

For a subgraph $C$ ($|V(C)| > 1$) of $G$, we define the $k$th density of $C$ by

\[
d_k(C) = \frac{2 \sum_{e \in E(C)} \omega_k(e)}{|V(C)| \, (|V(C)| - 1)}. \tag{9}
\]

In a single-weighted graph, if $\omega(e) = 1$ for every edge $e$ in $C$ and $d(C) = 1$, the subgraph $C$ induces a clique. For a multiweighted graph $(G, \omega_1, \omega_2, \ldots, \omega_r)$, a subgraph $C$ is called a $\Delta$-quasiclique if $d_k(C) \ge \Delta$ for some positive real number $\Delta$ and for every $k \in \{1, 2, \ldots, r\}$ ($r$ is the number of weights on each edge).
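The density of Equation (9) and the $\Delta$-quasiclique condition can be illustrated with a short Python sketch. The graph representation is an assumption made for this example only: edge weights are stored in a dictionary that maps `frozenset({u, v})` to a tuple of $r$ weights, and the index `k` is 0-based.

```python
def density(vertices, weights, k):
    """d_k(C) from Eq. (9) for the subgraph induced by `vertices`."""
    vs = list(vertices)
    n = len(vs)
    if n < 2:
        raise ValueError("density is defined only for |V(C)| > 1")
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            w = weights.get(frozenset((vs[a], vs[b])))
            if w is not None:           # a missing edge contributes nothing
                total += w[k]
    return 2.0 * total / (n * (n - 1))

def is_quasiclique(vertices, weights, r, delta):
    """True if d_k(C) >= delta for every one of the r weights."""
    return all(density(vertices, weights, k) >= delta for k in range(r))

# Toy triangle with r = 2 weights per edge.
w = {
    frozenset(("a", "b")): (1.0, 0.5),
    frozenset(("b", "c")): (1.0, 0.5),
    frozenset(("a", "c")): (1.0, 0.5),
}
d0 = density({"a", "b", "c"}, w, 0)
d1 = density({"a", "b", "c"}, w, 1)
```

In this toy triangle the first density equals 1 and the second equals 0.5, so the triangle is a 0.5-quasiclique but not a 0.6-quasiclique, because the condition must hold for every weight.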

Clustering is a process that detects all dense subgraphs in $G$ and constructs a hierarchically nested system to illustrate their inclusion relation.

A heuristic process is applied here for finding all quasicliques with density of various levels. The core of the algorithm is deciding whether or not to add a vertex to an already selected dense subgraph $C$. For a vertex $v \notin V(C)$, we define the contribution of $v$ to $C$ by

\[
c_k(v, C) = \frac{\sum_{u \in V(C)} \omega_k(uv)}{|V(C)|}. \tag{10}
\]
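Equation (10) can be sketched in Python as follows, assuming (for illustration only) that edge weights are stored in a dictionary mapping `frozenset({u, v})` to a tuple of $r$ weights, that `k` is a 0-based index, and that a missing edge has weight 0.

```python
def contribution(v, community, weights, k):
    """c_k(v, C): sum of w_k(uv) over u in C, divided by |V(C)|."""
    total = 0.0
    for u in community:
        w = weights.get(frozenset((u, v)))
        if w is not None:
            total += w[k]
    return total / len(community)

# Toy example with a single weight per edge (r = 1).
w = {
    frozenset(("a", "c")): (0.8,),
    frozenset(("b", "c")): (0.4,),
}
c_close = contribution("c", {"a", "b"}, w, 0)   # linked to both members
c_none = contribution("d", {"a", "b"}, w, 0)    # no edge into the community
```

A vertex that is strongly linked to most members of $C$ receives a contribution close to the density of $C$, whereas an unconnected vertex receives 0.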

A vertex $v$ is added into $C$ if $c_k(v, C) > \alpha d_k(C)$, where $\alpha$ is a user specified parameter.

In short, the main steps of our algorithm can be described as shown in Algorithm 1. Tracing the process of each vertex then yields the hierarchical tree.

Our detailed community detection algorithm, which can find $\Delta$-quasicliques in $G$ with various levels of $\Delta$, is as follows. A hierarchically nested system is constructed to illustrate their inclusion relation.

Step 0. $l \leftarrow 1$, where $l$ is the indicator of the levels in the hierarchical system.

\[
M_0 \longleftarrow \gamma \max\{\omega_k(e) : \forall e \in E(G),\ \forall k\}, \tag{11}
\]

where $\gamma$ ($0 < \gamma < 1$) is a user specified parameter ($\gamma$ is a cut-off threshold).

Step 1 (the initial step). Let $F$ be the set of all edges $e$ of $G$ with

\[
\min\{\omega_k(e) : k = 1, 2, \ldots, r\} \ge M_0. \tag{12}
\]

Let $m = |F|$. Sort the edges of the set $F$ as a sequence $S = e_1, \ldots, e_m$ such that

\[
\sum_{k=1}^{r} \omega_k(e_1) \ge \sum_{k=1}^{r} \omega_k(e_2) \ge \cdots \ge \sum_{k=1}^{r} \omega_k(e_m). \tag{13}
\]

$\mu \leftarrow 1$, $p \leftarrow 0$, and $L_l \leftarrow \emptyset$, where $L_l$ is the set of communities in the $l$th hierarchical level.
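Steps 0 and 1 amount to a threshold followed by a sort, which the following Python sketch illustrates. The dictionary representation of the weights (`frozenset({u, v})` mapped to a tuple of $r$ weights) and the function name `candidate_edges` are assumptions made for this example.

```python
def candidate_edges(weights, gamma):
    """Return (M0, F): the cut-off of Eq. (11) and the edges that satisfy
    Eq. (12), sorted by decreasing weight sum as in Eq. (13)."""
    m0 = gamma * max(max(w) for w in weights.values())
    kept = [e for e, w in weights.items() if min(w) >= m0]
    kept.sort(key=lambda e: sum(weights[e]), reverse=True)
    return m0, kept

# Toy graph with r = 2 weights per edge.
w = {
    frozenset(("b", "c")): (0.6, 0.7),
    frozenset(("a", "b")): (0.9, 0.8),
    frozenset(("c", "d")): (1.0, 0.2),
}
m0, f = candidate_edges(w, gamma=0.5)
```

Here the edge $cd$ is discarded because its smallest weight (0.2) falls below the cut-off, even though one of its weights is the largest in the graph; an edge must be strong in every dimension to seed a community.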

Step 2 (starting a new search). One has

\[
p \longleftarrow p + 1, \qquad C_p \longleftarrow V(e_\mu), \qquad L_l \longleftarrow L_l \cup \{C_p\}. \tag{14}
\]


Input: A multiweighted graph $G = (V, \omega_1, \omega_2, \ldots, \omega_r)$ with $\omega_k : E(G) \mapsto [0, 1]$.
Output: Meaningful community sets in $G$.
Algorithm: Detect $\Delta$-quasicliques in $G$ with various levels of $\Delta$ and construct a hierarchically nested system to illustrate their inclusion relation.

While $E(G) \neq \emptyset$
begin
    Determine the value of $M_0$.
    Decompose($G$, $M_0$):
        $E_0 = \{e \in E(G) : \omega_k(e) \ge M_0,\ k = 1, 2, \ldots, r\}$.
        For each edge in $E_0$, in decreasing order of weights: if the two vertices of the edge are not in any community, create a new empty community $C$. Choose the vertex $v$ among the remaining vertices that has the maximum contribution to $C$ and add $v$ to it.
    Merging($G$):
        Merge two communities according to their common vertices.
        Contract each community to a vertex and redefine the weights of the corresponding edges.
        Store the resulting graph to $G$.
End

Algorithm 1

Step 3 (growing).

Substep 3.1. $U \leftarrow V(G) - V(C_p)$. If $U = \emptyset$, go to Step 4; otherwise continue.

($*$) Pick $v \in U$ such that $\prod_{k=1}^{r} c_k(v, C_p)$ is a maximum. If, for every $k$,

\[
c_k(v, C_p) \ge \alpha_n d_k(C_p), \tag{15}
\]

where $n = |V(C_p)|$ and $\alpha_n = 1 - 1/(2\lambda(n + t))$ with $\lambda \ge 1$, $t \ge 1$ as user specified parameters, then $C_p \leftarrow C_p \cup \{v\}$ and go back to Substep 3.1.

If Inequality (15) is not satisfied, then

\[
U \longleftarrow U - \{v\}. \tag{16}
\]

If $U \neq \emptyset$, repeat ($*$). If $U = \emptyset$, go to Substep 3.2.

Substep 3.2. $\mu \leftarrow \mu + 1$. If $\mu > m$, go to Step 4; otherwise continue.

Substep 3.3. Suppose $e_\mu = xy$. If at least one of $x, y \notin \bigcup_{i=1}^{p-1} V(C_i)$, then go to Step 2; otherwise go to Substep 3.2.
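The growing loop of Substep 3.1 can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: the graph is assumed to be a dictionary mapping `frozenset({u, v})` to a tuple of $r$ weights, missing edges count as weight 0, and the threshold factor is passed in as a callable `alpha_n` (a hypothetical parameter) so that the sketch does not commit to a particular formula for $\alpha_n$.

```python
from math import prod

def w_k(weights, u, v, k):
    w = weights.get(frozenset((u, v)))
    return w[k] if w is not None else 0.0

def density(c, weights, k):
    vs = sorted(c)
    total = sum(w_k(weights, a, b, k) for i, a in enumerate(vs) for b in vs[i + 1:])
    return 2.0 * total / (len(vs) * (len(vs) - 1))

def contribution(v, c, weights, k):
    return sum(w_k(weights, u, v, k) for u in c) / len(c)

def grow(seed_edge, vertices, weights, r, alpha_n):
    """Grow a community from the two endpoints of a seed edge."""
    c = set(seed_edge)
    u = set(vertices) - c
    while u:
        # (*) candidate with the largest product of contributions
        v = max(sorted(u), key=lambda x: prod(contribution(x, c, weights, k) for k in range(r)))
        if all(contribution(v, c, weights, k) >= alpha_n(len(c)) * density(c, weights, k)
               for k in range(r)):
            c.add(v)                    # Inequality (15) holds for every k
            u = set(vertices) - c       # back to Substep 3.1
        else:
            u.discard(v)                # Eq. (16)
    return c

w = {
    frozenset(("a", "b")): (1.0,),
    frozenset(("a", "c")): (0.9,),
    frozenset(("b", "c")): (0.9,),
    frozenset(("c", "d")): (0.1,),
}
grown = grow(("a", "b"), ["a", "b", "c", "d"], w, r=1, alpha_n=lambda n: 0.5)
```

In the toy graph, vertex `c` is absorbed because its contribution (0.9) exceeds half the density of the seed, while the weakly attached vertex `d` is rejected.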

Step 4 (merging).

Substep 4.1. List all members of $L_l$ as a sequence $C_1, \ldots, C_s$ such that

\[
|V(C_1)| \ge |V(C_2)| \ge \cdots \ge |V(C_s)|, \tag{17}
\]

where $s = |L_l|$. $h \leftarrow 2$, $j \leftarrow 1$.

Substep 4.2. If

\[
|C_j \cap C_h| > \beta \min(|C_j|, |C_h|) \tag{18}
\]

(where $0 < \beta < 1$ is a user specified parameter), then $C_{s+1} \leftarrow C_j \cup C_h$ and the sequence $L_l$ is rearranged as follows:

$C_1, \ldots, C_{s-1} \leftarrow$ deleting $C_j$, $C_h$ from $C_1, \ldots, C_{s+1}$,

$s \leftarrow s - 1$, $h \leftarrow \max\{h - 2, 1\}$, and go to Substep 4.4.

Substep 4.3. $j \leftarrow j + 1$. If $j < h$, go to Substep 4.2.

Substep 4.4. $h \leftarrow h + 1$, $j \leftarrow 1$. If $h \le s$, go to Substep 4.2.
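The merging criterion of Inequality (18) can be illustrated with the Python sketch below. It deliberately simplifies the control flow of Substeps 4.1 to 4.4: instead of updating the indices $h$ and $j$ in place, it restarts the scan after every merge until no pair satisfies the criterion. Communities are represented as Python sets, and the function name `merge_communities` is illustrative.

```python
def merge_communities(communities, beta):
    """Merge pairs whose overlap exceeds beta * min(|C_j|, |C_h|)."""
    comms = sorted((set(c) for c in communities), key=len, reverse=True)
    merged = True
    while merged:
        merged = False
        for h in range(1, len(comms)):
            for j in range(h):
                overlap = len(comms[j] & comms[h])
                if overlap > beta * min(len(comms[j]), len(comms[h])):
                    union = comms[j] | comms[h]
                    comms = [c for i, c in enumerate(comms) if i not in (j, h)]
                    comms.append(union)
                    comms.sort(key=len, reverse=True)   # keep the order of Eq. (17)
                    merged = True
                    break
            if merged:
                break
    return comms

found = [{1, 2, 3, 4}, {3, 4, 5}, {6, 7}]
loose = merge_communities(found, beta=0.5)    # overlap 2 > 0.5 * 3, so merge
strict = merge_communities(found, beta=0.8)   # overlap 2 <= 0.8 * 3, keep both
```

With the stricter threshold the two overlapping communities are both kept, so vertices 3 and 4 belong to more than one community; this is where multimembership arises.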

Step 5. Contract each $C_p \in L_l$ as a vertex:

\[
\begin{aligned}
V(G) &\longleftarrow \Bigl[V(G) - \bigcup_{p=1}^{s} V(C_p)\Bigr] \cup \{C_1, \ldots, C_s\}, \\
\omega_k(uv) &\longleftarrow \omega_k(C_i, C_j) = \frac{\sum_{e \in E_{ij}} \omega_k(e)}{|E_{ij}|}, \qquad k = 1, 2, \ldots, r.
\end{aligned} \tag{19}
\]

The vertex $u$ is obtained by contracting $C_i$ and $v$ is obtained by contracting $C_j$, where $E_{ij}$ is the set of crossing edges, which is defined as

\[
E_{ij} = \{xy : x \in C_i,\ y \in C_j,\ x \neq y\}. \tag{20}
\]

For

\[
q \in V(G) - \{C_1, \ldots, C_s\}, \tag{21}
\]

define $\omega_k(q, C_i) = \omega_k(\{q\}, C_i)$. Other cases are defined similarly.

If $|V(G)| \ge 2$, then go to Step 6; otherwise go to End.
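The redefinition of weights in Equation (19) is an average over crossing edges, as the following Python sketch shows. Two points are assumptions of this example rather than statements from the text: only pairs that are actual edges of the weight dictionary are counted in $E_{ij}$, and the weight is taken to be 0 when there is no crossing edge.

```python
def contracted_weight(ci, cj, weights, k):
    """Mean of w_k over the crossing edges E_ij of Eq. (20)."""
    crossing = set()
    for x in ci:
        for y in cj:
            e = frozenset((x, y))
            if x != y and e in weights:
                crossing.add(e)
    if not crossing:
        return 0.0          # assumption: no crossing edge means weight 0
    return sum(weights[e][k] for e in crossing) / len(crossing)

w = {
    frozenset(("a", "b")): (1.0,),   # inside the first community
    frozenset(("a", "c")): (0.4,),   # crossing
    frozenset(("b", "d")): (0.8,),   # crossing
}
between = contracted_weight({"a", "b"}, {"c", "d"}, w, 0)
single = contracted_weight({"a"}, {"c", "d"}, w, 0)   # vertex q treated as {q}
```

The second call mirrors the convention $\omega_k(q, C_i) = \omega_k(\{q\}, C_i)$ for a vertex that was not absorbed into any community.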

Step 6. One has

\[
l \longleftarrow l + 1, \qquad L_l \longleftarrow \emptyset, \qquad
M_0 \longleftarrow \gamma \max\{\omega_k(e) : \forall e \in E(G),\ \forall k\}, \tag{22}
\]

where $\gamma$ ($0 < \gamma < 1$) is a user specified parameter, and go to Step 1 (to start a new search in a higher level of the hierarchical system).

End. Trace the movement of each vertex and generate the hierarchical tree.


Table 1: Some information of 3000 mobile customers.

Customer   Local call   Long distance     Roaming      Text message and   Package
number     fee (Yuan)   call fee (Yuan)   fee (Yuan)   WAP fee (Yuan)     type
1          553          137               1206         142                D
2          132          448               362          56                 B
3          471          2336              794          62                 B
4          173          193               875          193                C
5          237          805               21           9                  A
6          623          629               778          106                E
7          2425         218               235          242                A
8          1662         345               8            195                C
...        ...          ...               ...          ...                ...
3000       776          67                212          247                D

If the input data is an unweighted graph $G$, the adjacency information is used for establishing the similarity matrix of $G$. Let $A = (a_{ij})$ be the adjacency matrix of $G$, where

\[
a_{ij} =
\begin{cases}
1, & \text{there is an edge between } i \text{ and } j, \\
0, & \text{otherwise},
\end{cases} \tag{23}
\]

and the inner product of the $i$th and the $j$th rows of $A$ is used to describe the similarity between nodes $i$ and $j$ and is stored as $G(i, j)$ in the similarity matrix $G$.
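For an unweighted graph, the row inner products described above count the common neighbours of two nodes. A minimal Python sketch, assuming nodes are numbered from 0 and edges are given as pairs (the function name `adjacency_similarity` is illustrative):

```python
def adjacency_similarity(n, edges):
    """Entry (i, j) is the inner product of rows i and j of the adjacency
    matrix A defined in Eq. (23)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return [[sum(a[i][t] * a[j][t] for t in range(n)) for j in range(n)]
            for i in range(n)]

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
sim = adjacency_similarity(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
```

In this example nodes 0 and 3 are not adjacent but share the neighbour 2, so their similarity is 1, while the diagonal entries equal the node degrees.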

4. Simulation Examples

In order to validate the feasibility of the proposed approach for clustering multidimensional datasets, we randomly took 3000 customers' consumption lists for August 2012 from Shandong Mobile Corporation and used our new approach to divide these customers into distinct clusters according to 4 evaluation indices: local call fee, long distance call fee, roaming fee, and text message and WAP fee. The original data of the 3000 customers are listed in Table 1.

We have applied our approach to this problem, and the results of the segmentation and the average consumption of each group are listed in Table 2 and Figure 1.

As we can see from the clustering result, the long distance call fee accounts for a high proportion of the total expenses of Group 1. Groups 3 and 4 have high roaming fees. Group 8 has a lower cost in each index. Groups 2, 3, and 4 have higher text message and WAP fees. Mobile corporations can initiate corresponding policies according to the clustering results. For example, for the customers in Groups 2, 3, and 4, the mobile corporation should provide discounted text message packages; for the customers in Groups 3, 4, and 6, discounted roaming packages will also help to increase customer loyalty and stability.

On the other hand, we noticed that the sum of the last column of Table 2 is larger than 3000. This is because our method allows multimembership clustering; thus some customers can belong to more than one group. For instance, Groups 8 and 1 are the low value and high value customer groups, respectively, and some special policies should be recommended for the 39 customers who belong to both Group 1 and Group 8 to help them become loyal, higher value customers.

Table 2: The customer segmentation of the mobile network.

Cluster   Average local     Average long distance   Average roaming   Average text message     Number of
number    call fee (Yuan)   call fee (Yuan)         fee (Yuan)        and WAP fee (Yuan)       customers
1         1569              1728                    398               585                      121
2         2991              432                     387               469                      64
3         426               329                     1747              362                      168
4         2128              1033                    5743              397                      13
5         1879              8715                    353               287                      9
6         1621              2623                    3548              212                      12
7         430               258                     137               212                      2077
8         192               75                      48                135                      792

Figure 1: Average consumption list of 8 groups (vertical axis: average consumption; horizontal axis: groups 1 to 8; series: local call fee, long distance call fee, roaming fee, and text message and WAP fee).

5. Conclusions

In this paper, a new graph-based clustering method for multidimensional datasets is proposed. Due to the inherent sparsity of data points, most existing clustering algorithms do not work efficiently for multidimensional datasets, and it is not feasible to find interesting clusters in the original full space of all dimensions. Previous research mainly focused on the representation of a set of items with a single attribute, which cannot accurately represent all the attributes or capture the inherent dependency among multiple attributes. The new clustering method proposed in this paper overcomes this problem by directly clustering items according to the multidimensional information. Since it does not need data preprocessing, this new method may significantly improve clustering efficiency. It also has two distinguishing features: a nonbinary hierarchical tree and multimembership clusters. The application in customer relationship management has demonstrated the efficiency and feasibility of the new clustering method.

Conflict of Interests

Peixin Zhao, Cun-Quan Zhang, Di Wan, and Xin Zhang certify that there is no actual or potential conflict of interests in relation to this paper.

Acknowledgments

The first author is partially supported by the China Postdoctoral Science Foundation funded Project (2011M501149), the Humanity and Social Science Foundation of Ministry of Education of China (12YJCZH303), the Special Fund Project for Postdoctoral Innovation of Shandong Province (201103061), the Informationization Research Project of Shandong Province (2013EI153), and the Independent Innovation Foundation of Shandong University, IIFSDU (IFW12109). The second author is partially supported by an NSA Grant H98230-12-1-0233 and an NSF Grant DMS-1264800.

References

[1] X. Jin, C. M. K. Cheung, M. K. O. Lee, and H. Chen, "How to keep members using the information in a computer-supported social network," Computers in Human Behavior, vol. 25, no. 5, pp. 1172–1181, 2009.

[2] A. Bodini, "The qualitative analysis of community food webs: implications for wildlife management and conservation," Journal of Environmental Management, vol. 41, no. 1, pp. 49–65, 1994.

[3] P. C. Verhoef and K. N. Lemon, "Successful customer value management: key lessons and emerging trends," European Management Journal, vol. 31, no. 1, pp. 1–15, 2013.

[4] C. Kiss and M. Bichler, "Identification of influencers - measuring influence in customer networks," Decision Support Systems, vol. 46, no. 1, pp. 233–253, 2008.

[5] E. S. Bernardes and G. A. Zsidisin, "An examination of strategic supply management benefits and performance implications," Journal of Purchasing and Supply Management, vol. 14, no. 4, pp. 209–219, 2008.

[6] D. Li, W. Dai, and W. Tseng, "A two-stage clustering method to analyze customer characteristics to build discriminative customer management: a case of textile manufacturing business," Expert Systems with Applications, vol. 38, no. 6, pp. 7186–7191, 2011.

[7] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[8] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, "Clustering data streams: theory and practice," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 515–528, 2003.

[9] F. Cao, "A weighting K-modes algorithm for subspace clustering of categorical data," Neurocomputing, vol. 108, pp. 23–30, 2012.

[10] S. Yang and S. Luo, "A local quantitative measure for community detection in networks," International Journal of Intelligent Engineering Informatics, vol. 1, no. 1, pp. 38–52, 2010.

[11] C. Shi, Y. Wang, B. Wu, and C. Zhong, "A new genetic algorithm for community detection," in Complex Sciences, Part II, vol. 5 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 1298–1309, 2009.

[12] M. Hoerdt and U. Louis, "Completeness of the internet core topology collected by a fast mapping software," in Proceedings of the 11th International Conference on Software, Telecommunications and Computer Networks, pp. 257–261, 2003.

[13] A. Broder, P. Kumar, F. Maghoul et al., "Graph structure in the web," in Proceedings of the 9th International Conference on the World Wide Web, pp. 15–19, 2003.

[14] J. Scott, Social Network Analysis: A Handbook, Sage, 2000.

[15] D. A. Fell and A. Wagner, "The small world of metabolism," Nature Biotechnology, vol. 18, no. 11, pp. 1121–1122, 2000.

[16] T. Chen, N. L. Zhang, T. Liu, K. M. Poon, and Y. Wang, "Model-based multidimensional clustering of categorical data," Artificial Intelligence, vol. 176, pp. 2246–2269, 2012.

[17] L. Poon, N. L. Zhang, T. Liu, and A. H. Liu, "Model-based clustering of high-dimensional data: variable selection versus facet determination," International Journal of Approximate Reasoning, vol. 54, no. 1, pp. 196–215, 2012.

[18] A. Pothen, H. D. Simon, and K.-P. Liou, "Partitioning sparse matrices with eigenvectors of graphs," SIAM Journal on Matrix Analysis and Applications, vol. 11, no. 3, pp. 430–452, 1990, Sparse matrices (Gleneden Beach, OR, 1989).

[19] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell System Technical Journal, vol. 49, pp. 291–307, 1970.

[20] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.

[21] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E, vol. 69, no. 6, Article ID 066133, 2004.

[22] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, "Defining and identifying communities in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2658–2663, 2004.

[23] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E, vol. 70, no. 6, Article ID 066111, 6 pages, 2004.

[24] Y. Ou and C.-Q. Zhang, "A new multimembership clustering method," Journal of Industrial and Management Optimization, vol. 3, no. 4, pp. 619–624, 2007.

[25] X. Qi, K. Christensen, R. Duval et al., "A hierarchical algorithm for clustering extremist web pages," in Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM '10), pp. 458–463, August 2010.

[26] P. Zhao and C. Zhang, "A new clustering method and its application in social networks," Pattern Recognition Letters, vol. 32, no. 15, pp. 2109–2118, 2011.

[27] P. Pons and M. Latapy, "Computing communities in large networks using random walks," Journal of Graph Algorithms and Applications, vol. 10, no. 2, pp. 191–218, 2006.

[28] S. V. Dongen, Graph clustering by flow simulation [Ph.D. dissertation], University of Utrecht, 2000.

[29] Y. Hu, M. Li, P. Zhang, Y. Fan, and Z. Di, "Community detection by signaling on complex networks," Physical Review E, vol. 78, no. 1, Article ID 016115, 2008.

[30] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2, Article ID 026113, 2004.

[31] S. White and P. Smyth, "A spectral clustering approach to finding communities in graphs," in Proceedings of SIAM International Conference on Data Mining, pp. 76–84, 2005.

[32] A. Capocci, V. D. P. Servedio, G. Caldarelli, and F. Colaiori, "Detecting communities in large networks," Physica A, vol. 352, no. 2–4, pp. 669–676, 2005.

[33] F. Wu and B. A. Huberman, "Finding communities in linear time: a physics approach," European Physical Journal B, vol. 38, no. 2, pp. 331–338, 2004.

[34] Z. Shi, Y. Liu, and J. Liang, "PSO-based community detection in complex networks," in Proceedings of the 2nd International Symposium on Knowledge Acquisition and Modeling (KAM '09), pp. 114–119, December 2009.

[35] C. Shao, W. Lou, and L. Yan, "Optimization of algorithm of similarity measurement in high dimensional data," Computer Technology and Development, vol. 20, no. 2, pp. 1–4, 2011.

[36] H. Luo and H. Wei, "Clustering algorithm for mixed data based on clustering ensemble technique," Computer Science, vol. 37, no. 11, pp. 234–238, 2010.

[37] A. Fred, "Finding consistent clusters in data partitions," in Multiple Classifier Systems, vol. 2096 of Lecture Notes in Computer Science, pp. 309–318, 2001.

more than one cluster [8]. Based on differences in their capabilities, applicability, and computational requirements, clustering methods can be categorized into several approaches: partitioning, hierarchical, density-based, grid-based, and model-based. No particular clustering method has been shown to be superior to all its competitors in all aspects [9].

In recent years, community detection based on clustering has become a growing research field, partly as a result of the increasing availability of a huge number of real-world networks. The most intuitive and common definition of community structure is that a network contains communities: subsets of vertices within which vertex-vertex connections are dense but between which connections are relatively sparse. Yang and Luo [10] show that community structure is closely related to functional properties such as robustness and fast diffusion. It is an important network property and can reveal many hidden features of a given network [11]. The detection and analysis of communities in social networks have played an important role in the mining of different kinds of networks, including the World Wide Web [12, 13], communication networks [14], and biological networks [15].

Most traditional community detection algorithms based on clustering are limited to handling one-dimensional datasets [16, 17]. However, the datasets to be mined in real life often contain millions of objects described by many different types of attributes or variables. For example, in customer relationship management, a customer can be described by multidimensional or mixed type data such as gender, age, income, education level, and so forth. In such cases, data mining operations and methods must be scalable and capable of dealing with the datasets' complex structures and dimensions. Previous research mainly focused on representing a set of items with a single attribute, which is unsuitable for the scenarios described above: (i) a single attribute cannot accurately represent all the dimensions of the items; (ii) clustering according to a single attribute often fails to capture the inherent dependency among multiple attributes and leads to meaningless clusters.

With these considerations in mind, in this paper we first introduce two pretreatment methods for multidimensional and mixed type data, followed by a new clustering approach for community detection in social networks. In this approach, individuals and their relationships are represented by weighted graphs, and the graph density we define gives a better quantitative description of the overall correlation among individuals in a community, so that a reasonable clustering output can be presented. In particular, our method produces "trees" of simple hierarchy and allows for fuzzy (overlapping) clusters, which distinguishes it from other methods. To verify the utility and effectiveness of our method, we carried out a preliminary evaluation on a mobile customer segmentation use case. The numerical output provides supporting evidence for further improvement and application.

The rest of the paper is organized as follows. In Section 2, we summarize related work on community detection in social networks. In Section 3, we introduce the details of the novel clustering approach for multiattribute datasets. As an application in customer relationship management, this approach is used to analyze a mobile customer segmentation problem in Section 4. Finally, a summary and conclusions are given in Section 5.

2 Related Works

The detection of communities has brought about significant advances in the understanding of many real-world complex networks. Plenty of detection algorithms and techniques have been proposed, drawing on methods and principles from many different areas, including physics, artificial intelligence, graph theory, and even electrical circuits [11]. The spectral bisection method [18] and the Kernighan-Lin algorithm [19] are early solutions to this problem from the computer science community. The spectral approach bisects the graph iteratively, which makes it unsuitable for general networks. The Kernighan-Lin algorithm requires a priori knowledge about the sizes of the initial divisions. In 2002, Girvan and Newman [20] proposed a divisive hierarchical clustering algorithm, referred to as GN, which divides a network by iteratively cutting the edge with the greatest betweenness value. A disadvantage of GN is that its time complexity is $O(m^2 n)$ on a network of $n$ nodes and $m$ edges, or $O(m^3)$ on a sparse network. Newman [21] then proposed a faster algorithm, referred to as NM, with time complexity $O((m + n)n)$, or $O(n^2)$ on a sparse network. A lot of work has been done to improve GN and NM; for example, Radicchi et al. [22] proposed an algorithm similar to GN that uses the edge-clustering coefficient as a new metric, with a smaller time complexity of $O(m^2)$. Clauset et al. [23] also proposed a fast clustering algorithm with $O(n \log^2 n)$ time complexity on sparse graphs. In 2007, Ou and Zhang [24] proposed a new clustering method featuring a hierarchical tree and overlapping clusters; the complexity of this method is $O(h n^2 \log n)$, where $h$ denotes the height of the hierarchical structure. This method was used to cluster extremist web pages [25] and some classic social networks [26] with single weighted edges.
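One step of the GN procedure mentioned above (find the edge with the greatest betweenness, which is the next edge to cut) can be sketched in Python as follows. This is a minimal illustration rather than the implementation of [20]: it assumes an unweighted, undirected graph stored as a symmetric adjacency dictionary, and the names `adj` and `edge_betweenness` are chosen here for the example. Shortest paths are counted with a breadth-first search from every node.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (symmetric).
    Returns a dict keyed by frozenset({u, v}).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}                      # BFS distance from s
        sigma = {v: 0.0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}       # predecessors on shortest paths
        order = []                         # nodes in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of endpoints was counted from both sides.
    return {edge: value / 2.0 for edge, value in bet.items()}
```

On two triangles joined by a single bridge, the bridge carries every path between the two triangles, so it receives the largest value and would be the first edge removed.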

Random walks have also been used successfully to find network communities [27, 28]. The idea is that a walk tends to be trapped in dense parts of a network, which correspond to communities. Pons and Latapy [27] proposed a measure of similarity between vertices based on random walks, which has several important advantages: it captures the community structure of a network well, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute the community structure of a network efficiently. The algorithm, called Walktrap, runs in time $O(mn^2)$ and space $O(n^2)$ in the worst case, and in time $O(n^2 \log n)$ and space $O(n^2)$ in most real-world cases. Hu et al. [29] proposed a method for identifying community structure based on a signaling process on complex networks. Each node is taken as the initial signal source to excite the whole network one time, and the source node is associated with an $n$-dimensional vector that records the effects of the signaling process. By this process, the topological relationship of nodes on the network could be transferred into a geometrical











$$X = (x_1, x_2, \ldots, x_n), \qquad Y = (y_1, y_2, \ldots, y_n) \tag{1}$$

$$\operatorname{Close}(X, Y) = \frac{\sum_{i=1}^{n} e^{-|x_i - y_i|}}{n} \tag{2}$$

$$\operatorname{Close}(X, Y) = \frac{\sum_{i=1}^{n} \omega_i e^{-|x_i - y_i|}}{n} \tag{3}$$
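Equations (2) and (3) can be evaluated directly. The Python sketch below is illustrative only: it assumes the attributes are numeric and already scaled to comparable ranges, and the function name `close` and the optional `weights` argument (standing for the $\omega_i$ of (3)) are names chosen here. With no weights supplied it reduces to (2).

```python
import math


def close(x, y, weights=None):
    """Closeness of two equal-length numeric vectors, as in Eqs. (2) and (3)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    if weights is None:
        weights = [1.0] * n            # unweighted case, Eq. (2)
    if len(weights) != n:
        raise ValueError("one weight per attribute is required")
    total = sum(w * math.exp(-abs(a - b)) for w, a, b in zip(weights, x, y))
    return total / n
```

Identical vectors give a closeness of 1 in the unweighted case, and the value decays towards 0 as the attribute differences grow.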





$x_i$ and $x_j$ has co-

$$S(i, j) = S(x_i, x_j) = \frac{1}{H} \sum_{k=1}^{H} \delta(\pi_k(x_i), \pi_k(x_j)), \qquad \delta(a, b) \equiv \begin{cases} 1, & a = b \\ 0, & a \neq b \end{cases} \tag{4}$$

$$S_1(i, j) = \frac{n}{N} = \frac{\sum_{i=1}^{N} C(i, j)}{N} \tag{5}$$

defined as

$$S_2(i, j) = \frac{n}{m} = \frac{\sum_{i=1}^{m} C(i, j)}{m} \tag{6}$$

$$S = S_1 + \alpha S_2 \tag{7}$$
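Equation (4) has the form of a co-association measure over $H$ partitions. The Python sketch below assumes that $\pi_k(x)$ denotes the cluster label given to item $x$ by the $k$-th partition; this is an interpretation adopted here, since the surrounding definition is not preserved in this text. The name `co_association` and the list-of-label-lists input are likewise illustrative.

```python
def co_association(partitions, i, j):
    """Fraction of partitions in which items i and j share a cluster label.

    partitions is a list of H label sequences; partitions[k][i] plays the
    role of pi_k(x_i) in Eq. (4) under the assumption stated above.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    agree = sum(1 for labels in partitions if labels[i] == labels[j])
    return agree / len(partitions)
```

For example, with three partitions of three items, two of which put items 0 and 1 together, the similarity of that pair is 2/3.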


$S_1(i, j)$ and $S_2(i, j)$

$x_i$ and $x_j$

$$\omega_k : E(G) \longmapsto [0, 1], \qquad k = 1, 2, \ldots, r \tag{8}$$

$$d_k(C) = \frac{2 \sum_{e \in E(C)} \omega_k(e)}{|V(C)| \, (|V(C)| - 1)} \tag{9}$$

$\omega_1, \omega_2, \ldots, \omega_r$) a subgraph $C$ is

$$c_k(v, C) = \frac{\sum_{u \in V(C)} \omega_k(uv)}{|V(C)|} \tag{10}$$
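The density (9) and the contribution (10) for a single weight function $\omega_k$ can be computed as in the following Python sketch. The data layout is an assumption made for illustration: weights are stored in a dictionary keyed by `frozenset({u, v})`, and a vertex pair with no recorded weight is treated as weight 0. The function names are hypothetical.

```python
from itertools import combinations


def density(weights, cluster):
    """d_k(C) of Eq. (9): twice the total internal weight over |V|(|V|-1)."""
    nodes = list(cluster)
    n = len(nodes)
    if n < 2:
        raise ValueError("density needs at least two vertices")
    total = sum(weights.get(frozenset(pair), 0.0)
                for pair in combinations(nodes, 2))
    return 2.0 * total / (n * (n - 1))


def contribution(weights, v, cluster):
    """c_k(v, C) of Eq. (10): weight from v into C divided by |V(C)|."""
    nodes = list(cluster)
    total = sum(weights.get(frozenset((u, v)), 0.0) for u in nodes if u != v)
    return total / len(nodes)
```

Both quantities lie in [0, 1] when every weight does, which is what makes them comparable in the growing test of the algorithm.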







$M_0 \longleftarrow \gamma \max \omega$

$$\min \{\omega_k(e) : k = 1, 2, \ldots, r\} \geq M_0 \tag{12}$$

$e_1, e_2, \ldots, e_m$ such that

$$\sum_{k=1}^{r} \omega_k(e_1) \geq \sum_{k=1}^{r} \omega_k(e_2) \geq \cdots \geq \sum_{k=1}^{r} \omega_k(e_m) \tag{13}$$

$$p \longleftarrow p + 1, \qquad C_p \longleftarrow V(e_\mu), \qquad L_l \longleftarrow L_l \cup \{C_p\} \tag{14}$$

Input: A graph $G = (V, \omega_1, \omega_2, \ldots, \omega_r)$, $\omega_k : E(G) \mapsto [0, 1]$

Decompose($G$, $M_0$)

$E_0 = \{e \in E(G) : \omega_k(e) \geq M_0,\ k = 1, 2, \ldots, r\}$

Algorithm 1
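The edge selection expressed by (12), (13), and the set $E_0$ in Algorithm 1 can be sketched in Python as below. The sketch takes the threshold $M_0$ as a given parameter `m0`, because the rule that sets $M_0$ is only partly legible in this text; the function name `candidate_edges` and the dictionary layout (edge mapped to a tuple of its $r$ weights) are assumptions made for the example.

```python
def candidate_edges(edge_weights, m0):
    """Edges whose r weights all reach m0, ordered by decreasing total weight.

    edge_weights maps an edge to a tuple (w_1, ..., w_r) of values in [0, 1].
    The filter mirrors condition (12); the ordering mirrors (13).
    """
    strong = [e for e, ws in edge_weights.items() if min(ws) >= m0]
    return sorted(strong, key=lambda e: sum(edge_weights[e]), reverse=True)
```

An edge that is strong in one dimension but weak in another is excluded, since the minimum over all $r$ weights must reach the threshold.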

Step 3 (growing)

otherwise continue

If for every $k$

$$c_k(v, C_p) \geq \alpha_n d_k(C_p) \tag{15}$$

where $n = |V(C_p)|$ and $\alpha_n = 1 - \frac{1}{2\lambda(n + t)}$ with $\lambda \geq 1$

$$U \longleftarrow U - \{v\} \tag{16}$$

${}_{i=1}^{p-1}$
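The growing test (15) can be written as a small Python predicate. Two points are assumptions made for this sketch: the coefficient is read as $\alpha_n = 1 - 1/(2\lambda(n + t))$, which is the most plausible reading of the damaged formula, and the defaults `lam=1.0` and `t=1.0` are placeholders rather than values taken from the paper. The inputs are the per-dimension contributions $c_k(v, C_p)$ and densities $d_k(C_p)$, computed elsewhere.

```python
def alpha(n, lam=1.0, t=1.0):
    """Assumed reading of the coefficient: 1 - 1 / (2 * lam * (n + t))."""
    return 1.0 - 1.0 / (2.0 * lam * (n + t))


def may_join(contributions, densities, n, lam=1.0, t=1.0):
    """True if c_k >= alpha_n * d_k holds for every dimension k, as in (15).

    contributions[k] and densities[k] are c_k(v, C_p) and d_k(C_p);
    n is the current number of vertices in C_p.
    """
    a = alpha(n, lam, t)
    return all(c >= a * d for c, d in zip(contributions, densities))
```

Under this reading the coefficient approaches 1 as the cluster grows, so admission becomes stricter for larger clusters, and a vertex is rejected as soon as one dimension fails the test.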


Step 4 (merging)

$C_1, \ldots, C_s$ such that

$$|V(C_1)| \geq |V(C_2)| \geq \cdots \geq |V(C_s)| \tag{17}$$

where $s = |L_l|$, $h \leftarrow 2$, $j \leftarrow 1$

Substep 4.2. If

$$|C_j \cap C_h| > \beta \min(|C_j|, |C_h|) \tag{18}$$

$\leftarrow$

$C_1, \ldots, C$

$C_j$, $C_h$ from $C_1, \ldots, C_{s+1}$
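The overlap test (18) and the size ordering (17) can be illustrated with the Python sketch below. Only the test itself is taken from the text; the surrounding loop is a simplified stand-in (merge any qualifying pair, re-sort, and repeat until no pair qualifies), because the exact substeps are not fully legible here. The function names are hypothetical.

```python
def should_merge(cj, ch, beta):
    """Overlap test of (18): |Cj & Ch| > beta * min(|Cj|, |Ch|)."""
    cj, ch = set(cj), set(ch)
    return len(cj & ch) > beta * min(len(cj), len(ch))


def merge_clusters(clusters, beta):
    """Greedy merging of heavily overlapping clusters (simplified loop)."""
    # Order by decreasing size, as in (17).
    pool = sorted((set(c) for c in clusters), key=len, reverse=True)
    merged = True
    while merged:
        merged = False
        for j in range(len(pool)):
            for h in range(j + 1, len(pool)):
                if should_merge(pool[j], pool[h], beta):
                    pool[j] = pool[j] | pool[h]   # union replaces Cj
                    del pool[h]                   # Ch is absorbed
                    pool.sort(key=len, reverse=True)
                    merged = True
                    break
            if merged:
                break
    return pool
```

A larger `beta` keeps more clusters separate, which preserves multimembership vertices instead of collapsing the clusters that share them.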





$$V(G) \longleftarrow \Bigl[V(G) - \bigcup_{p=1}^{s} V(C_p)\Bigr] \cup \{C_1, \ldots, C_s\}, \qquad \omega_k(uv) \longleftarrow \omega_k(C_i, C_j) = \frac{\sum_{e \in E_{ij}} \omega_k(e)}{|E_{ij}|}, \quad k = 1, 2, \ldots, r \tag{19}$$

is defined as

$$E_{ij} = \{xy : x \in C_i,\ y \in C_j,\ x \neq y\} \tag{20}$$
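The contracted weight in (19), taken over the cross pairs $E_{ij}$ of (20), can be sketched in Python for one weight function as follows. The sketch assumes the denominator is the number of pairs in $E_{ij}$ and that a pair with no recorded weight counts as 0; both are readings adopted for illustration, as is the `frozenset`-keyed dictionary and the name `contracted_weight`.

```python
def contracted_weight(weights, ci, cj):
    """Average weight over pairs xy with x in ci, y in cj, x != y.

    weights maps frozenset({x, y}) to a value in [0, 1]; missing pairs
    are treated as weight 0 (an assumption of this sketch).
    """
    pairs = {frozenset((x, y)) for x in ci for y in cj if x != y}
    if not pairs:
        return 0.0
    return sum(weights.get(p, 0.0) for p in pairs) / len(pairs)
```

Collecting the pairs in a set avoids counting a pair twice when the two clusters share vertices, which can happen with multimembership clusters.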

For

$$q \in V(G) - \{C_1, \ldots, C_s\} \tag{21}$$

define $\omega_k(q, C_i) = \omega_k(q, C$

Step 6. One has

$$l \longleftarrow l + 1, \qquad L_l \longleftarrow \emptyset \tag{22}$$

hierarchic tree



Customer number    Local call fee (Yuan)    Roaming fee (Yuan)    Package type

3000 776 67 212 247 D

$a_{ij} =$ (23)








Cluster number    Average local call fee (Yuan)    Average long    Average roaming fee (Yuan)    Average text    Number of customers

1 1569 1728 398 585 1212 2991 432 387 469 643 426 329 1747 362 1684 2128 1033 5743 397 135 1879 8715 353 287 96 1621 2623 3548 212 127 430 258 137 212 20778 192 75 48 135 792

[Figure: average consumption, vertical axis 0 to 1000, horizontal axis 1 to 8]



