fuzzy c-means clustering algorithm

8/8/2019 Fuzzy C-Means Clustering Algorithm


Unsupervised Optimal Fuzzy Clustering

Presented by

Asya Nikitina

I. Gath and A. B. Geva. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989, 11(7), 773-781.



Fuzzy Sets and Membership Functions

You are approaching a red light and must advise a driving student when to apply the brakes. What would you say:

Begin braking 74 feet from the crosswalk?

Apply the brakes pretty soon?

Everyday language is one example of the ways vagueness is used and propagated.

Imprecision in data and information gathered from and about our environment is either statistical (e.g., the outcome of a coin toss is a matter of chance) or nonstatistical (e.g., apply the brakes pretty soon).

This latter type of uncertainty is called fuzziness.




We all assimilate and use fuzzy data, vague rules, and imprecise information.

Accordingly, computational models of real systems should also be able to recognize, represent, manipulate, interpret, and use both fuzzy and statistical uncertainties.

Statistical models deal with random events and outcomes; fuzzy models attempt to capture and quantify nonrandom imprecision.




Conventional (or crisp) sets contain objects that satisfy precise properties required for membership. For example, the set of numbers H from 6 to 8 is crisp:

$H = \{ r \mid 6 \le r \le 8 \}$

$m_H(r) = 1$ for $6 \le r \le 8$; $m_H(r) = 0$ otherwise ($m_H$ is a membership function)

Crisp sets correspond to 2-valued logic:

is or isn't

on or off

black or white

1 or 0




Fuzzy sets contain objects that satisfy imprecise properties to varying degrees, for example, the set of numbers F that are close to 7.

In the case of fuzzy sets, the membership function, $m_F(r)$, maps numbers into the entire unit interval [0,1]. The value $m_F(r)$ is called the grade of membership of r in F.

Fuzzy sets correspond to continuously-valued logic: all shades of gray between black (= 1) and white (= 0)




Because the property "close to 7" is fuzzy, there is not a unique membership function for F. Rather, it is left to the modeler to decide, based on the potential application and properties desired for F, what $m_F(r)$ should be like.

The membership function is the basic idea in fuzzy set theory; its values measure degrees to which objects satisfy imprecisely defined properties.

Fuzzy memberships represent similarities of objects to imprecisely defined properties.

Membership values determine how much fuzziness a fuzzy set contains.
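To make the contrast between crisp and fuzzy membership concrete, here is a minimal Python sketch. The crisp function follows the set H from 6 to 8 defined earlier. The two fuzzy functions for "close to 7" are only illustrative choices (the names, the triangular and bell shapes, and the width parameters are assumptions, not taken from the slides), which is exactly the modeler's freedom described above.

```python
import numpy as np

def crisp_membership(r, low=6.0, high=8.0):
    """Crisp set H = {r | low <= r <= high}: membership is exactly 0 or 1."""
    r = np.asarray(r, dtype=float)
    return ((r >= low) & (r <= high)).astype(float)

def triangular_close_to(r, center=7.0, half_width=1.0):
    """One possible fuzzy membership for 'close to center' (assumed triangular shape)."""
    r = np.asarray(r, dtype=float)
    return np.clip(1.0 - np.abs(r - center) / half_width, 0.0, 1.0)

def gaussian_close_to(r, center=7.0, spread=0.5):
    """Another possible choice (assumed bell shape); it never reaches exactly zero."""
    r = np.asarray(r, dtype=float)
    return np.exp(-0.5 * ((r - center) / spread) ** 2)

values = np.array([5.0, 6.5, 7.0, 8.0, 9.0])
crisp = crisp_membership(values)      # only 0s and 1s
tri = triangular_close_to(values)     # grades in [0, 1]
gauss = gaussian_close_to(values)     # grades in (0, 1]
```

Both fuzzy functions give 7 the grade 1, but they disagree on how quickly membership falls off, and neither is "the" correct one.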



Fuzziness and Probability

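L = {all liquids}; the set of all potable liquids is a fuzzy subset of L.

A: m(A) = 0.91 (grade of membership in the potable liquids)

B: Pr(B) = 0.91 (probability of being a potable liquid)

[Figure: two containers labeled A and B; candidate contents: swamp water? beer? H2O? HCl?]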



Clustering

Clustering is a mathematical tool that attempts to discover structures or certain patterns in a data set, where the objects inside each cluster show a certain degree of similarity.



Hard clustering assigns each feature vector to one and only one of the clusters, with a degree of membership equal to one and well-defined boundaries between clusters.



Fuzzy clustering allows each feature vector to belong to more than one cluster with different membership degrees (between 0 and 1) and vague or fuzzy boundaries between clusters.
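The difference between the two kinds of partition can be seen on a small membership matrix. The sketch below is illustrative only: the matrix values and the helper name `harden` are assumptions, and rounding a fuzzy partition to its largest membership is just one simple way to obtain a hard one.

```python
import numpy as np

# Rows are clusters, columns are feature vectors; each column sums to 1.
U_fuzzy = np.array([[0.9, 0.5, 0.1],
                    [0.1, 0.5, 0.9]])

def harden(U):
    """Turn a fuzzy partition into a hard one: membership 1 in the cluster
    with the largest grade, 0 elsewhere (ties go to the lowest cluster index)."""
    U = np.asarray(U, dtype=float)
    hard = np.zeros_like(U)
    hard[np.argmax(U, axis=0), np.arange(U.shape[1])] = 1.0
    return hard

U_hard = harden(U_fuzzy)
```

The middle vector sits on the boundary (0.5 and 0.5); the hard partition has to place it in a single cluster, while the fuzzy partition keeps that ambiguity visible.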



Difficulties with Fuzzy Clustering

The optimal number of clusters K to be created has to be determined (the number of clusters cannot always be defined a priori, and a good cluster validity criterion has to be found).

The character and location of cluster prototypes (centers) are not necessarily known a priori, and initial guesses have to be made.



Difficulties with Fuzzy Clustering

Data characterized by large variability in cluster shape, cluster density, and the number of points (feature vectors) in different clusters have to be handled.



Objectives and Challenges

Create an algorithm for fuzzy clustering that partitions the data set into an optimal number of clusters.

This algorithm should account for variability in cluster shapes, cluster densities, and the number of data points in each of the subsets.

Cluster prototypes would be generated through a process of unsupervised learning.



The Fuzzy k-Means Algorithm

N: the number of feature vectors

K: the number of clusters (partitions)

q: weighting exponent (fuzzifier; q > 1)

$u_{ij}$: the ith membership function on the jth vector ($u_{ij}: X \to [0,1]$), where i indexes clusters and j indexes feature vectors

$\sum_{i} u_{ij} = 1$; $0 < \sum_{j} u_{ij} < N$

$V_i$: the cluster prototype (the mean of all feature vectors in cluster i, or the center of cluster i)

$J_q(U,V)$: the objective function



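Partition a set of feature vectors X into K clusters (subgroups), represented as fuzzy sets $F_1, F_2, \ldots, F_K$, by minimizing the objective function $J_q(U,V)$:

$$J_q(U,V) = \sum_{j=1}^{N} \sum_{i=1}^{K} (u_{ij})^q \, d^2(X_j, V_i); \quad K \le N$$

Here i indexes clusters and j indexes feature vectors. Larger membership values indicate higher confidence in the assignment of the pattern to the cluster.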
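As a hedged illustration of what the objective function measures, the sketch below evaluates $J_q$ for a tiny one-dimensional data set under two membership matrices. The function name `fcm_objective`, the array layout (clusters in rows of U), and the use of the squared Euclidean distance for $d^2$ are assumptions made for this example.

```python
import numpy as np

def fcm_objective(X, U, V, q=2.0):
    """J_q(U, V) = sum over vectors j and clusters i of (u_ij)^q * d^2(X_j, V_i).

    X: (N, m) feature vectors, U: (K, N) memberships, V: (K, m) prototypes.
    d^2 is taken here as the squared Euclidean distance (an assumption).
    """
    diff = X[None, :, :] - V[:, None, :]   # shape (K, N, m)
    d2 = np.sum(diff ** 2, axis=2)         # shape (K, N)
    return float(np.sum((U ** q) * d2))

X = np.array([[0.0], [1.0], [10.0]])
V = np.array([[0.5], [10.0]])

# Sensible assignment: the first two points belong to prototype 0.5.
U_good = np.array([[1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
# Same memberships with the clusters swapped: every point is in the wrong cluster.
U_bad = U_good[::-1]

j_good = fcm_objective(X, U_good, V)   # 0.25 + 0.25 + 0
j_bad = fcm_objective(X, U_bad, V)     # 90.25 + 100 + 81
```

The sensible assignment gives a far smaller value, which is why minimizing $J_q$ drives memberships toward nearby prototypes.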




Description of Fuzzy Partitioning

1) Choose primary (initial) cluster prototypes $V_i$ from which to compute the membership values.

2) Compute the degree of membership of all feature vectors in all clusters:

$$u_{ij} = \frac{[1/d^2(X_j, V_i)]^{1/(q-1)}}{\sum_{k=1}^{K} [1/d^2(X_j, V_k)]^{1/(q-1)}} \qquad (1)$$

under the constraint: $\sum_{i} u_{ij} = 1$



Description of Fuzzy Partitioning

3) Compute new cluster prototypes $V_i$:

$$V_i = \frac{\sum_{j} (u_{ij})^q X_j}{\sum_{j} (u_{ij})^q} \qquad (2)$$

4) Iterate back and forth between (1) and (2) until the memberships or cluster centers for successive iterations differ by no more than some prescribed value $\varepsilon$ (a termination criterion)
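The four steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the presenters' implementation: the function name `fuzzy_k_means`, the random choice of initial prototypes from the data, the squared Euclidean distance, the small floor on distances to avoid division by zero, and the stopping test on the largest membership change are all choices made for this example.

```python
import numpy as np

def fuzzy_k_means(X, K, q=2.0, eps=1e-5, max_iter=300, seed=0):
    """Alternate between the membership update (1) and the prototype update (2)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Step 1 (assumed initialisation): K distinct data points as prototypes.
    V = X[rng.choice(N, size=K, replace=False)].astype(float)
    U = np.zeros((K, N))
    for _ in range(max_iter):
        # Squared Euclidean distances d^2(X_j, V_i), shape (K, N).
        d2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)
        d2 = np.maximum(d2, 1e-12)                     # guard against zero distance
        inv = (1.0 / d2) ** (1.0 / (q - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)   # step 2, equation (1)
        W = U_new ** q
        V = (W @ X) / W.sum(axis=1, keepdims=True)     # step 3, equation (2)
        converged = np.max(np.abs(U_new - U)) < eps    # step 4, termination test
        U = U_new
        if converged:
            break
    return U, V

# Two well-separated synthetic groups, around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
U, V = fuzzy_k_means(X, K=2)
```

Each column of `U` sums to one, as the constraint requires, and the rows of `V` land near the two group centers (in either order, since cluster labels are arbitrary).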




Computation of the degree of membership $u_{ij}$ depends on the definition of the distance measure, $d^2(X_j, V_i)$:

$$d^2(X_j, V_i) = (X_j - V_i)^T \Sigma^{-1} (X_j - V_i)$$

$\Sigma = I$ => The distance is Euclidean; the clusters are assumed to be hyperspherical

$\Sigma$ is arbitrary => The clusters are assumed to be of arbitrary shape
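A short sketch shows how the choice of matrix changes the distance. The function name `quadratic_distance` and the example matrices are assumptions for illustration; the matrix is taken to be symmetric and positive definite so that the expression is a valid squared distance.

```python
import numpy as np

def quadratic_distance(x, v, sigma):
    """d^2(x, v) = (x - v)^T sigma^{-1} (x - v), with sigma positive definite."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    # Solve sigma * z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(sigma, diff))

v = np.zeros(2)
along_x = np.array([2.0, 0.0])
along_y = np.array([0.0, 2.0])

# Identity matrix: plain squared Euclidean distance, same in every direction.
d2_euclid_x = quadratic_distance(along_x, v, np.eye(2))
d2_euclid_y = quadratic_distance(along_y, v, np.eye(2))

# A matrix stretched along the first axis: points along that axis count as closer.
sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
d2_scaled_x = quadratic_distance(along_x, v, sigma)
d2_scaled_y = quadratic_distance(along_y, v, sigma)
```

With the identity matrix both points are at squared distance 4; with the stretched matrix the point along the first axis is at squared distance 1 while the other stays at 4, so equal-distance contours become ellipses rather than circles.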




For hyperellipsoidal clusters, an exponential distance measure, $d_e^2(X_j, V_i)$, based on ML estimation was defined:

$$d_e^2(X_j, V_i) = \frac{[\det(F_i)]^{1/2}}{P_i} \exp\left[ (X_j - V_i)^T F_i^{-1} (X_j - V_i) / 2 \right]$$

$F_i$: the fuzzy covariance matrix of the ith cluster

$P_i$: the a priori probability of selecting the ith cluster

$$h(i \mid X_j) = \frac{1/d_e^2(X_j, V_i)}{\sum_{k=1}^{K} 1/d_e^2(X_j, V_k)}$$

$h(i \mid X_j)$: the posterior probability (the probability of selecting the ith cluster given the jth vector)




It's easy to see that for q = 2, $h(i \mid X_j)$ has the same form as $u_{ij}$ in (1), with $d_e^2$ in place of $d^2$. Thus, substituting $u_{ij}$ with $h(i \mid X_j)$ results in the fuzzy modification of the ML estimation (FMLE).

Additional calculations for the FMLE:
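The equations for this slide are not present in the text. The sketch below is a hedged reconstruction of one FMLE pass using the forms usually quoted for the Gath and Geva algorithm, $P_i = \frac{1}{N} \sum_j h(i \mid X_j)$ and $F_i = \sum_j h(i \mid X_j)(X_j - V_i)(X_j - V_i)^T / \sum_j h(i \mid X_j)$; these should be checked against the cited paper. The function name `fmle_step`, the array layout, and weighting the prototype update by $h$ directly are assumptions of this example.

```python
import numpy as np

def fmle_step(X, V, F, P):
    """One FMLE pass: posteriors h(i|X_j), then updated V_i, P_i and F_i.

    X: (N, m) data, V: (K, m) prototypes, F: (K, m, m) fuzzy covariances,
    P: (K,) a priori probabilities.
    """
    K, N = V.shape[0], X.shape[0]
    d2e = np.empty((K, N))
    for i in range(K):
        diff = X - V[i]
        # Quadratic form (X_j - V_i)^T F_i^{-1} (X_j - V_i) for every j.
        maha = np.einsum("nj,nj->n", diff @ np.linalg.inv(F[i]), diff)
        d2e[i] = np.sqrt(np.linalg.det(F[i])) / P[i] * np.exp(maha / 2.0)
    inv = 1.0 / d2e
    h = inv / inv.sum(axis=0, keepdims=True)       # posterior h(i|X_j)
    weights = h.sum(axis=1)
    V_new = (h @ X) / weights[:, None]             # assumption: weights are h itself
    P_new = weights / N                            # a priori probabilities
    F_new = np.empty_like(F)
    for i in range(K):
        diff = X - V_new[i]
        F_new[i] = (h[i][:, None] * diff).T @ diff / weights[i]
    return h, V_new, P_new, F_new

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(60, 2)),
               rng.normal(5.0, 0.5, size=(60, 2))])
V0 = np.array([[0.0, 0.0], [5.0, 5.0]])        # starting prototypes (assumed)
F0 = np.stack([np.eye(2), np.eye(2)])          # starting covariances (assumed)
P0 = np.array([0.5, 0.5])
h, V1, P1, F1 = fmle_step(X, V0, F0, P0)
```

Because the exponential grows quickly with distance, a real implementation would likely work with logarithms to avoid overflow; that refinement is left out to keep the sketch close to the formulas.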



The Major Advantage of FMLE

Obtaining good partition results starting from good classification prototypes.

The first layer of the algorithm, unsupervised tracking of initial centroids, is based on the fuzzy K-means algorithm.

The next phase, the optimal fuzzy partition, is carried out with the FMLE algorithm.



Unsupervised Tracking of Cluster Prototypes

Different choices of classification prototypes may lead to different partitions.

Given a partition into k cluster prototypes, place the next, (k+1)th, cluster center in a region where data points have a low degree of membership in the existing k clusters.



Unsupervised Tracking of Cluster Prototypes

1) Compute the average and standard deviation of the whole data set.

2) Choose the first initial cluster prototype at the average location of all feature vectors.

3) Choose an additional classification prototype equally distant from all data points.

4) Calculate a new partition of the data set according to steps 1) and 2) of the fuzzy k-means algorithm.

5) If k, the number of clusters, is less than a given maximum, go to step 3; otherwise stop.



Common Fuzzy Cluster Validity

Each data point has K memberships, so it is desirable to summarize the information by a single number that indicates how well the data point ($X_j$) is classified by clustering. With i indexing the clusters:

$\sum_{i} (u_{ij})^2$: partition coefficient

$-\sum_{i} u_{ij} \log u_{ij}$: classification entropy

$\max_{i} u_{ij}$: proportional coefficient

The cluster validity is just the average of any of those functions over the entire data set.
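The three per-point summaries and their averages are easy to compute from a membership matrix. The sketch below is illustrative: the function name `point_validity`, the matrix layout (clusters in rows), and the convention that a zero membership contributes zero to the entropy are assumptions of this example.

```python
import numpy as np

def point_validity(U):
    """Per-point partition coefficient, classification entropy and
    proportional coefficient for a (K, N) membership matrix U."""
    U = np.asarray(U, dtype=float)
    safe = np.where(U > 0.0, U, 1.0)            # log(1) = 0, so u = 0 adds nothing
    partition = np.sum(U ** 2, axis=0)
    entropy = -np.sum(U * np.log(safe), axis=0)
    proportion = np.max(U, axis=0)
    return partition, entropy, proportion

# First point is assigned crisply, second is split evenly between two clusters.
U = np.array([[1.0, 0.5],
              [0.0, 0.5]])
partition, entropy, proportion = point_validity(U)

# Cluster validity: the average of each function over the data set.
mean_partition = partition.mean()
mean_entropy = entropy.mean()
mean_proportion = proportion.mean()
```

The crisply assigned point scores 1, 0 and 1 on the three measures, while the evenly split point scores 0.5, log 2 and 0.5, so a less fuzzy partition raises the partition and proportional coefficients and lowers the entropy.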



Proposed Performance Measures

Good clusters are actually not very fuzzy.

The criteria for the definition of an optimal partition of the data into subgroups were based on the following requirements:

1. Clear separation between the resulting clusters

2. Minimal volume of the clusters

3. Maximal number of data points concentrated in the vicinity of the cluster centroid




Fuzzy hypervolume, $F_{HV}$, is defined by:

where $F_i$ is given by:
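The two equations on this slide are not present in the text. As a hedged reconstruction, the forms usually quoted for the Gath and Geva hypervolume criterion are given below, using $h(i \mid X_j)$, $V_i$, K and N as defined on the earlier slides; the exact expressions should be checked against the cited paper.

```latex
F_{HV} = \sum_{i=1}^{K} \left[ \det(F_i) \right]^{1/2}
\qquad
F_i = \frac{\sum_{j=1}^{N} h(i \mid X_j)\,(X_j - V_i)(X_j - V_i)^T}
           {\sum_{j=1}^{N} h(i \mid X_j)}
```

A small hypervolume corresponds to compact clusters, which matches the "minimal volume" requirement.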




Average partition density, $D_{PA}$, is calculated from:

where $S_i$, the sum of the central members, is given by:
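The equations on this slide are also missing from the text. The forms below are a hedged reconstruction of how the average partition density is usually stated for this method, with "central members" taken to be the vectors lying within unit distance of the prototype under the cluster's own fuzzy covariance; they should be verified against the cited paper.

```latex
D_{PA} = \frac{1}{K} \sum_{i=1}^{K} \frac{S_i}{\left[ \det(F_i) \right]^{1/2}}
\qquad
S_i = \sum_{j} u_{ij}
\quad \text{over all } X_j \text{ with } (X_j - V_i)^T F_i^{-1} (X_j - V_i) < 1
```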




The partition density, $P_D$, is calculated from:
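The equation itself is missing from the text. It is usually stated as $P_D = S / F_{HV}$, where $S$ is the sum of the central members over all clusters; treat that as a reconstruction to be checked against the cited paper. Under that assumption, the sketch below computes the fuzzy hypervolume, the average partition density and the partition density for a given partition. The function name `performance_measures` and the hand-made example values are hypothetical.

```python
import numpy as np

def performance_measures(X, U, V, F):
    """Return (fuzzy hypervolume, average partition density, partition density).

    X: (N, m) data, U: (K, N) memberships, V: (K, m) prototypes,
    F: (K, m, m) fuzzy covariance matrices.
    """
    K = V.shape[0]
    root_det = np.sqrt(np.array([np.linalg.det(F[i]) for i in range(K)]))
    S = np.empty(K)
    for i in range(K):
        diff = X - V[i]
        maha = np.einsum("nj,nj->n", diff @ np.linalg.inv(F[i]), diff)
        # Central members: vectors within unit distance of prototype i (assumed rule).
        S[i] = U[i, maha < 1.0].sum()
    hypervolume = root_det.sum()
    avg_density = np.mean(S / root_det)
    density = S.sum() / hypervolume
    return hypervolume, avg_density, density

# Hand-made example: two tight pairs of points and one point in between.
X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [10.0, 0.0], [10.0, 0.5]])
V = np.array([[0.0, 0.0], [10.0, 0.0]])
F = np.stack([np.eye(2), np.eye(2)])
U = np.array([[1.0, 1.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 1.0, 1.0]])
fhv, dpa, part_density = performance_measures(X, U, V, F)
```

In this example each cluster has unit-determinant covariance and two central members, so the hypervolume is 2 and both densities equal 2; the in-between point is outside both central regions and does not count.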



Sample Runs

In order to test the performance of the algorithm, N artificial m-dimensional feature vectors from a multivariate normal distribution having different parameters and densities were generated.

Situations of large variability of cluster shapes, densities, and number of data points in each cluster were simulated.
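Test data of this kind can be generated in a few lines. The sketch below is only an illustration of the idea: the function name `make_gaussian_clusters` and every mean, covariance and cluster size are made-up values, not the parameters used in the reported runs.

```python
import numpy as np

def make_gaussian_clusters(means, covariances, sizes, seed=0):
    """Draw each cluster from its own multivariate normal distribution."""
    rng = np.random.default_rng(seed)
    parts, labels = [], []
    for label, (mean, cov, n) in enumerate(zip(means, covariances, sizes)):
        parts.append(rng.multivariate_normal(mean, cov, size=n))
        labels.append(np.full(n, label))
    return np.vstack(parts), np.concatenate(labels)

# Three clusters that differ in shape, density and number of points (assumed values).
means = [[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]]
covariances = [np.diag([1.0, 0.1]),                    # elongated
               np.diag([0.2, 0.2]),                    # small and round
               np.array([[1.0, 0.8], [0.8, 1.0]])]     # tilted ellipse
sizes = [200, 50, 20]
X, labels = make_gaussian_clusters(means, covariances, sizes)
```

Keeping the true labels alongside the data makes it possible to check afterwards how many points a clustering run classified correctly.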



FCM Clustering with Varying Density

The higher density cluster attracts all other cluster prototypes, so that the prototype of the right cluster is slightly drawn away from the original cluster center and the prototype of the left cluster migrates completely into the dense cluster.



Fig. 3. Partition of 12 clusters generated from a five-dimensional multivariate Gaussian distribution with unequally variable features, variable densities, and a variable number of data points in each cluster (only three of the features are displayed).

(a) Data points before partitioning.

(b) Partition of 12 subgroups using the UFP-ONC algorithm. All data points have been classified correctly.



Conclusions

The new algorithm, UFP-ONC (unsupervised fuzzy partition-optimal number of classes), which combines the most favorable features of both the fuzzy K-means algorithm and the FMLE, together with unsupervised tracking of classification prototypes, was created.

The algorithm performs extremely well in situations of large variability of cluster shapes, densities, and number of data points in each cluster.
