Cluster Algorithms, Adriano Joaquim de O Cruz, ©2006 UFRJ, adriano@nce.ufrj.br

Upload: isaac-wheeler

Post on 27-Mar-2015


Page 1: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Cluster Algorithms

Adriano Joaquim de O Cruz ©2006

UFRJ

adriano@nce.ufrj.br


K-means

K-means algorithm

Based on the Euclidean distances among the elements of the cluster.

The centre of a cluster is the mean value of the objects in the cluster.

Classifies objects in a hard way: each object belongs to a single cluster.

K-means algorithm

Consider $n$ objects, $X = \{x_1, x_2, \ldots, x_n\}$, and $k$ clusters.

Each object $x_i$ is defined by $l$ characteristics, $x_i = (x_{i1}, x_{i2}, \ldots, x_{il})$.

Consider $A$ a set of $k$ clusters, $A = \{A_1, A_2, \ldots, A_k\}$.

K-means properties

The union of all clusters makes the Universe:

$$\bigcup_{i=1}^{k} A_i = X$$

No element belongs to more than one cluster:

$$A_i \cap A_j = \emptyset, \quad \forall i \neq j$$

There is no empty cluster:

$$\emptyset \subset A_i \subset X, \quad \forall i$$

Membership function

$$\chi_{A_i}(x_e) = \chi_{ie} = \begin{cases} 1 & x_e \in A_i \\ 0 & x_e \notin A_i \end{cases}$$

$$\bigvee_{i=1}^{k} \chi_{A_i}(x_e) = 1, \quad \forall e$$

$$\sum_{i=1}^{k} \chi_{ie} = 1, \quad \forall e$$

$$\chi_{ie} \wedge \chi_{je} = 0, \quad \forall e,\ i \neq j$$

$$0 < \sum_{e=1}^{n} \chi_{ie} < n, \quad \forall i$$

Membership matrix U

Matrix containing the values of inclusion of each element into each cluster (0 or 1).

The matrix has c rows (clusters) and n columns (elements).

The sum of all elements in a column must be equal to one (each element belongs to only one cluster).

The sum of each row must be less than n and greater than 0: there is no empty cluster and no cluster containing all elements.
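The constraints on U listed above can be checked mechanically. The following is a minimal sketch, assuming U is stored as a list of c rows with n entries each; the function name `is_hard_partition` is illustrative and not part of the original slides.

```python
def is_hard_partition(U):
    """Check the hard-partition constraints on a membership matrix U.

    U is a list of c rows (clusters), each with n entries (elements).
    """
    n = len(U[0])
    # Every entry must be 0 or 1.
    if any(value not in (0, 1) for row in U for value in row):
        return False
    # Each column must sum to one: every element is in exactly one cluster.
    if any(sum(row[e] for row in U) != 1 for e in range(n)):
        return False
    # Each row sum must lie strictly between 0 and n:
    # no empty cluster and no cluster holding all elements.
    return all(0 < sum(row) < n for row in U)


U1 = [[1, 0, 1, 0, 1, 0],
      [0, 1, 0, 1, 0, 1]]
print(is_hard_partition(U1))
```

A matrix such as `[[1, 1], [0, 0]]` fails the check because its second cluster is empty.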

Matrix examples

[Figure: six objects arranged in two rows, X1 X2 X3 above X4 X5 X6.]

Two examples of clustering. What do the clusters represent?

$$U_1 = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}$$

$$U_2 = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$$

Matrix examples cont.

[Figure: six objects arranged in two rows, X1 X2 X3 above X4 X5 X6.]

$$U_1 = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}$$

$$U_2 = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}$$

U1 and U2 describe the same partition: only the order of the rows (the cluster labels) differs.

How many clusters?

The number of distinct hard k-partitions of n elements is

$$\eta = \frac{1}{k!} \sum_{i=1}^{k} \binom{k}{i} (-1)^{k-i}\, i^{n}$$

How many clusters (example)?

Consider the matrix U2 (k=3, n=6):

$$\eta = \frac{1}{3!} \left[ \binom{3}{1}(-1)^{2}(1)^{6} + \binom{3}{2}(-1)^{1}(2)^{6} + \binom{3}{3}(-1)^{0}(3)^{6} \right] = \frac{3 - 192 + 729}{6} = 90$$
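The counting formula is easy to evaluate directly. The sketch below is one way to do so using only the standard library; the function name `count_hard_partitions` is an illustrative choice, not something defined in the slides.

```python
from math import comb, factorial


def count_hard_partitions(k, n):
    """Number of ways to split n elements into k non-empty, unlabelled clusters."""
    total = sum(comb(k, i) * (-1) ** (k - i) * i ** n for i in range(1, k + 1))
    # The alternating sum is always divisible by k!, so integer division is exact.
    return total // factorial(k)


print(count_hard_partitions(3, 6))  # 90, as in the worked example
```

Even for modest n the count grows very quickly, which is why exhaustive search over partitions is not practical.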

K-means inputs and outputs

Inputs: the number of clusters c and a database containing n objects with l characteristics each.

Output: a set of c clusters that minimises the square-error criterion.

Number of Clusters

[Figure: log of the number of partitions (vertical axis, 0 to 14) against the number of clusters (horizontal axis, 1 to 16), with one series each labelled No. 5, No. 10, No. 15 and No. 20.]

K-means algorithm v1

Arbitrarily assign each object to a cluster (matrix U).

Repeat

Update the cluster centres;

Reassign objects to the clusters to which the objects are most similar;

Until no change;

K-means algorithm v2

Arbitrarily choose c objects as the initial cluster centres.

Repeat

Reassign objects to the clusters to which the objects are most similar.

Update the cluster centres.

Until no change
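The second version of the algorithm can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance as the similarity measure and initial centres supplied by the caller; the names `kmeans` and `sq_dist` and the sample points are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centres, max_iter=100):
    """K-means v2: reassign objects, update centres, stop when nothing changes."""
    centres = [tuple(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Reassign each object to the cluster with the nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))
            for p in points
        ]
        if new_labels == labels:  # "until no change"
            break
        labels = new_labels
        # Update each centre to the mean of its members.
        for i in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres


points = [(1, 1), (2, 2), (1, 2), (2, 1), (5, 5), (5, 6), (6, 6), (6, 5)]
labels, centres = kmeans(points, [points[0], points[4]])
print(labels, centres)
```

Different initial centres can lead to different final partitions, so in practice the procedure is often run several times.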

Algorithm details

The algorithm tries to minimise the function

$$J(U, v) = \sum_{e=1}^{n} \sum_{i=1}^{c} \chi_{ie}\, (d_{ie})^{2}$$

$d_{ie}$ is the distance between the element $x_e$ ($l$ characteristics) and the centre of the cluster $i$ ($v_i$):

$$d_{ie} = d(x_e, v_i) = \| x_e - v_i \| = \left[ \sum_{j=1}^{l} (x_{ej} - v_{ij})^{2} \right]^{1/2}$$

Cluster Centre

The centre of the cluster $i$ ($v_i$) is a vector of $l$ characteristics.

The $j$th co-ordinate is calculated as

$$v_{ij} = \frac{\sum_{e=1}^{n} \chi_{ie}\, x_{ej}}{\sum_{e=1}^{n} \chi_{ie}}$$
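As a worked illustration of the centre formula and of the square-error function J, the sketch below computes both from a hard membership matrix. The function names and the sample data are illustrative assumptions, not taken from the slides.

```python
def centres_from_U(U, X):
    """v_ij = sum_e chi_ie * x_ej / sum_e chi_ie, for every cluster i."""
    centres = []
    for row in U:
        size = sum(row)  # number of elements in the cluster (must be > 0)
        centres.append(
            [sum(chi * x[j] for chi, x in zip(row, X)) / size
             for j in range(len(X[0]))]
        )
    return centres


def square_error(U, X, centres):
    """J(U, v) = sum over elements and clusters of chi_ie * d_ie^2."""
    return sum(
        chi * sum((xj - vj) ** 2 for xj, vj in zip(x, v))
        for row, v in zip(U, centres)
        for chi, x in zip(row, X)
    )


X = [(1, 1), (2, 2), (1, 2), (2, 1), (5, 5), (5, 6), (6, 6), (6, 5)]
U = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1]]
v = centres_from_U(U, X)
print(v, square_error(U, X, v))
```

Each of the eight points lies at squared distance 0.5 from its centre, so J is 4.0 for this partition.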

Detailed Algorithm

Choose c (number of clusters).

Set the error ($\varepsilon > 0$) and the step ($r = 0$).

Arbitrarily set matrix $U^{(r)}$. Do not forget: each element belongs to a single cluster, there is no empty cluster and no cluster has all elements.

Detailed Algorithm cont.

Repeat

Calculate the centres of the clusters $v_i^{(r)}$.

Calculate the distance $d_{ie}^{(r)}$ of each point to the centre of each cluster.

Generate $U^{(r+1)}$ by recalculating all characteristic functions using the equation

$$\chi_{ie}^{(r+1)} = \begin{cases} 1 & \text{if } d_{ie}^{(r)} = \min_{j} d_{je}^{(r)}, \ \forall j \in k \\ 0 & \text{otherwise} \end{cases}$$

Until $\| U^{(r+1)} - U^{(r)} \| < \varepsilon$

Matrix norms

Consider a matrix $A$ with $n$ rows and $n$ columns.

Column norm:

$$\| A \|_{1} = \max_{1 \le j \le n} \sum_{i=1}^{n} | a_{ij} |$$

Row norm:

$$\| A \|_{\infty} = \max_{1 \le i \le n} \sum_{j=1}^{n} | a_{ij} |$$
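Either norm can serve as the stopping measure $\| U^{(r+1)} - U^{(r)} \|$ in the detailed algorithm. The following sketch implements both for matrices stored as lists of rows; the function names are illustrative.

```python
def column_norm(A):
    """Column norm: the largest sum of absolute values down any column."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def row_norm(A):
    """Row norm: the largest sum of absolute values along any row."""
    return max(sum(abs(a) for a in row) for row in A)


def difference(U_new, U_old):
    """Element-wise difference U_new - U_old."""
    return [[a - b for a, b in zip(r_new, r_old)]
            for r_new, r_old in zip(U_new, U_old)]


A = [[1, -2], [3, 4]]
print(column_norm(A), row_norm(A))  # 6 7

U_old = [[1, 0, 1], [0, 1, 0]]
U_new = [[1, 0, 0], [0, 1, 1]]
print(column_norm(difference(U_new, U_old)))  # 2
```

For hard partitions the difference is zero exactly when no object changed cluster, which matches the "until no change" stopping rule.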

K-means problems?

Suitable when clusters are compact, well-separated clouds.

Scalable because the computational complexity is O(nkr), where r is the number of iterations.

The need to choose c in advance is a disadvantage.

Not suitable for nonconvex shapes.

It is sensitive to noise and outliers because they influence the means.

The result depends on the initial allocation.

Examples of results

[Figure: scatter plot of clustered points; horizontal axis 0 to 4, vertical axis 0 to 6.]

K-means: Actual Data

[Figure]

K-means: Results

[Figure]

K-medoids

K-medoids methods

K-means is sensitive to outliers, since an object with an extremely large value may distort the distribution of data.

Instead of taking the mean value, the most centrally located object (medoid) is used as the reference point.

The algorithm minimises the sum of dissimilarities between each object and its medoid (similar to k-means).

K-medoids strategies

Choose k medoids arbitrarily.

Each remaining object is clustered with the medoid to which it is the most similar.

Then iteratively replace one of the medoids by a non-medoid as long as the quality of the clustering is improved.

The quality is measured using a cost function that measures the average dissimilarity between each object and the medoid of its cluster.

Reassignment costs

Each reassignment contributes a difference to the square-error J.

The cost function J calculates the total cost of replacing a current medoid by a non-medoid.

If the total cost is negative then $m_j$ is replaced by $m_{random}$; otherwise the replacement is not accepted.

Replacing medoids case 1

Object p belongs to medoid $m_j$. If $m_j$ is replaced by $m_{random}$ and p is closest to one of the $m_i$ ($i \neq j$), then p is reassigned to $m_i$.

[Figure: medoids $m_i$ and $m_j$, candidate $m_{random}$ and object p.]

Replacing medoids case 2

Object p belongs to medoid $m_j$. If $m_j$ is replaced by $m_{random}$ and p is closest to $m_{random}$, then p is reassigned to $m_{random}$.

[Figure: medoids $m_i$ and $m_j$, candidate $m_{random}$ and object p.]

Replacing medoids case 3

Object p belongs to medoid $m_i$ ($i \neq j$). If $m_j$ is replaced by $m_{random}$ and p is still closest to $m_i$, then the assignment does not change.

[Figure: medoids $m_i$ and $m_j$, candidate $m_{random}$ and object p.]

Replacing medoids case 4

Object p belongs to medoid $m_i$ ($i \neq j$). If $m_j$ is replaced by $m_{random}$ and p is closest to $m_{random}$, then p is reassigned to $m_{random}$.

[Figure: medoids $m_i$ and $m_j$, candidate $m_{random}$ and object p.]

K-medoid algorithm

Arbitrarily choose k objects as the initial medoids.

Repeat

Assign each remaining object to the cluster with the nearest medoid;

Randomly select a non-medoid object, $m_{random}$;

Compute the total cost J of swapping $m_j$ with $m_{random}$;

If J < 0 then swap $m_j$ with $m_{random}$;

Until no change
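The swap test can be sketched as follows. To keep the example deterministic it tries every (medoid, non-medoid) pair in turn instead of drawing $m_{random}$ at random, which is a deliberate departure from the slide; the cost is taken to be the sum of Euclidean distances to the nearest medoid, and all names are illustrative.

```python
def dist(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from each object to its nearest medoid (by index)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, initial_medoids):
    """Swap medoids with non-medoids while a swap has negative cost J."""
    medoids = list(initial_medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                # J: change in total cost if the swap were accepted.
                J = total_cost(points, trial) - total_cost(points, medoids)
                if J < 0:
                    medoids = trial
                    improved = True
    labels = [
        min(range(len(medoids)), key=lambda i: dist(p, points[medoids[i]]))
        for p in points
    ]
    return medoids, labels


points = [(1, 1), (2, 2), (1, 2), (2, 1), (5, 5), (5, 6), (6, 6), (6, 5)]
medoids, labels = k_medoids(points, [0, 1])
print(medoids, labels)
```

Because every accepted swap strictly lowers the total cost, the loop terminates; recomputing the full cost for each trial is simple but expensive, which is consistent with k-medoids being slower than k-means.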

Comparisons?

K-medoids is more robust than k-means in the presence of noise and outliers.

K-means is less costly in terms of processing time.


Fuzzy C-means

Fuzzy C-means

Fuzzy version of K-means.

Elements may belong to more than one cluster.

Values of the characteristic function range from 0 to 1.

Each value is interpreted as the degree of membership of an element to a cluster relative to all other clusters.

Fuzzy C-means setup

Consider $n$ objects, $X = \{x_1, x_2, \ldots, x_n\}$, and $c$ clusters.

Each object $x_i$ is defined by $l$ characteristics, $x_i = (x_{i1}, x_{i2}, \ldots, x_{il})$.

Consider $A$ a set of $k$ clusters, $A = \{A_1, A_2, \ldots, A_k\}$.

Fuzzy C-means properties

The union of all clusters makes the Universe:

$$\bigcup_{i=1}^{k} A_i = X$$

There is no empty cluster:

$$\emptyset \subset A_i \subset X, \quad \forall i$$

Unlike the hard case, clusters may overlap, so the condition $A_i \cap A_j = \emptyset$ no longer applies.

Membership function

$$\mu_{A_i}(x_e) = \mu_{ie} \in [0, 1]$$

$$\sum_{i=1}^{c} \mu_{ie} = 1, \quad \forall e$$

$$0 < \sum_{e=1}^{n} \mu_{ie} < n, \quad \forall i$$

Problems of probabilistic clusters

Points representing circle lines (C1 and C2).

Due to normalisation, strange results may emerge.

[Figure: two circular clusters C1 and C2 with sample points annotated with the memberships {c1=0.5, c2=0.5}, {c1=0.5, c2=0.5}, {c1=0.7, c2=0.3} and {c1=0.3, c2=0.7}.]

Membership matrix U

Matrix containing the values of inclusion of each element into each cluster, in [0,1].

The matrix has c rows (clusters) and n columns (elements).

The sum of all elements in a column must be equal to one.

The sum of each row must be less than n and greater than 0: there is no empty cluster and no cluster containing all elements.

Matrix examples

[Figure: six objects arranged in two rows, X1 X2 X3 above X4 X5 X6.]

Two examples of clustering. What do the clusters represent?

$$U_1 = \begin{bmatrix} 0.5 & 1 & 0 & 0.9 & 0.3 & 0.2 \\ 0.5 & 0 & 1 & 0.1 & 0.7 & 0.8 \end{bmatrix}$$

$$U_2 = \begin{bmatrix} 1 & 0.5 & 0 & 1 & 0.5 & 0 \\ 0 & 0.5 & 1 & 0 & 0.2 & 0 \\ 0 & 0.2 & 0 & 0 & 0.5 & 1 \end{bmatrix}$$

Fuzzy C-means algorithm v1

Arbitrarily assign each object to a cluster (matrix U).

Repeat

Update the cluster centres;

Reassign objects to the clusters to which the objects are most similar;

Until no change;

Fuzzy C-means algorithm v2

Arbitrarily choose c objects as the initial cluster centres.

Repeat

Reassign objects to the clusters to which the objects are most similar.

Update the cluster centres.

Until no change

Algorithm details

The algorithm tries to minimise the function below, where $m$ is the nebulisation factor.

$$J(U, v) = \sum_{e=1}^{n} \sum_{i=1}^{c} (\mu_{ie})^{m}\, (d_{ie})^{2}$$

$d_{ie}$ is the distance between the element $x_e$ ($l$ characteristics) and the centre of the cluster $i$ ($v_i$):

$$d_{ie} = d(x_e, v_i) = \| x_e - v_i \| = \left[ \sum_{j=1}^{l} (x_{ej} - v_{ij})^{2} \right]^{1/2}$$

Nebulisation factor

$m$ is the nebulisation factor.

Its value lies in the range $[1, \infty)$.

If $m = 1$ the system is crisp.

If $m \to \infty$ then all the membership values tend to $1/c$.

The most common values are 1.25 and 2.0.

Cluster Centre

The centre of the cluster $i$ ($v_i$) is a vector of $l$ characteristics.

The $j$th co-ordinate is calculated as

$$v_{ij} = \frac{\sum_{e=1}^{n} (\mu_{ie})^{m}\, x_{ej}}{\sum_{e=1}^{n} (\mu_{ie})^{m}}$$

Detailed Algorithm

Choose c (number of clusters).

Set the error ($\varepsilon > 0$), the nebulisation factor ($m$) and the step ($r = 0$).

Arbitrarily set matrix $U^{(r)}$. Do not forget: each element belongs to a single cluster, there is no empty cluster and no cluster has all elements.

Detailed Algorithm cont.

Repeat

Calculate the centres of the clusters $v_i^{(r)}$.

Calculate the distance $d_{ie}^{(r)}$ of each point to the centre of each cluster.

Generate $U^{(r+1)}$ by recalculating all characteristic functions (How?).

Until $\| U^{(r+1)} - U^{(r)} \| < \varepsilon$

How to recalculate?

$$\mu_{ie} = \begin{cases} \left[ \sum_{l=1}^{c} \left( \dfrac{d_{ie}}{d_{le}} \right)^{\frac{2}{m-1}} \right]^{-1} & \text{if } d_{ie} > 0,\ \forall i \in [1..c] \\ 1 & \text{else if } d_{ie} = 0 \\ 0 & \text{else} \end{cases}$$

Here the summation index $l$ runs over the clusters.

If all distances are greater than zero, the membership grade is computed from the distances to all centres, normalised so that the grades of an element sum to one.

Otherwise the element coincides with a cluster centre: it belongs to that cluster and to no other.
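The membership update and the centre formula combine into the full fuzzy C-means loop. The sketch below is a minimal illustration, assuming m > 1, Euclidean distances, initial centres supplied by the caller, and the largest absolute change in U as the stopping measure; the function names and the choice of initial centres are assumptions, not part of the slides.

```python
import math


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def update_memberships(X, centres, m=2.0):
    """Recalculate U from the distances of every element to every centre."""
    c, n = len(centres), len(X)
    U = [[0.0] * n for _ in range(c)]
    for e, x in enumerate(X):
        d = [dist(x, v) for v in centres]
        if all(di > 0 for di in d):
            for i in range(c):
                U[i][e] = 1.0 / sum(
                    (d[i] / d[l]) ** (2.0 / (m - 1.0)) for l in range(c)
                )
        else:
            # The element sits on a centre: full membership there, zero elsewhere.
            U[d.index(0.0)][e] = 1.0
    return U


def update_centres(X, U, m=2.0):
    """v_ij = sum_e mu_ie^m * x_ej / sum_e mu_ie^m (assumes no empty cluster)."""
    centres = []
    for row in U:
        w = [mu ** m for mu in row]
        total = sum(w)
        centres.append(tuple(
            sum(wi * x[j] for wi, x in zip(w, X)) / total
            for j in range(len(X[0]))
        ))
    return centres


def fcm(X, centres, m=2.0, eps=1e-6, max_iter=200):
    U = update_memberships(X, centres, m)
    for _ in range(max_iter):
        centres = update_centres(X, U, m)
        U_new = update_memberships(X, centres, m)
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change < eps:
            break
    return U, centres


X = [(1, 1), (2, 2), (1, 2), (2, 1), (5, 5), (5, 6), (6, 6), (6, 5), (3.5, 3.5)]
U, centres = fcm(X, [(1, 1), (6, 6)])
print([round(u, 4) for u in U[0]])
```

With these symmetric starting centres the point (3.5, 3.5) ends up shared almost equally between the two clusters, while the corner points receive memberships close to 0 or 1.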

Example of clustering result

[Figure: scatter plot of the nine points below; horizontal axis 0 to 8, vertical axis 0 to 7.]

Point  x    y    C0      C1
0      1    1    0.9993  0.0007
1      2    2    0.9999  0.0001
2      1    2    0.9996  0.0004
3      2    1    0.9996  0.0004
4      5    5    0.0000  1.0000
5      5    6    0.0005  0.9995
6      6    6    0.0010  0.9990
7      6    5    0.0005  0.9995
8      3.5  3.5  0.4559  0.5441

Possibilistic Clustering

Membership function

The membership degree expresses how representative, or typical, the datum $x$ is of the cluster $i$.

$$\mu_{A_i}(x_e) = \mu_{ie} \in [0, 1]$$

$$\sum_{i=1}^{c} \mu_{ie} > 0, \quad \forall e$$

Algorithm details

The algorithm tries to minimise the function below, where $m$ is the nebulisation factor.

The first sum is the usual one and the second rewards high memberships.

$$J(U, v) = \sum_{e=1}^{n} \sum_{i=1}^{c} (\mu_{ie})^{m}\, (d_{ie})^{2} + \sum_{i=1}^{c} \eta_i \sum_{e=1}^{n} (1 - \mu_{ie})^{m}$$

How to calculate?

$$\mu_{ie} = \frac{1}{1 + \left( \dfrac{d_{ie}^{2}}{\eta_i} \right)^{\frac{1}{m-1}}}$$

$$\eta_i = \frac{\sum_{e=1}^{n} (\mu_{ie})^{m}\, d_{ie}^{2}}{\sum_{e=1}^{n} (\mu_{ie})^{m}}$$
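The two possibilistic update formulas can be sketched as plain functions over a matrix of squared distances. The layout assumed here (one row per cluster, one column per element) and the names `pcm_eta` and `pcm_memberships` are illustrative; the sketch also assumes m > 1.

```python
def pcm_eta(U, D2, m=2.0):
    """eta_i: membership-weighted mean of the squared distances of cluster i."""
    etas = []
    for mu_row, d2_row in zip(U, D2):
        w = [mu ** m for mu in mu_row]
        etas.append(sum(wi * d2 for wi, d2 in zip(w, d2_row)) / sum(w))
    return etas


def pcm_memberships(D2, etas, m=2.0):
    """mu_ie = 1 / (1 + (d_ie^2 / eta_i)^(1/(m-1)))."""
    return [
        [1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0))) for d2 in d2_row]
        for d2_row, eta in zip(D2, etas)
    ]


# One cluster, two elements: squared distances 1 and 4, eta fixed at 1.
print(pcm_memberships([[1.0, 4.0]], [1.0]))

# eta estimated from an existing (for example FCM) membership row.
print(pcm_eta([[1.0, 0.5]], [[2.0, 4.0]]))
```

Note that each membership depends only on the distance to its own cluster, so the columns of U are no longer forced to sum to one; an element at squared distance $\eta_i$ from a centre gets membership 0.5.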

Detailed Algorithm

Choose c (number of clusters) and m.

Set the error ($\varepsilon > 0$) and the step ($r = 0$).

Execute FCM.

Detailed Algorithm cont.

For 2 times

Initialise $U^{(0)}$ and the centres of the clusters $v_i^{(0)}$ with the previous results.

Initialise $\eta_i$ and $r = 0$.

Repeat

Calculate the distance $d_{ie}^{(r)}$ of each point to the centre of each cluster.

Generate $U^{(r+1)}$ by recalculating all characteristic functions using the equations.

Until $\| U^{(r+1)} - U^{(r)} \| < \varepsilon$

End For


Gustafson-Kessel Algorithm

Gustafson-Kessel method

This method (GK) is a fuzzy clustering method similar to the Fuzzy C-means (FCM).

The difference is the way the distance is calculated.

FCM uses Euclidean distances.

GK uses Mahalanobis distances.

Gustafson-Kessel method

The Mahalanobis distance is calculated as

$$d_{ik}^{2} = (\mathbf{x}_k - \mathbf{v}_i)^{T} \mathbf{A}_i (\mathbf{x}_k - \mathbf{v}_i)$$

The matrices $\mathbf{A}_i$ are given by

$$\mathbf{A}_i = \sqrt[p]{\det(\mathbf{S}_i)}\; \mathbf{S}_i^{-1}$$

Gustafson-Kessel method

The Fuzzy Covariance Matrix is

$$\mathbf{S}_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^{m} (\mathbf{x}_j - \mathbf{v}_i)(\mathbf{x}_j - \mathbf{v}_i)^{T}}{\sum_{j=1}^{n} (\mu_{ij})^{m}}$$
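For two-dimensional data the fuzzy covariance matrix and the resulting GK distance can be written out by hand. The sketch below assumes that p in the formula for $\mathbf{A}_i$ is the number of dimensions (2 here) and restricts itself to 2x2 matrices so that no linear-algebra library is needed; the function names and sample data are illustrative.

```python
def gk_norm_matrix(X, mu, v, m=2.0):
    """A_i = det(S_i)^(1/p) * S_i^(-1) for 2-D data (p = 2 is an assumption)."""
    w = [u ** m for u in mu]
    total = sum(w)
    # Entries of the fuzzy covariance matrix S_i.
    sxx = sum(wi * (x - v[0]) ** 2 for wi, (x, y) in zip(w, X)) / total
    syy = sum(wi * (y - v[1]) ** 2 for wi, (x, y) in zip(w, X)) / total
    sxy = sum(wi * (x - v[0]) * (y - v[1]) for wi, (x, y) in zip(w, X)) / total
    det = sxx * syy - sxy ** 2  # must be non-zero for the inverse to exist
    scale = det ** 0.5          # det^(1/p) with p = 2
    # Inverse of a 2x2 matrix, multiplied by the scale factor.
    return [[scale * syy / det, -scale * sxy / det],
            [-scale * sxy / det, scale * sxx / det]]


def gk_sq_distance(x, v, A):
    """d^2 = (x - v)^T A (x - v)."""
    dx, dy = x[0] - v[0], x[1] - v[1]
    return A[0][0] * dx * dx + 2 * A[0][1] * dx * dy + A[1][1] * dy * dy


X = [(0, 0), (2, 0), (0, 1), (2, 1)]
mu = [1.0, 1.0, 1.0, 1.0]  # all four points fully in one cluster
v = (1.0, 0.5)
A = gk_norm_matrix(X, mu, v)
print(A)
print(gk_sq_distance((2, 0.5), v, A), gk_sq_distance((1, 1.5), v, A))
```

The cluster is stretched along x, so a unit step in x costs less (0.5) than a unit step in y (2.0); the determinant of A is 1, which is what keeps the cluster volumes comparable.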

GK comments

The clusters are hyperellipsoids in $\mathbb{R}^{l}$.

The hyperellipsoids have approximately the same size.

For $\mathbf{S}^{-1}$ to be computable, the number of samples $n$ must be at least equal to the number of dimensions $l$ plus 1.

Results of GK

[Figure]

Gath-Geva method

It is also known as Gaussian Mixture Decomposition.

It is similar to the FCM method.

The Gauss distance is used instead of the Euclidean distance.

The clusters do not have a definite shape anymore and have various sizes.

Gath-Geva Method

The Gauss distance is given by

$$d_{ik}^{2} = \frac{\left( \det(\mathbf{S}_i) \right)^{1/2}}{P_i} \exp\left( \frac{1}{2} (\mathbf{x}_k - \mathbf{v}_i)^{T} \mathbf{A}_i (\mathbf{x}_k - \mathbf{v}_i) \right)$$

$$\mathbf{A}_i = \mathbf{S}_i^{-1}$$

Gath-Geva Method

The term $P_i$ is the probability that a sample belongs to cluster $i$.

$$P_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^{m}}{\sum_{j=1}^{n} \sum_{k=1}^{c} (\mu_{kj})^{m}}$$

Gath-Geva Comments

$P_i$ is a parameter that influences the size of a cluster.

Bigger clusters attract more elements.

The exponential term makes it more difficult to avoid local minima.

Usually another clustering method is used to initialise the partition matrix U.

GG Results: Random Centers

[Figure]

GG Results: Centers FCM

[Figure]

Clustering based on Equivalence Relations

A crisp relation R on a universe X can be thought of as a relation from X to X.

R is an equivalence relation if it has the following three properties:

Reflexivity: $(x_i, x_i) \in R$

Symmetry: $(x_i, x_j) \in R \rightarrow (x_j, x_i) \in R$

Transitivity: $(x_i, x_j) \in R$ and $(x_j, x_k) \in R \rightarrow (x_i, x_k) \in R$

Page 73: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Crisp tolerance relation

$R$ is a tolerance relation if it has the following two properties:

Reflexivity: $(x_i, x_i) \in R$

Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$

Page 74: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Composition of Relations

[Diagram: $R$ relates $X$ to $Y$ and $S$ relates $Y$ to $Z$; the composition $T = R \circ S$ relates $X$ to $Z$.]

Page 75: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Composition of Crisp Relations

$$\chi_{R \circ S}(x, z) = \bigvee_{y \in Y} \left[ \chi_R(x, y) \wedge \chi_S(y, z) \right]$$

Here $\vee$ is the max operator and $\wedge$ is min or product.

The operation $\circ$ is similar to matrix multiplication, with min (or product) in place of multiplication and max in place of addition.
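The analogy with matrix multiplication can be sketched as follows. The function name `compose` and the two small relation matrices are hypothetical; relations are stored as lists of rows holding 0/1 values, and min is used for the inner operation.

```python
def compose(R, S):
    """Max-min composition T = R o S of two relation matrices (lists of rows)."""
    inner = len(S)
    # Every row of R must have one entry per row of S, as in a matrix product.
    assert all(len(row) == inner for row in R)
    return [[max(min(R[i][k], S[k][j]) for k in range(inner))
             for j in range(len(S[0]))]
            for i in range(len(R))]


# Hypothetical crisp relations: R on X x Y (2 x 3) and S on Y x Z (3 x 2).
R = [[1, 0, 1],
     [0, 1, 0]]
S = [[0, 1],
     [1, 0],
     [0, 0]]
T = compose(R, S)
print(T)
```

An entry of `T` is 1 exactly when some intermediate element of $Y$ links the pair, which is what the max over min expresses for 0/1 values.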

Page 76: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Transforming Relations

A tolerance relation $R_1$ can be transformed into an equivalence relation by at most $(n-1)$ compositions with itself.

$n$ is the cardinality of the set $X$ on which $R_1$ is defined.

$$R_1^{n-1} = R_1 \circ R_1 \circ \cdots \circ R_1$$
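A minimal sketch of this transformation, assuming max-min composition and a reflexive, symmetric input matrix. The names `compose` and `transitive_closure` and the four-element relation are illustrative assumptions, not part of the slides.

```python
def compose(R, S):
    """Max-min composition of two square relation matrices (lists of rows)."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(R):
    """Compose a tolerance relation with itself until it stops changing.

    For a reflexive relation on n elements this needs at most n - 1
    compositions, so the loop is bounded accordingly.
    """
    closure = [row[:] for row in R]
    for _ in range(len(R) - 1):
        nxt = compose(closure, R)
        if nxt == closure:
            break
        closure = nxt
    return closure


# Hypothetical crisp tolerance relation: 1~2 and 2~3, element 4 isolated.
R1 = [[1, 1, 0, 0],
      [1, 1, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 1]]
E = transitive_closure(R1)
for row in E:
    print(row)
```

In this example the missing pair (1, 3) is added by the first composition, after which the relation is transitive and the loop stops early.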

Page 77: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Example of crisp classification

Let $X = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$.

Let $R$ be defined as the relation "$x_i$ and $x_j$ leave the identical remainder after division by 3".

This relation is an equivalence relation.

Page 78: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Relation Matrix

      1  2  3  4  5  6  7  8  9 10
 1    1  0  0  1  0  0  1  0  0  1
 2    0  1  0  0  1  0  0  1  0  0
 3    0  0  1  0  0  1  0  0  1  0
 4    1  0  0  1  0  0  1  0  0  1
 5    0  1  0  0  1  0  0  1  0  0
 6    0  0  1  0  0  1  0  0  1  0
 7    1  0  0  1  0  0  1  0  0  1
 8    0  1  0  0  1  0  0  1  0  0
 9    0  0  1  0  0  1  0  0  1  0
10    1  0  0  1  0  0  1  0  0  1

Page 79: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Crisp Classification

Consider equivalent columns: elements whose columns in the relation matrix are identical belong to the same class.

It is possible to group the elements into the following classes:

$R_0 = \{3, 6, 9\}$

$R_1 = \{1, 4, 7, 10\}$

$R_2 = \{2, 5, 8\}$
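The grouping can be reproduced with a short sketch that builds the relation matrix for the remainder relation and collects elements with identical rows (the matrix is symmetric, so rows and columns coincide). The variable names are chosen for illustration.

```python
X = list(range(1, 11))

# Relation matrix: 1 when both elements leave the same remainder modulo 3.
R = [[1 if a % 3 == b % 3 else 0 for b in X] for a in X]

# Elements with identical rows form one equivalence class.
classes = {}
for x, row in zip(X, R):
    classes.setdefault(tuple(row), []).append(x)

for members in classes.values():
    # Print the shared remainder followed by the members of the class.
    print(members[0] % 3, members)
```

Using the whole row as the dictionary key means the code never looks at the remainder itself when grouping; it relies only on the relation matrix, as the slide does.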

Page 80: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Clustering and Fuzzy Equivalence Relations

A fuzzy relation $R$ on a universe $X$ can be thought of as a relation from $X$ to $X$.

$R$ is an equivalence relation if it has the following three properties:

Reflexivity: $(x_i, x_i) \in R$, or $\mu_R(x_i, x_i) = 1$

Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$, or $\mu_R(x_i, x_j) = \mu_R(x_j, x_i)$

Transitivity: $(x_i, x_j) \in R$ and $(x_j, x_k) \in R \Rightarrow (x_i, x_k) \in R$, or if $\mu_R(x_i, x_j) = \lambda_1$ and $\mu_R(x_j, x_k) = \lambda_2$ then $\mu_R(x_i, x_k) = \lambda$ and $\lambda \ge \min(\lambda_1, \lambda_2)$

Page 81: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Fuzzy tolerance relation

$R$ is a tolerance relation if it has the following two properties:

Reflexivity: $(x_i, x_i) \in R$

Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$

Page 82: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Composition of Fuzzy Relations

$$\mu_{R \circ S}(x, z) = \bigvee_{y \in Y} \left[ \mu_R(x, y) \wedge \mu_S(y, z) \right]$$

Here $\vee$ is the max operator and $\wedge$ is min or product.

The operation $\circ$ is similar to matrix multiplication, with min (or product) in place of multiplication and max in place of addition.

Page 83: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Distance Relation

Let $X$ be a set of data on $\mathbb{R}^l$.

The distance function is a tolerance relation that can be transformed into an equivalence relation.

The relation $R$ can be defined by the Minkowski distance formula.

$$R(\mathbf{x}_i, \mathbf{x}_k) = 1 - \delta \left( \sum_{j=1}^{l} |x_{ij} - x_{kj}|^{q} \right)^{1/q}$$

$\delta$ is a constant that ensures that $R \in [0, 1]$ and is equal to the inverse of the largest distance in $X$.

Page 84: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Example of Fuzzy classification

Let $X = \{(0,0), (1,1), (2,3), (3,1), (4,0)\}$ be a set of points in $\mathbb{R}^2$.

Set $q = 2$, Euclidean distances.

The largest distance is 4 ($\mathbf{x}_1$, $\mathbf{x}_5$), so $\delta = 0.25$.

The relation $R$ can be calculated by the equation (here $l = 2$)

$$R(\mathbf{x}_i, \mathbf{x}_k) = 1 - 0.25 \left( \sum_{j=1}^{l} (x_{ij} - x_{kj})^{2} \right)^{1/2}$$
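As a worked check of the equation, three entries can be computed by hand from the points of $X$ (values rounded to two decimals):

$$
\begin{aligned}
R(\mathbf{x}_1, \mathbf{x}_2) &= 1 - 0.25\sqrt{(0-1)^2 + (0-1)^2} = 1 - 0.25\sqrt{2} \approx 0.65 \\
R(\mathbf{x}_2, \mathbf{x}_4) &= 1 - 0.25\sqrt{(1-3)^2 + (1-1)^2} = 1 - 0.25 \cdot 2 = 0.5 \\
R(\mathbf{x}_1, \mathbf{x}_5) &= 1 - 0.25\sqrt{(0-4)^2 + (0-0)^2} = 1 - 0.25 \cdot 4 = 0
\end{aligned}
$$

The most distant pair receives a relation value of 0 and every point has $R(\mathbf{x}_i, \mathbf{x}_i) = 1$, so all values fall in $[0, 1]$.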

Page 85: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Points to be classified

[Figure: scatter plot of the five points of $X$; horizontal axis from 0 to 5, vertical axis from 0 to 3.5.]

Page 86: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Tolerance matrix

The matrix calculated by the equation is

$$R = \begin{bmatrix}
1 & 0.65 & 0.1 & 0.21 & 0 \\
0.65 & 1 & 0.44 & 0.5 & 0.21 \\
0.1 & 0.44 & 1 & 0.44 & 0.1 \\
0.21 & 0.5 & 0.44 & 1 & 0.65 \\
0 & 0.21 & 0.1 & 0.65 & 1
\end{bmatrix}$$

This is a tolerance relation that needs to be transformed into an equivalence relation.
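The tolerance matrix can be recomputed with a short sketch of the distance relation. The function name `tolerance_matrix` is an assumption; the points, the choice $q = 2$ and the scaling by the inverse of the largest distance come from the example. The sketch assumes at least two distinct points, otherwise the largest distance would be zero.

```python
def tolerance_matrix(points, q=2):
    """Relation R(a, b) = 1 - delta * Minkowski distance of order q."""
    def dist(a, b):
        return sum(abs(u - v) ** q for u, v in zip(a, b)) ** (1.0 / q)

    # delta is the inverse of the largest pairwise distance in the data set.
    largest = max(dist(a, b) for a in points for b in points)
    delta = 1.0 / largest
    return [[1.0 - delta * dist(a, b) for b in points] for a in points]


X = [(0, 0), (1, 1), (2, 3), (3, 1), (4, 0)]
R = tolerance_matrix(X)
for row in R:
    print([round(v, 2) for v in row])
```

Rounded to two decimals, the rows agree with the tolerance matrix of the example.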

Page 87: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Equivalence matrix

The matrix transformed is

$$R = \begin{bmatrix}
1 & 0.65 & 0.44 & 0.5 & 0.5 \\
0.65 & 1 & 0.44 & 0.5 & 0.5 \\
0.44 & 0.44 & 1 & 0.44 & 0.44 \\
0.5 & 0.5 & 0.44 & 1 & 0.65 \\
0.5 & 0.5 & 0.44 & 0.65 & 1
\end{bmatrix}$$
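To see how individual entries change, two of them can be traced by hand under max-min composition, starting from the tolerance matrix of this example. For the pair $(\mathbf{x}_1, \mathbf{x}_3)$ a single composition is enough:

$$
\begin{aligned}
(R \circ R)(\mathbf{x}_1, \mathbf{x}_3) &= \max[\min(1, 0.1),\ \min(0.65, 0.44),\ \min(0.1, 1),\ \min(0.21, 0.44),\ \min(0, 0.1)] \\
&= \max[0.1,\ 0.44,\ 0.1,\ 0.21,\ 0] = 0.44
\end{aligned}
$$

The pair $(\mathbf{x}_1, \mathbf{x}_5)$ needs a second composition. The first one gives $(R \circ R)(\mathbf{x}_1, \mathbf{x}_5) = 0.21$ and $(R \circ R)(\mathbf{x}_1, \mathbf{x}_4) = 0.5$, and the second one then yields

$$
(R \circ R \circ R)(\mathbf{x}_1, \mathbf{x}_5) \ge \min\big((R \circ R)(\mathbf{x}_1, \mathbf{x}_4),\ R(\mathbf{x}_4, \mathbf{x}_5)\big) = \min(0.5, 0.65) = 0.5
$$

No other intermediate point gives a larger value, so the entry settles at 0.5, as in the transformed matrix.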

Page 88: Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

Results of clustering

Taking $\lambda$-cuts of the fuzzy equivalence relation at the values $\lambda = 0.44$, $0.5$, $0.65$ and $1.0$ we get the following classes:

$R_{0.44} = [\{x_1, x_2, x_3, x_4, x_5\}]$

$R_{0.5} = [\{x_1, x_2, x_4, x_5\}, \{x_3\}]$

$R_{0.65} = [\{x_1, x_2\}, \{x_3\}, \{x_4, x_5\}]$

$R_{1.0} = [\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}]$
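The classes can be read off the equivalence matrix with a small sketch. The function name `lambda_cut_classes` is an assumption, and the code relies on the input already being a fuzzy equivalence relation, so that each row directly lists the members of its class at the chosen level.

```python
def lambda_cut_classes(R, lam):
    """Partition indices 0..n-1 using the crisp relation R[i][j] >= lam.

    Assumes R is a fuzzy equivalence relation, so every lambda-cut is a
    crisp equivalence relation and the rows define disjoint classes.
    """
    n = len(R)
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        members = [j for j in range(n) if R[i][j] >= lam]
        classes.append(members)
        assigned.update(members)
    return classes


# Equivalence matrix from the example.
R = [[1.00, 0.65, 0.44, 0.50, 0.50],
     [0.65, 1.00, 0.44, 0.50, 0.50],
     [0.44, 0.44, 1.00, 0.44, 0.44],
     [0.50, 0.50, 0.44, 1.00, 0.65],
     [0.50, 0.50, 0.44, 0.65, 1.00]]

for lam in (0.44, 0.5, 0.65, 1.0):
    names = [["x%d" % (j + 1) for j in c] for c in lambda_cut_classes(R, lam)]
    print(lam, names)
```

Raising the level splits the single class found at 0.44 step by step until every point stands alone at 1.0, which gives a hierarchy of partitions rather than one fixed clustering.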