University of Eastern Finland, School of Computing, P.O. Box 111, FIN-80101 Joensuu, FINLAND. Tel. +358 13 251 7959, fax +358 13 251 7955, www.uef.fi/cs

K-means*: Clustering by Gradual Data Transformation

Mikko Malinen and Pasi Fränti

Speech and Image Processing Unit

School of Computing

University of Eastern Finland

K-means* clustering: gradual transformation of data

Fit the data to a model.

[Figure panels: Model, Data, Intermediate, Final]
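A minimal sketch of one way to realise "fit the data to a model" as a gradual transformation. It assumes (this is not stated on the slides) that each model point is paired with one data point by index and moves along a straight line towards it; `transform_step`, `model` and `data` are hypothetical names.

```python
from typing import List, Sequence


def transform_step(model: Sequence[Sequence[float]],
                   data: Sequence[Sequence[float]],
                   step: int, steps: int) -> List[List[float]]:
    """Return the intermediate data set after `step` of `steps` steps.

    Assumptions: model[i] is paired with data[i], and each point moves along
    the straight line from its model position to its data position.
    step == 0 gives the model, step == steps gives the original data.
    """
    if len(model) != len(data):
        raise ValueError("model and data must have the same number of points")
    t = step / steps  # fraction of the transformation done
    return [[m + t * (x - m) for m, x in zip(mp, xp)]
            for mp, xp in zip(model, data)]


model = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical model points
data = [[2.0, 4.0], [6.0, 8.0]]
for step in (0, 5, 10):
    print(f"{100 * step // 10}% done:", transform_step(model, data, step, 10))
```

The intermediate sets are what a clustering algorithm would see between the "Model" and "Final" panels.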

K-means clustering

Iterate between two steps:

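1. Assignment step: assign each point to its nearest centroid.

$$S_i^{(t)} = \{\, x_j : \lVert x_j - m_i^{(t)} \rVert \le \lVert x_j - m_{i^*}^{(t)} \rVert \ \text{for all } i^* = 1, \dots, k \,\}$$

2. Update step: move each centroid to the mean of the points assigned to it.

$$m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j$$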
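The two steps can be sketched in plain Python as follows. This is a generic k-means, not the authors' code; the function names are illustrative, and an empty cluster simply keeps its old centroid because the update formula is undefined when the cluster has no points.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def update(points, labels, centroids) -> List[Point]:
    """Update step: move each centroid to the mean of the points assigned to it.

    An empty cluster keeps its old centroid (the mean is undefined for it).
    """
    new = []
    for i, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new.append([sum(coord) / len(members) for coord in zip(*members)])
        else:
            new.append(list(c))
    return new


def kmeans(points, centroids, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate the two steps until no point changes cluster."""
    centroids = [list(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(pts, [[0.0, 0.0], [10.0, 10.0]]))
```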

K-means* clustering

Example of clustering (s2 dataset)

[Figure sequence: snapshots at 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% done]
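A sketch of how the steps from 0% to 100% done could be driven, under stated assumptions: points move linearly from hypothetical model positions to their paired data points, and after every step k-means is re-run starting from the previous centroids. The step count, iteration limit and all names are illustrative, not taken from the slides, and empty clusters are not repaired here.

```python
import math
from typing import Sequence


def nearest(p: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, centroids, max_iter=20):
    """Plain k-means warm-started from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for i, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # an empty cluster keeps its centroid in place (no repair here)
            new.append([sum(v) / len(members) for v in zip(*members)] if members else c)
        if new == centroids:  # centroids stopped moving: converged
            break
        centroids = new
    return centroids


def kmeans_star_sketch(data, model, start_centroids, steps=10):
    """Hypothetical driver: move points from model to data in `steps` steps.

    Assumptions: model[i] is paired with data[i]; points move linearly;
    k-means is re-run after every step from the previous centroids.
    """
    centroids = [list(c) for c in start_centroids]
    for step in range(1, steps + 1):
        t = step / steps  # fraction done: 0.1, 0.2, ..., 1.0
        points = [[m + t * (x - m) for m, x in zip(mp, xp)]
                  for mp, xp in zip(model, data)]
        centroids = kmeans(points, centroids)
    return centroids


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
start = [[5.0, 0.0], [5.0, 10.0]]                 # hypothetical model centroids
model = [start[0], start[0], start[1], start[1]]  # each point starts at a model centroid
print(kmeans_star_sketch(data, model, start))
```

At 100% done the points coincide with the original data, so the final k-means run clusters the real data set, starting from centroids that have followed the data throughout the transformation.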

Empty clusters problem
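While points move, a centroid can end up with no points assigned to it. The slides do not show how the method removes empty clusters; the sketch below uses one common heuristic, chosen here as an assumption: give each empty cluster the point that lies farthest from its own centroid, taking it only from a cluster that still has at least two members so that no other cluster becomes empty. It needs at least k points; `fix_empty_clusters` is a hypothetical name.

```python
import math
from typing import Sequence


def nearest(p: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def fix_empty_clusters(points, centroids, labels):
    """Give every empty cluster one point (hypothetical heuristic).

    For each empty cluster, take the point farthest from its own centroid,
    but only from clusters with at least two members, and make it the empty
    cluster's centroid. Requires len(points) >= len(centroids).
    """
    centroids = [list(c) for c in centroids]
    labels = list(labels)
    k = len(centroids)
    for i in range(k):
        if i in labels:
            continue  # cluster i already has members
        counts = [labels.count(c) for c in range(k)]
        candidates = [j for j in range(len(points)) if counts[labels[j]] >= 2]
        worst = max(candidates,
                    key=lambda j: math.dist(points[j], centroids[labels[j]]))
        labels[worst] = i
        centroids[i] = list(points[worst])
    return centroids, labels


points = [[0.0], [1.0], [6.0]]
centroids = [[0.0], [10.0], [50.0]]
labels = [nearest(p, centroids) for p in points]  # cluster 2 is empty
print(fix_empty_clusters(points, centroids, labels))
```

After the repair, a normal k-means iteration can move the new centroid to a better position.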

Time Complexity

Phase                    k free            k = O(n)         k = O(√n)          k = O(1)
Initialization           O(n)              O(n)             O(n)               O(n)
Data set transform       O(n)              O(n)             O(n)               O(n)
Empty clusters removal   O(k^2 n)          O(n^3)           O(n^2)             O(n)
K-means                  O(k n^(kd+1))     O(n^(dn+2))      O(n^(d√n+3/2))     O(n^(kd+1))
Algorithm total          O(k n^(kd+1))     O(n^(dn+2))      O(n^(d√n+3/2))     O(n^(kd+1))
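The k-dependent columns follow from substituting the growth rate of k into the k-free entries. Reading the k-free entries as O(k^2 n) for empty clusters removal and O(k n^{kd+1}) for k-means, and treating k as exactly proportional to n, √n or 1, the substitutions are:

```latex
k = O(n):\quad O(k^2 n) = O(n^3), \qquad O(k\,n^{kd+1}) = O(n \cdot n^{dn+1}) = O(n^{dn+2})

k = O(\sqrt{n}):\quad O(k^2 n) = O(n^2), \qquad O(k\,n^{kd+1}) = O(n^{1/2} \cdot n^{d\sqrt{n}+1}) = O(n^{d\sqrt{n}+3/2})

k = O(1):\quad O(k^2 n) = O(n), \qquad O(k\,n^{kd+1}) = O(n^{kd+1})
```

In every column the k-means term dominates the other phases, which is why the algorithm total equals the k-means row.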

Time Complexity: fixed k-means

Phase                    k free      k = O(n)    k = O(√n)    k = O(1)
Initialization           O(n)        O(n)        O(n)         O(n)
Data set transform       O(n)        O(n)        O(n)         O(n)
Empty clusters removal   O(k^2 n)    O(n^3)      O(n^2)       O(n)
K-means                  O(kn)       O(n^2)      O(n^1.5)     O(n)
Algorithm total          O(k^2 n)    O(n^3)      O(n^2)       O(n)
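With a fixed number of k-means iterations the total appears to be dominated by empty clusters removal instead of k-means. Assuming the total is simply the sum of the four phase costs (the slide does not say how the phases were combined) and k ≥ 1:

```latex
T_{\text{total}} = O(n) + O(n) + O(k^2 n) + O(kn) = O(k^2 n)

k = O(n) \Rightarrow O(n^3), \qquad k = O(\sqrt{n}) \Rightarrow O(n^2), \qquad k = O(1) \Rightarrow O(n)
```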

Datasets

Dataset    d    n       k
s1         2    5000    15
s2         2    5000    15
s3         2    5000    15
s4         2    5000    15
bridge     16   4096    256
missa      16   6480    256
house      3    34000   256
thyroid    5    215     2
iris       4    150     2
wine       13   178     3

(d = dimension, n = number of data points, k = number of clusters)

Mean square error

Dataset    k-means   proposed   GKM     optimal
s1         1.85      1.01       0.89    0.89
s2         1.94      1.52       1.33    1.33
s3         1.97      1.71       1.69    1.69
s4         1.69      1.63       1.57    1.57
bridge     168.2     164.7      164.1   160.7
missa      5.33      5.15       5.34    5.12
house      9.88      9.48       5.94    5.86
thyroid    6.97      6.92       1.52    1.52
iris       3.70      3.70       2.02    2.02
wine       1.92      1.90       0.88    0.88
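For reference, a mean square error of a clustering can be computed as below. The normalisation is an assumption: this sketch divides the total squared error by the number of points n only, while some authors also divide by the dimension d, so values from this function are not necessarily on the same scale as the table.

```python
import math


def mse(points, centroids, labels):
    """Mean squared Euclidean distance from each point to its own centroid.

    Assumption: divides by the number of points n only; some papers also
    divide by the dimension d, which rescales every value.
    """
    total = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return total / len(points)


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(mse(pts, [[0.0, 0.5], [10.0, 10.5]], [0, 0, 1, 1]))  # 0.25
```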

Mean square error vs. number of steps

[Figures: seven plots of mean square error vs. number of steps]

Number of incorrect clusters

Incorrect clusters   proposed   k-means
0 (all correct)      36%        14%
1                    64%        38%
2                    0%         34%
3                    0%         10%
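The slides do not define how incorrect clusters are counted. One plausible count, used here purely as an assumption, maps every found centroid to its nearest ground-truth centroid (and vice versa) and counts the centroids that receive no match; all names are hypothetical.

```python
import math


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def orphans(source, target):
    """How many target centroids are not the nearest match of any source centroid."""
    matched = {nearest(c, target) for c in source}
    return len(target) - len(matched)


def incorrect_clusters(found, truth):
    """Hypothetical count of wrongly placed clusters (orphans in either direction)."""
    return max(orphans(found, truth), orphans(truth, found))


truth = [[0, 0], [10, 0], [0, 10], [10, 10]]
found = [[0, 0], [0.5, 0.5], [10, 0], [4, 10]]  # two centroids share one cluster
print(incorrect_clusters(found, truth))  # 1
```

Here one true cluster is covered twice while two others share a single centroid, which this count reports as one incorrect cluster.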

Summary

• We have presented a clustering method based on gradual transformation of data and k-means. Instead of fitting the model to data, we fit the data to a model.

• The proposed method gives a lower mean square error than k-means on nine of the ten test datasets and the same error on iris.
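The claim can be checked against the mean square error table; the sketch below only re-enters the published k-means and proposed columns and counts where the proposed method is lower.

```python
# (dataset, k-means MSE, proposed MSE) copied from the mean square error table
RESULTS = [
    ("s1", 1.85, 1.01), ("s2", 1.94, 1.52), ("s3", 1.97, 1.71),
    ("s4", 1.69, 1.63), ("bridge", 168.2, 164.7), ("missa", 5.33, 5.15),
    ("house", 9.88, 9.48), ("thyroid", 6.97, 6.92), ("iris", 3.70, 3.70),
    ("wine", 1.92, 1.90),
]

lower = [name for name, km, prop in RESULTS if prop < km]
ties = [name for name, km, prop in RESULTS if prop == km]
print(f"proposed lower on {len(lower)} of {len(RESULTS)} datasets; tied on {ties}")
```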