k means clustering algorithm

24
K-means K-means clustering clustering algorithm algorithm Kasun Ranga Wijeweera (krw19870829@gmail. com)

Upload: kasun-ranga-wijeweera

Post on 21-Nov-2014

4.893 views

Category:

Technology


8 download

DESCRIPTION

 

TRANSCRIPT

Page 1: K means Clustering Algorithm

K-means clustering K-means clustering algorithmalgorithm

Kasun Ranga Wijeweera

([email protected])

Page 2: K means Clustering Algorithm

• Organizing data into classes such that there is

• high intra-class similarity

• low inter-class similarity

• Finding the class labels and the number of classes directly from the data (in contrast to classification).

• More informally, finding natural groupings among objects.

What is Clustering?What is Clustering?

Page 3: K means Clustering Algorithm

What is a natural grouping among these objects?What is a natural grouping among these objects?

Page 4: K means Clustering Algorithm

School Employees Simpson's Family Males Females

Clustering is subjectiveClustering is subjective

What is a natural grouping among these objects?What is a natural grouping among these objects?

Page 5: K means Clustering Algorithm

Defining Distance MeasuresDefining Distance MeasuresDefinition: Let O1 and O2 be two objects from the universe

of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1,O2)

0.23 3 342.7

Kasun Kiosn

Page 6: K means Clustering Algorithm

Consider a Set of Data Points,Consider a Set of Data Points,

And a Set of Clusters,And a Set of Clusters,

Page 7: K means Clustering Algorithm

The Goal,The Goal,

Page 8: K means Clustering Algorithm

Algorithm k-means1. Randomly choose K data items from X as initial centroids.

2. Repeat

Assign each data point to the cluster which has the closest centroid.

Calculate new cluster centroids.

Until the convergence criteria is met.

Page 9: K means Clustering Algorithm

The data points

Page 10: K means Clustering Algorithm

Initialization

Page 11: K means Clustering Algorithm

#Runs = 1

Page 12: K means Clustering Algorithm

#Runs = 2

Page 13: K means Clustering Algorithm

#Runs = 3

Page 14: K means Clustering Algorithm

K-means gets stuck in a local optima

Page 15: K means Clustering Algorithm

The data points

Page 16: K means Clustering Algorithm

Initialization

Page 17: K means Clustering Algorithm

#Runs = 1

Page 18: K means Clustering Algorithm

#Runs = 2

Page 19: K means Clustering Algorithm

#Runs = 3

Page 20: K means Clustering Algorithm

#Runs = 4

Page 21: K means Clustering Algorithm

Applications of K-means Method

• Optical Character Recognition

• Biometrics

• Diagnostic Systems

• Military Applications

Page 22: K means Clustering Algorithm

Comments on the Comments on the K-MeansK-Means Method Method

• Strength – Relatively efficient: O(tkn), where n is # objects, k is # clusters,

and t is # iterations. Normally, k, t << n.– Often terminates at a local optimum. The global optimum may

be found using techniques such as: deterministic annealing and genetic algorithms

• Weakness– Applicable only when mean is defined, then what about

categorical data?– Need to specify k, the number of clusters, in advance– Unable to handle noisy data and outliers– Not suitable to discover clusters with non-convex shapes

Page 23: K means Clustering Algorithm

Any Questions ?Any Questions ?

Page 24: K means Clustering Algorithm

Thanks for your attention !Thanks for your attention !