cse601 partitional clustering - university at buffalojing/cse601/fa12/materials/clustering... ·...
TRANSCRIPT
![Page 1: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/1.jpg)
Clustering Lecture 2: Partitional Methods
Jing Gao SUNY Buffalo
1
![Page 2: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/2.jpg)
Outline
• Basics – Motivation, definition, evaluation
• Methods – Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics – Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering, etc.
2
![Page 3: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/3.jpg)
Partitional Methods
• K-means algorithms
• Optimization of SSE
• Improvement on K-Means
• K-means variants
• Limitation of K-means
3
![Page 4: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/4.jpg)
Partitional Methods
• Center-based – A cluster is a set of objects such that an object in a
cluster is closer (more similar) to the “center” of a cluster, than to the center of any other cluster
– The center of a cluster is called centroid
– Each point is assigned to the cluster with the closest centroid
– The number of clusters usually should be specified
4 center-based clusters
4
![Page 5: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/5.jpg)
5
K-means
• Partition {x1,…,xn} into K clusters
– K is predefined
• Initialization
– Specify the initial cluster centers (centroids)
• Iteration until no change
– For each object xi • Calculate the distances between xi and the K centroids
• (Re)assign xi to the cluster whose centroid is the closest to xi
– Update the cluster centroids based on current assignment
![Page 6: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/6.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
K-means: Initialization
Initialization: Determine the three cluster centers
m1
m2
m3
6
![Page 7: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/7.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
K-means Clustering: Cluster Assignment Assign each object to the cluster which has the closet distance from the centroid to the object
m1
m2
m3
7
![Page 8: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/8.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
K-means Clustering: Update Cluster Centroid
Compute cluster centroid as the center of the points in the cluster
m1
m2
m3
8
![Page 9: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/9.jpg)
K-means Clustering: Update Cluster Centroid
Compute cluster centroid as the center of the points in the cluster
0
1
2
3
4
5
0 1 2 3 4 5
m1
m2
m3
9
![Page 10: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/10.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
m1
m2
m3
K-means Clustering: Cluster Assignment Assign each object to the cluster which has the closet distance from the centroid to the object
10
![Page 11: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/11.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
m1
m2
m3
K-means Clustering: Update Cluster Centroid
Compute cluster centroid as the center of the points in the cluster
11
![Page 12: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/12.jpg)
0
1
2
3
4
5
0 1 2 3 4 5
m1
m2 m3
K-means Clustering: Update Cluster Centroid
Compute cluster centroid as the center of the points in the cluster
12
![Page 13: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/13.jpg)
Partitional Methods
• K-means algorithms
• Optimization of SSE
• Improvement on K-Means
• K-means variants
• Limitation of K-means
13
![Page 14: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/14.jpg)
Sum of Squared Error (SSE)
• Suppose the centroid of cluster Cj is mj
• For each object x in Cj, compute the squared error between x and the centroid mj
• Sum up the error of all the objects
j Cx
j
j
mxSSE 2)(
14
1 2 4 5 m1=
1.5
m2=
4.5
1)5.45()5.44()5.12()5.11( 2222 SSE
![Page 15: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/15.jpg)
How to Minimize SSE
15
• Two sets of variables to minimize
– Each object x belongs to which cluster?
– What’s the cluster centroid? mj
• Block coordinate descent
– Fix the cluster centroid—find cluster assignment that minimizes the current error
– Fix the cluster assignment—compute the cluster centroids that minimize the current error
j Cx
j
j
mx 2)(min
jCx
![Page 16: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/16.jpg)
Cluster Assignment Step
16
• Cluster centroids (mj) are known
• For each object
– Choose Cj among all the clusters for x such that the distance between x and mj is the minimum
– Choose another cluster will incur a bigger error
• Minimize error on each object will minimize the SSE
j Cx
j
j
mx 2)(min
![Page 17: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/17.jpg)
17
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
m1
m2
x1
x2
x3
x4
x5
Example—Cluster Assignment
1321,, Cxxx
254, Cxx
2
13
2
12
2
11 )()()( mxmxmxSSE
Given m1, m2, which cluster each of the five points belongs to?
Assign points to the closet centroid—minimize SSE
2
25
2
24 )()( mxmx
![Page 18: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/18.jpg)
Cluster Centroid Computation Step
18
• For each cluster
– Choose cluster centroid mj as the center of the points
• Minimize error on each cluster will minimize the SSE
j Cx
j
j
mx 2)(min
|| j
Cx
jC
x
mj
![Page 19: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/19.jpg)
19
10
9
8
7
6
5
4
3
2
1
m1
m2
x1
x2
x3
x4
x5
Example—Cluster Centroid Computation
Given the cluster assignment, compute the centers of the two clusters
0 1 2 3 4 5 6 7 8 9 10
![Page 20: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/20.jpg)
Comments on the K-Means Method
• Strength
– Efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n
– Easy to implement
• Issues
– Need to specify K, the number of clusters
– Local minimum– Initialization matters
– Empty clusters may appear
20
![Page 21: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/21.jpg)
Partitional Methods
• K-means algorithms
• Optimization of SSE
• Improvement on K-Means
• K-means variants
• Limitation of K-means
21
![Page 22: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/22.jpg)
Importance of Choosing Initial Centroids
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 1
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 2
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 3
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 4
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 5
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 6
22
![Page 23: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/23.jpg)
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 1
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 2
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 3
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 4
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
xy
Iteration 5
Importance of Choosing Initial Centroids
23
![Page 24: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/24.jpg)
Problems with Selecting Initial Points
• If there are K ‘real’ clusters then the chance of selecting one centroid from each cluster is small
– Chance is relatively small when K is large – If clusters are the same size, n, then
– For example, if K = 10, then probability = 10!/1010 = 0.00036
– Sometimes the initial centroids will readjust themselves in ‘right’ way, and sometimes they don’t
24
![Page 25: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/25.jpg)
10 Clusters Example
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
yIteration 1
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
yIteration 2
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
yIteration 3
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
yIteration 4
Starting with two initial centroids in one cluster of each pair of clusters
25
![Page 26: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/26.jpg)
10 Clusters Example
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 1
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 2
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 3
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 4
Starting with two initial centroids in one cluster of each pair of clusters
26
![Page 27: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/27.jpg)
10 Clusters Example
Starting with some pairs of clusters having three initial centroids, while other have only one.
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 1
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 2
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 3
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 4
27
![Page 28: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/28.jpg)
10 Clusters Example
Starting with some pairs of clusters having three initial centroids, while other have only one.
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
yIteration 1
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 2
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 3
0 5 10 15 20
-6
-4
-2
0
2
4
6
8
x
y
Iteration 4
28
![Page 29: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/29.jpg)
Solutions to Initial Centroids Problem
• Multiple runs
– Average the results or choose the one that has the smallest SSE
• Sample and use hierarchical clustering to determine initial centroids
• Select more than K initial centroids and then select among these initial centroids
– Select most widely separated
• Postprocessing—Use K-means’ results as other algorithms’ initialization
• Bisecting K-means
– Not as susceptible to initialization issues
29
![Page 30: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/30.jpg)
Bisecting K-means
• Bisecting K-means algorithm
– Variant of K-means that can produce a partitional or a hierarchical clustering
30
![Page 31: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/31.jpg)
Handling Empty Clusters
• Basic K-means algorithm can yield empty clusters
• Several strategies
– Choose the point that contributes most to SSE
– Choose a point from the cluster with the highest SSE
– If there are several empty clusters, the above can be repeated several times
31
![Page 32: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/32.jpg)
Updating Centers Incrementally
• In the basic K-means algorithm, centroids are updated after all points are assigned to a centroid
• An alternative is to update the centroids after each assignment (incremental approach) – Each assignment updates zero or two centroids
– More expensive
– Introduces an order dependency
– Never get an empty cluster
– Can use “weights” to change the impact
32
![Page 33: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/33.jpg)
Pre-processing and Post-processing
• Pre-processing
– Normalize the data
– Eliminate outliers
• Post-processing
– Eliminate small clusters that may represent outliers
– Split ‘loose’ clusters, i.e., clusters with relatively high SSE
– Merge clusters that are ‘close’ and that have relatively low SSE
33
![Page 34: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/34.jpg)
Partitional Methods
• K-means algorithms
• Optimization of SSE
• Improvement on K-Means
• K-means variants
• Limitation of K-means
34
![Page 35: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/35.jpg)
35
Variations of the K-Means Method
• Most of the variants of the K-means which differ in
– Dissimilarity calculations
– Strategies to calculate cluster means
• Two important issues of K-means
– Sensitive to noisy data and outliers
• K-medoids algorithm
– Applicable only to objects in a continuous multi-dimensional
space
• Using the K-modes method for categorical data
![Page 36: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/36.jpg)
36
Sensitive to Outliers
• K-means is sensitive to outliers
– Outlier: objects with extremely large (or small) values
• May substantially distort the distribution of the data
+
+
outlier
![Page 37: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/37.jpg)
37
K-Medoids Clustering Method
• Difference between K-means and K-medoids – K-means: Computer cluster centers (may not be the original data
point) – K-medoids: Each cluster’s centroid is represented by a point in the
cluster – K-medoids is more robust than K-means in the presence of
outliers because a medoid is less influenced by outliers or other extreme values
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
k-means k-medoids
![Page 38: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/38.jpg)
The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters
– PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)
• Starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of the
resulting clustering
• PAM works effectively for small data sets, but does not scale well for large
data sets. Time complexity is O(k(n-k)2) for each iteration where n is # of
data objects, k is # of clusters
• Efficiency improvement on PAM
– CLARA (Kaufmann & Rousseeuw, 1990): PAM on samples
– CLARANS (Ng & Han, 1994): Randomized re-sampling
38
![Page 39: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/39.jpg)
39
PAM: A Typical K-Medoids Algorithm
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
Total Cost = 20
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
K=2
Arbitrary choose k object as initial medoids
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
Assign each remaining object to nearest medoids
Randomly select a nonmedoid object,Oramdom
Compute total cost of swapping
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
Total Cost = 26
Swapping O and Oramdom
If quality is improved.
Do loop
Until no change
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
![Page 40: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/40.jpg)
40
K-modes Algorithm
• Handling categorical data: K-modes (Huang’98) – Replacing means of clusters
with modes • Given n records in cluster,
mode is a record made up of the most frequent attribute values
– Using new dissimilarity measures to deal with categorical objects
A mixture of categorical and numerical data: K-prototype method
age income student credit_rating
< = 30 high no fair
< = 30 high no excellent
31…40 high no fair
> 40 medium no fair
> 40 low yes fair
> 40 low yes excellent
31…40 low yes excellent
< = 30 medium no fair
< = 30 low yes fair
> 40 medium yes fair
< = 30 medium yes excellent
31…40 medium no excellent
31…40 high yes fair
mode = (<=30, medium, yes, fair)
![Page 41: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/41.jpg)
Limitations of K-means
• K-means has problems when clusters are of differing
– Sizes
– Densities
– Irregular shapes
41
![Page 42: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/42.jpg)
Limitations of K-means: Differing Sizes
Original Points K-means (3 Clusters)
42
![Page 43: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/43.jpg)
Limitations of K-means: Differing Density
Original Points K-means (3 Clusters)
43
![Page 44: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/44.jpg)
Limitations of K-means: Irregular Shapes
Original Points K-means (2 Clusters)
44
![Page 45: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/45.jpg)
Overcoming K-means Limitations
Original Points K-means Clusters
One solution is to use many clusters. Find parts of clusters, but need to put together.
45
![Page 46: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/46.jpg)
Overcoming K-means Limitations
Original Points K-means Clusters
46
![Page 47: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/47.jpg)
Overcoming K-means Limitations
Original Points K-means Clusters
47
![Page 48: CSE601 Partitional Clustering - University at Buffalojing/cse601/fa12/materials/clustering... · – Sometimes the initial centroids will readjust themselves in right way, and sometimes](https://reader031.vdocument.in/reader031/viewer/2022022509/5ad6dec37f8b9a9d5c8b6195/html5/thumbnails/48.jpg)
Take-away Message
• What’s partitional clustering?
• How does K-means work?
• How is K-means related to the minimization of SSE?
• What are the strengths and weakness of K-means?
• What are the variants of K-means?
48