K-means Clustering
J.-S. Roger Jang (張智星)
CSIE Dept., National Taiwan Univ., Taiwan
http://mirlab.org/jang
[email protected]
Machine Learning: K-means Clustering (112/04/19)
Problem Definition

Input:
- X = \{x_1, x_2, \dots, x_n\}: a data set in d-dimensional space
- m: number of clusters (we avoid using k here to prevent confusion with summation indices)

Output:
- m cluster centers c_j, j = 1, \dots, m
- Assignment of each x_i to one of the m clusters, encoded as
  a_{ij} \in \{0, 1\}, \quad \sum_{j=1}^{m} a_{ij} = 1, \quad 1 \le i \le n, \; 1 \le j \le m

Requirement:
- The output should minimize the objective function defined on the next slide.
Objective Function

Objective function (a.k.a. distortion):

J(X; C, A) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} \, \|x_i - c_j\|^2 = \sum_{j=1}^{m} \Big( \sum_{x_i \in G_j} \|x_i - c_j\|^2 \Big)

where X = \{x_1, x_2, \dots, x_n\}, C = \{c_1, c_2, \dots, c_m\}, G_j is the j-th cluster, and a_{ij} = 1 iff x_i \in G_j (0 otherwise).

- There are d*m tunable parameters (for matrix C) plus n*m (for matrix A), with the constraints above on matrix A.
- Finding the exact (globally optimal) solution is NP-hard.
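As a concrete check of the formula, here is a minimal NumPy sketch (the function name `distortion` and the toy data are illustrative, not from the slides):

```python
import numpy as np

def distortion(X, C, A):
    """J(X; C, A) = sum_i sum_j a_ij * ||x_i - c_j||^2.

    X: (n, d) data, C: (m, d) centers, A: (n, m) 0/1 assignment matrix
    with exactly one 1 per row.
    """
    # Squared distances between every point and every center: shape (n, m)
    sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float((A * sq_dists).sum())

# Tiny example: 4 points, 2 centers, each point 0.5 away from its center.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
C = np.array([[0.5, 0.0], [10.5, 0.0]])
A = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(distortion(X, C, A))  # 4 * 0.25 = 1.0
```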
Strategy for Minimization

Observation:
- J(X; C, A) is parameterized by C and A
- Joint optimization is hard, but separate optimization with respect to C and A is easy

Strategy:
- Fix C and find the best A to minimize J(X; C, A)
- Fix A and find the best C to minimize J(X; C, A)
- Iterate the above two steps until convergence

Properties:
- This approach is also known as "coordinate optimization"
Task 1: How to Find the Assignment A?

Goal:
- Find A to minimize J(X; C, A) with C fixed

Facts:
- An analytic (closed-form) solution exists:

  \hat{a}_{ij} =
  \begin{cases}
  1 & \text{if } j = \arg\min_q \|x_i - c_q\|^2 \\
  0 & \text{otherwise}
  \end{cases}

- \hat{A} = \arg\min_A J(X; C, A), so J(X; C, \hat{A}) \le J(X; C, A) for any assignment A
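The closed-form assignment step can be sketched in NumPy as follows (a hypothetical helper, not the lecture's toolbox code):

```python
import numpy as np

def assign_clusters(X, C):
    """Optimal A for fixed C: a_ij = 1 iff j = argmin_q ||x_i - c_q||^2."""
    sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    labels = sq_dists.argmin(axis=1)                               # nearest center
    A = np.zeros_like(sq_dists, dtype=int)
    A[np.arange(len(X)), labels] = 1                               # one-hot rows
    return A

X = np.array([[0.0], [1.0], [9.0]])
C = np.array([[0.0], [10.0]])
print(assign_clusters(X, C))
# [[1 0]
#  [1 0]
#  [0 1]]
```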
Task 2: How to Find the Centers C?

Goal:
- Find C to minimize J(X; C, A) with A fixed

Facts:
- An analytic (closed-form) solution exists:

  \hat{c}_j = \frac{\sum_{i=1}^{n} a_{ij} x_i}{\sum_{i=1}^{n} a_{ij}}

  i.e., each center is the mean of the data assigned to its cluster.

- \hat{C} = \arg\min_C J(X; C, A), so J(X; \hat{C}, A) \le J(X; C, A) for any C
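The center formula translates directly to NumPy (an illustrative sketch; it assumes no cluster is empty):

```python
import numpy as np

def update_centers(X, A):
    """Optimal C for fixed A: c_j = sum_i a_ij x_i / sum_i a_ij."""
    counts = A.sum(axis=0)              # points per cluster, shape (m,)
    # A.T @ X sums the members of each cluster; divide by the counts.
    # Note: assumes every cluster has at least one point (counts > 0).
    return (A.T @ X) / counts[:, None]

X = np.array([[0.0], [2.0], [10.0]])
A = np.array([[1, 0], [1, 0], [0, 1]])
print(update_centers(X, A))  # [[1.], [10.]]
```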
Algorithm

1. Initialize: select m initial cluster centers.
2. Find clusters: assign each x_i to the cluster with the nearest center (find A to minimize J(X; C, A) with C fixed).
3. Find centers: recompute each cluster center as the mean of the data in that cluster (find C to minimize J(X; C, A) with A fixed).
4. Stopping criterion: stop if the clusters stay the same; otherwise go to step 2.
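The four steps above can be sketched as a plain NumPy loop (a minimal sketch of the generic algorithm, not the lecture's MATLAB kMeansClustering.m):

```python
import numpy as np

def kmeans(X, m, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize: pick m distinct data points as centers.
    C = X[rng.choice(len(X), size=m, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # 2. Find clusters: nearest center for each point.
        sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq.argmin(axis=1)
        # 4. Stop if the assignment no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 3. Find centers: mean of the data in each cluster.
        for j in range(m):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C, labels

# Two tight groups; the centers converge to roughly 0.1 and 5.1 on axis 0.
X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
C, labels = kmeans(X, 2)
print(sorted(C[:, 0]))  # centers near 0.1 and 5.1
```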
Stopping Criteria

Two stopping criteria:
- Repeat until there is no more change in the cluster assignment
- Repeat until the improvement in distortion is less than a threshold

Facts:
- Convergence is assured, since each half-step can only reduce (or keep) J:

  J(X; C_1, A_1) \ge J(X; C_2, A_1) \ge J(X; C_2, A_2) \ge J(X; C_3, A_2) \ge J(X; C_3, A_3) \ge \cdots
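The monotone chain can be observed numerically by logging J after every half-step (toy data and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
X[30:] += 4.0                                   # two loose blobs
C = X[rng.choice(60, 2, replace=False)].copy()  # C1: two data points

def J(X, C, labels):
    return float(((X - C[labels]) ** 2).sum())

labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(2).argmin(1)  # A1
history = [J(X, C, labels)]
for _ in range(10):
    # Center step: C_{t+1} given A_t (keep the old center if a cluster empties).
    C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                  for j in range(2)])
    history.append(J(X, C, labels))
    # Assignment step: A_{t+1} given C_{t+1}.
    labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(2).argmin(1)
    history.append(J(X, C, labels))

print(all(a >= b - 1e-9 for a, b in zip(history, history[1:])))  # True
```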
Properties of K-means Clustering

- K-means finds an approximate solution efficiently.
- The distortion (squared error) is a monotonically non-increasing function of the iterations.
- The goal is to minimize the squared error, but the algorithm can end up in a local minimum.
- To increase the probability of finding the global minimum, run k-means several times with different initial conditions.
- "Cluster validation" refers to a set of methods that try to determine the best number of clusters.
- Other distance measures can be used in place of the Euclidean distance, with a corresponding change in how the centers are computed.
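The multiple-restart idea can be sketched as follows (synthetic blobs and the helper name `kmeans_once` are assumptions for illustration):

```python
import numpy as np

def kmeans_once(X, m, rng):
    """One k-means run from a random initialization; returns (C, labels, J)."""
    C = X[rng.choice(len(X), m, replace=False)].copy()
    for _ in range(50):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(2).argmin(1)
        newC = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                         for j in range(m)])
        if np.allclose(newC, C):
            break
        C = newC
    return C, labels, float(((X - C[labels]) ** 2).sum())

rng = np.random.default_rng(0)
# Three well-separated blobs of 20 points each.
X = np.vstack([rng.normal(loc, 0.2, size=(20, 2)) for loc in (0.0, 3.0, 6.0)])
# Restart 10 times and keep the run with the lowest distortion.
best_C, best_labels, best_J = min(
    (kmeans_once(X, 3, rng) for _ in range(10)), key=lambda r: r[2])
```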
K-means Snapshots
Demo of K-means Clustering

Toolbox download:
- Utility Toolbox
- Machine Learning Toolbox

Demos:
- kMeansClustering.m
- vecQuantize.m
Demos of K-means Clustering
kMeansClustering.m
[Figure: two scatter plots of 2-D data (axes: Input 1 vs. Input 2), before and after k-means clustering]
Application: Image Compression

Goal:
- Convert an image from true color to indexed color with minimum distortion.

Steps:
- Collect data from a true-color image
- Perform k-means clustering to obtain the cluster centers as the indexed colors
- Compute the compression ratio:

  before = m \times n \times 3 \times 8 bits
  after = m \times n \times \log_2 c + c \times 3 \times 8 bits
  \frac{after}{before} = \frac{m n \log_2 c + 24c}{24 m n} \approx \frac{\log_2 c}{24} \quad (\text{when } mn \gg c)

  where the image is m \times n pixels and c is the number of indexed colors (clusters).
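The ratio above can be evaluated directly; a small sketch (using ceil(log2 c) whole bits per pixel index, an assumption the slide's formula leaves implicit):

```python
import math

def compression_ratio(m, n, c):
    """after/before for an m*n true-color image indexed with c colors."""
    before = m * n * 3 * 8                               # 24 bits per pixel
    after = m * n * math.ceil(math.log2(c)) + c * 3 * 8  # indices + color map
    return after / before

print(round(compression_ratio(480, 640, 256), 4))  # → 0.3342, close to log2(256)/24 = 1/3
```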
True-color vs. Index-color Images

True-color image:
- Each pixel is represented by a vector of 3 components [R, G, B]

Index-color image:
- Each pixel is represented by an index into a color map
Example: Image Compression

Original image:
- Dimensions: 480x640
- Data size: 480*640*3*8 bits = 0.92 MB
Example: Image Compression

Some quantities of the k-means clustering:
- n = 480*640 = 307200 (number of vectors to be clustered)
- d = 3 (R, G, B)
- m = 256 (number of clusters)
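Plugging these numbers into the before/after formulas is a quick arithmetic check (a sketch; byte sizes use decimal MB as the slides do):

```python
import math

n = 480 * 640   # number of RGB vectors (pixels)
d = 3           # R, G, B
m = 256         # number of clusters (indexed colors)

before_bits = n * d * 8
after_bits = n * math.ceil(math.log2(m)) + m * d * 8
print(before_bits // 8)   # 921600 bytes, i.e., about 0.92 MB
print(after_bits // 8)    # 307968 bytes, i.e., about 0.31 MB
```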
Example: Image Compression
Example: Image Compression
Indexing Techniques

Indexing of pixels for a 2*3*3 image (MATLAB arrays are stored in column-major order):

Linear indices of each 2*3 color plane:
  plane 1: [1 3 5; 2 4 6]
  plane 2: [7 9 11; 8 10 12]
  plane 3: [13 15 17; 14 16 18]

Index matrix (one row per color plane):
  [1 2 3 4 5 6; 7 8 9 10 11 12; 13 14 15 16 17 18]

Related command: reshape

X = imread('annie19980405.jpg');
image(X);
[m, n, p] = size(X);
index = reshape(1:m*n*p, m*n, 3)';
data = double(X(index));
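The same index matrix can be reproduced in NumPy with Fortran (column-major) ordering, which is what MATLAB's reshape uses; this is a cross-check, not part of the original slides:

```python
import numpy as np

m, n, p = 2, 3, 3
# MATLAB: index = reshape(1:m*n*p, m*n, 3)'  (column-major fill, then transpose)
index = np.reshape(np.arange(1, m * n * p + 1), (m * n, p), order='F').T
print(index)  # rows: 1..6, 7..12, 13..18 (one row per color plane)
```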
Code

X = imread('annie19980405.jpg');
image(X)
[m, n, p] = size(X);
index = reshape(1:m*n*p, m*n, 3)';
data = double(X(index));
maxI = 6;
for i = 1:maxI
  centerNum = 2^i;
  fprintf('i=%d/%d: no. of centers=%d\n', i, maxI, centerNum);
  center = kMeansClustering(data, centerNum);
  distMat = distPairwise(center, data);
  [minValue, minIndex] = min(distMat);
  X2 = reshape(minIndex, m, n);
  map = center'/255;
  figure; image(X2); colormap(map); colorbar; axis image;
end
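A rough NumPy analogue of the MATLAB loop above, using a synthetic image and a plain k-means in place of the toolbox's kMeansClustering/distPairwise (so it is only a sketch of the idea, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(float)  # stand-in RGB image
data = img.reshape(-1, 3)                                   # one row per pixel

center_num = 8
centers = data[rng.choice(len(data), center_num, replace=False)].copy()
for _ in range(20):
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(2)
    idx = d2.argmin(1)
    centers = np.array([data[idx == j].mean(0) if np.any(idx == j) else centers[j]
                        for j in range(center_num)])

# Final assignment -> index-color image plus a color map scaled to [0, 1].
idx = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(2).argmin(1)
indexed = idx.reshape(32, 32)
color_map = centers / 255
print(indexed.shape, color_map.shape)  # (32, 32) (8, 3)
```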
Extensions to Image Compression

Extensions to image-data compression via clustering:
1. Use blocks as the unit for VQ (see exercise)
   - Smart indexing: create the indices of the blocks of page 1 first.
   - True-color image display (there is no way to display the compressed image as an index-color image)
2. Use separate codebooks for R, G, and B
What are the corresponding compression ratios?
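As a back-of-the-envelope answer to this question (my own derivation from the earlier before/after formulas; the block size b and the per-extension bit accounting are assumptions, not from the slides):

```python
import math

m, n, c = 480, 640, 256
before = m * n * 3 * 8

# Extension 2: separate codebooks for R, G, B -> one index per channel per pixel.
after_rgb = m * n * 3 * math.ceil(math.log2(c)) + 3 * c * 8
print(round(after_rgb / before, 3))  # → 1.001 (no gain at c = 256; only helps for smaller c)

# Extension 1: block VQ with b*b blocks -> one index per block,
# but each codebook entry stores 3*b*b components.
b = 4
after_block = (m * n // b**2) * math.ceil(math.log2(c)) + c * 3 * b**2 * 8
print(round(after_block / before, 3))  # → 0.034
```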
Extension to Circle Finding

How to find circles via a k-means-like algorithm?

[Figure: 2-D scatter of points arranged along several circles]