random swap algorithm - itä-suomen yliopistocs.uef.fi/pages/franti/cluster/randomswap.pdf · 2018....
TRANSCRIPT
![Page 1: Random Swap algorithm - Itä-Suomen yliopistocs.uef.fi/pages/franti/cluster/RandomSwap.pdf · 2018. 4. 4. · 175 180 185 190 0 1 10 100 1000 10000 Time MSE Random Swap Repeated k-means](https://reader035.vdocument.in/reader035/viewer/2022071413/610acf9a7c113f2f164abdf3/html5/thumbnails/1.jpg)
Random Swap algorithm
Pasi Fränti, 24.4.2018
Definitions and data
Set of N data points: $X = \{x_1, x_2, \ldots, x_N\}$
Set of k cluster prototypes (centroids): $C = \{c_1, c_2, \ldots, c_k\}$
Partition of the data: $P = \{p_1, p_2, \ldots, p_N\}$
Clustering problem
Objective function:
$$\mathrm{MSE} = \frac{1}{N d} \sum_{i=1}^{N} \lVert x_i - c_{p_i} \rVert^2$$
Optimality of partition:
$$p_i = \arg\min_{1 \le j \le k} \lVert x_i - c_j \rVert^2, \quad \forall i \in [1, N]$$
Optimality of centroid:
$$c_j = \frac{\sum_{p_i = j} x_i}{\sum_{p_i = j} 1}, \quad \forall j \in [1, k]$$
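The objective and the two optimality conditions can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, points are plain tuples, and the division by the dimension d in `mse` is an assumption about the normalization.

```python
from math import dist  # Euclidean distance, Python 3.8+

def optimal_partition(X, C):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(C)), key=lambda j: dist(x, C[j])) for x in X]

def optimal_centroids(X, P, C_old):
    """Average the points of each cluster; keep the old centroid if a cluster is empty."""
    C = []
    for j, old in enumerate(C_old):
        members = [x for x, p in zip(X, P) if p == j]
        if members:
            C.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            C.append(old)  # assumption: empty clusters are left untouched
    return C

def mse(X, C, P):
    """Mean squared error, normalized by N and (assumed) by the dimension d."""
    n, d = len(X), len(X[0])
    return sum(dist(x, C[p]) ** 2 for x, p in zip(X, P)) / (n * d)

X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 0.0), (10.0, 0.0)]
P = optimal_partition(X, C)
C = optimal_centroids(X, P, C)
print(P, C, mse(X, C, P))
```

Each function fixes one of the two variables (P or C) and optimizes the other, which is the pair of conditions that k-means alternates between.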
K-means algorithm
X = Data set, C = Cluster centroids, P = Partition

K-Means(X, C) → (C, P)
REPEAT
    Cprev ← C;
    FOR i = 1 TO N DO                          (optimal partition)
        pi ← FindNearest(xi, C);
    FOR j = 1 TO k DO                          (optimal centroids)
        cj ← Average of xi for which pi = j;
UNTIL C = Cprev
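A runnable counterpart of the K-Means pseudocode, as a minimal sketch. The `max_iter` safety cap and the handling of empty clusters are assumptions added for the example; they are not part of the slide.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(X, C, max_iter=100):
    """Alternate the partition and centroid steps until C stops changing."""
    C = [tuple(c) for c in C]
    P = []
    for _ in range(max_iter):  # safety cap: an addition, not in the pseudocode
        C_prev = C
        # Optimal partition: label each point with its nearest centroid.
        P = [min(range(len(C_prev)), key=lambda j: dist(x, C_prev[j])) for x in X]
        # Optimal centroids: average of the points carrying each label.
        C = []
        for j, old in enumerate(C_prev):
            members = [x for x, p in zip(X, P) if p == j]
            if members:
                C.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                C.append(old)  # empty cluster: keep the old centroid (assumption)
        if C == C_prev:
            break
    return C, P

X = [(1.0,), (2.0,), (9.0,), (10.0,)]
C, P = k_means(X, [(1.0,), (2.0,)])
print(C, P)
```

Starting from two centroids placed in the same group, the run above still separates the two groups, but only because the data are trivially small; in general k-means cannot move a centroid across other clusters, which is what the swap is for.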
Problems of k-means
Swapping strategy
Pigeon hole principle
P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), 3034-3045, September 2014.
CI = Centroid index:
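The slide gives only the name of the measure. As a hedged sketch of how the centroid index is commonly described in the cited paper: map every centroid of one solution to its nearest centroid in the other, count the centroids that receive no mapping (orphans), and take the larger count of the two directions. The function names below are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def orphans(A, B):
    """Number of centroids in B that are not the nearest neighbour of any centroid in A."""
    mapped = {min(range(len(B)), key=lambda j: dist(a, B[j])) for a in A}
    return len(B) - len(mapped)

def centroid_index(A, B):
    """Symmetric centroid index: the larger orphan count of the two directions."""
    return max(orphans(A, B), orphans(B, A))

A = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]    # two centroids on the left cluster
B = [(0.0, 0.0), (10.0, 0.0), (11.0, 0.0)]   # two centroids on the right cluster
print(centroid_index(A, B))
```

A value of 0 means the two solutions have the same cluster-level structure; each unit roughly corresponds to one wrongly placed centroid.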
Pigeon hole principle
S2
Aim of the swap
Two centroids, but only one cluster.
One centroid, but two clusters.
S2
Random Swap algorithm

Random Swap(X) → (C, P)
    C ← Select random representatives(X);
    P ← Optimal partition(X, C);
    REPEAT T times
        (Cnew, j) ← Random swap(X, C);
        Pnew ← Local repartition(X, Cnew, P, j);
        (Cnew, Pnew) ← Kmeans(X, Cnew, Pnew);
        IF f(Cnew, Pnew) < f(C, P) THEN
            (C, P) ← (Cnew, Pnew);
    RETURN (C, P);
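The pseudocode above translates into a short, self-contained Python sketch. It is an illustration under stated assumptions, not the author's code: the helper names are hypothetical, the local repartition is replaced by a full repartition for brevity, two k-means iterations follow each swap, and `f` is the mean squared distance without normalization by the dimension.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def partition(X, C):
    """Optimal partition: nearest centroid for every point."""
    return [min(range(len(C)), key=lambda j: dist(x, C[j])) for x in X]

def centroids(X, P, C):
    """Optimal centroids; an empty cluster keeps its old centroid (assumption)."""
    new_C = []
    for j, old in enumerate(C):
        members = [x for x, p in zip(X, P) if p == j]
        new_C.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
    return new_C

def f(X, C, P):
    """Objective: mean squared distance of the points to their centroids."""
    return sum(dist(x, C[p]) ** 2 for x, p in zip(X, P)) / len(X)

def random_swap(X, k, T, kmeans_iters=2, seed=0):
    rng = random.Random(seed)
    C = rng.sample(X, k)                 # random representatives
    P = partition(X, C)
    for _trial in range(T):
        C_new = list(C)
        C_new[rng.randrange(k)] = rng.choice(X)   # swap: random centroid <- random point
        P_new = partition(X, C_new)               # full repartition (simplification)
        for _it in range(kmeans_iters):           # fine-tune by a few k-means iterations
            C_new = centroids(X, P_new, C_new)
            P_new = partition(X, C_new)
        if f(X, C_new, P_new) < f(X, C, P):       # accept only improving swaps
            C, P = C_new, P_new
    return C, P

X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0), (9.1, 0.0)]
C, P = random_swap(X, k=3, T=50, seed=1)
print(C, P, f(X, C, P))
```

Because a trial solution is kept only when it lowers the objective, the result can never be worse than the initial solution drawn with the same seed.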
Steps of the swap
1. Random swap:
$$c_j \leftarrow x_i, \quad j = \mathrm{random}(1, k), \; i = \mathrm{random}(1, N)$$
2. Re-allocate vectors from old cluster:
$$p_i \leftarrow \arg\min_{1 \le m \le k} d(x_i, c_m)^2, \quad \forall i : p_i = j$$
3. Create new cluster:
$$p_i \leftarrow \arg\min_{m \in \{j,\, p_i\}} d(x_i, c_m)^2, \quad \forall i \in [1, N]$$
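The three steps of the swap can be sketched as one function. This is an assumption-laden illustration: the function name is hypothetical, and the indices j and i are passed in explicitly so that the example is deterministic, whereas the algorithm draws them at random.

```python
from math import dist  # Euclidean distance, Python 3.8+

def swap_and_local_repartition(X, C, P, j, i):
    """Replace centroid j by data point i, then repair the partition locally."""
    k = len(C)
    C = list(C)
    C[j] = X[i]                                  # 1. swap
    P = list(P)
    for n, x in enumerate(X):
        if P[n] == j:
            # 2. Points of the removed cluster: search all k centroids.
            P[n] = min(range(k), key=lambda m: dist(x, C[m]))
        elif dist(x, C[j]) < dist(x, C[P[n]]):
            # 3. Other points: only compare the current centroid with the new one.
            P[n] = j
    return C, P

X = [(0.0,), (1.0,), (10.0,), (11.0,)]
C = [(0.5,), (10.5,)]
P = [0, 0, 1, 1]
C_new, P_new = swap_and_local_repartition(X, C, P, j=1, i=0)
print(C_new, P_new)
```

Step 2 touches only the points of the removed cluster, and step 3 needs a single distance comparison per point, which is why the repartition is cheaper than a full k-means partition step.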
Swap is made from centroid rich area to centroid poor area.
Swap
Local re-partition
Re-allocate vectors
Create new cluster
Iterate by k-means: 1st iteration
Iterate by k-means: 2nd iteration
Iterate by k-means: 3rd iteration
Iterate by k-means: 16th iteration
Iterate by k-means: 17th iteration
Iterate by k-means: 18th iteration
Iterate by k-means: 19th iteration
Final result: 25 iterations
Extreme example
Dependency on initial solution (Bridge)
[Bar chart: MSE by initialization. K-means: 176.53; Random + RS: 163.93; K-means + RS: 163.63; Split + RS: 163.51; Ward + RS: 163.08]
Data sets
Data sets visualized: Images
Bridge: 16-d, 4x4 blocks
House: 3-d, RGB color
Miss America: 16-d, 4x4 blocks, frame differential
Europe: 2-d, differential coordinates
Data sets visualized: Artificial
S1, S2, S3, S4, Unbalance, DIM32
G2-2-30, G2-2-50, G2-2-70
A1, A2, A3
Time complexity
Efficiency of the random swap
Total time to find correct clustering:
– Time per iteration × Number of iterations
Time complexity of single iteration:
– Swap: O(1)
– Remove cluster: 2k·N/k = O(N)
– Add cluster: 2N = O(N)
– Centroids: 2N/k + 2N/k + 2 = O(N/k)
– K-means: IkN = O(IkN) (Bottleneck!)
Efficiency of the random swap
Total time to find correct clustering:
– Time per iteration × Number of iterations
Time complexity of single iteration:
– Swap: O(1)
– Remove cluster: 2k·N/k = O(N)
– Add cluster: 2N = O(N)
– Centroids: 2N/k + 2N/k + 2 = O(N/k)
– (Fast) K-means: 4αN = O(αN) (2 iterations only!)
T. Kaukoranta, P. Fränti and O. Nevalainen, "A fast exact GLA based on code vector activity detection", IEEE Trans. on Image Processing, 9 (8), 1337-1342, August 2000.
Estimated and observed steps
Bridge: N=4096, k=256, N/k=16, α=8
Processing time profile (Bridge): N = 4096, k = 256, α = 8
[Figure: time (ms, 0–10) for the steps Local repartition, KM 1 and KM 2; horizontal axis 0–450; annotated shares 53 % and 39 %]
Processing time profile (Birch1): N = 100.000, k = 100, α = 5.5
[Figure: time (ms, 0–200) for the steps Local repartition, KM 1 and KM 2; horizontal axis 0–900; annotated shares 49 %, 48 %, 50 % and 44 %; marker "CI=0 reached"]
Effect of K-means iterations (Bridge)
[Figure: MSE (160–190) versus time (s, log scale from 0.01 to 100); curves labelled 1 to 5 for the number of k-means iterations]
Effect of K-means iterations (Birch2)
[Figure: MSE (0–12) versus time (s, log scale from 0.1 to 100); curves labelled 1 to 5 for the number of k-means iterations]
How many swaps?
Three types of swaps
• Trial swap
• Accepted swap: MSE improves
• Successful swap: CI improves
[Figure: Before swap (MSE 20.12, CI=2); Accepted (MSE 20.09, CI=2); Successful (MSE 15.87, CI=1)]
Accepted and successful swaps
Number of swaps needed: example with 35 clusters (A3)
[Figure: two solutions for A3, one with CI=4 and one with CI=9]
Number of swaps needed: example from image quantization
[Figure: K-means clustering result (3 swaps needed), with centroids marked "Remove" and "Added", next to the final clustering result]
Statistical observations (S1): N=5000, k=15, d=2, N/k=333, α=4.1
[Histogram: share of runs (0–20 %) by number of swaps (0–100). Median = 26, Average = 35, 90% cases < 70, Max = 322]
Statistical observations (S4): N=5000, k=15, d=2, N/k=333, α=4.8
[Histogram: share of runs (0–20 %) by number of swaps (0–100). Median = 18, Average = 25, 90% cases < 50, Max = 248]
Theoretical estimation
Probability of good swap
• Select a proper prototype to remove:
  – There are k clusters in total: p_removal = 1/k
• Select a proper new location:
  – There are N choices: p_add = 1/N
  – Only k are significantly different: p_add = 1/k
• Both must happen at the same time:
  – k² significantly different swaps.
  – Probability of each different swap is p_swap = 1/k²
  – Open question: how many of these are good?
$$p = (\alpha/k)(\alpha/k) = O(\alpha/k)^2$$
Expected number of iterations
• Probability of not finding good swap:
$$q = \left(1 - \frac{\alpha^2}{k^2}\right)^T$$
• Estimated number of iterations:
$$\log q = T \log\left(1 - \frac{\alpha^2}{k^2}\right) \quad\Rightarrow\quad T = \frac{\log q}{\log\left(1 - \frac{\alpha^2}{k^2}\right)}$$
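As a worked example of the estimate, the number of iterations T can be computed for a chosen failure probability q. The sketch below assumes the per-trial success probability is (α/k)², with α the neighborhood size; the function name is hypothetical, and the sample values k=15 and α=4.1 are the ones quoted for S1 elsewhere in the slides.

```python
from math import ceil, log

def iterations_needed(q, k, alpha):
    """Smallest integer T with (1 - (alpha/k)^2)^T <= q."""
    p = (alpha / k) ** 2          # probability that one trial swap is good
    return ceil(log(q) / log(1.0 - p))

T = iterations_needed(q=0.01, k=15, alpha=4.1)
print(T)  # 60
```

With these values roughly 60 trial swaps keep the probability of never drawing a good swap below 1 %.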
Probability of failure (q) depending on T
[Figure: q (log scale, from 1 down to 0.000000001) versus iterations (0–300)]
Observed probability (%) of fail (S1–S4): N=5000, k=15, d=2, N/k=333, α=4.5
[Figure: observed q (%, log scale 0.1–100) per dataset S1, S2, S3, S4 for the targets q = 0.1 %, q = 1 % and q = 10 %. Values shown in the chart: 3.8, 1.8, 2.7, 5.2; 0.6, 0.2, 0.3, 1.3; 18, 21, 17, 22]
Observed probability (%) of fail (Dim 32-128): N=1024, k=16, d=32-128, N/k=64, α=1.1
[Figure: observed q (%, log scale 0.001–100) versus dimensions (32, 64, 128, 256, 512, 1024) for the targets q = 0.1 %, q = 1 % and q = 10 %]
Bounds for the iterations
Upper limit:
$$T = \frac{\ln q}{\ln\left(1 - \alpha^2/k^2\right)} \le \frac{\ln q}{-\alpha^2/k^2} = -\ln q \cdot \frac{k^2}{\alpha^2}$$
Lower limit similarly; resulting in:
$$T = \Theta\left(-\ln q \cdot \frac{k^2}{\alpha^2}\right)$$
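A quick numeric check of the upper limit, as a sketch with hypothetical function names. It relies on ln(1 − x) ≤ −x for 0 < x < 1, so the closed-form bound can only overestimate the exact value.

```python
from math import log

def exact_iterations(q, k, alpha):
    """T = ln q / ln(1 - alpha^2 / k^2), as a real number."""
    return log(q) / log(1.0 - (alpha / k) ** 2)

def upper_bound(q, k, alpha):
    """Closed-form bound -ln q * k^2 / alpha^2."""
    return -log(q) * (k / alpha) ** 2

for k in (15, 100, 256):
    print(k, exact_iterations(0.01, k, 4.1), upper_bound(0.01, k, 4.1))
```

The gap between the two values shrinks as α/k becomes small, which is the regime where the approximation ln(1 − x) ≈ −x is tight.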
Multiple swaps (w)
Probability for performing less than w swaps:
$$q = \sum_{i=0}^{w-1} \binom{T}{i} \left(\frac{\alpha^2}{k^2}\right)^i \left(1 - \frac{\alpha^2}{k^2}\right)^{T-i}$$
Expected number of iterations:
$$\hat{t} = \sum_{i=1}^{w} \frac{1}{i} \cdot \frac{k^2}{\alpha^2} \approx \log w \cdot \frac{k^2}{\alpha^2}$$
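Both quantities for multiple swaps are easy to evaluate numerically. The sketch below assumes a binomial model with per-trial success probability (α/k)² and a harmonic sum for the expected number of iterations; the function names are hypothetical.

```python
from math import comb

def prob_fewer_than_w(T, w, k, alpha):
    """Probability that T trials contain fewer than w good swaps (binomial tail)."""
    p = (alpha / k) ** 2
    return sum(comb(T, i) * p ** i * (1.0 - p) ** (T - i) for i in range(w))

def expected_iterations(w, k, alpha):
    """Harmonic sum (1 + 1/2 + ... + 1/w) times k^2 / alpha^2."""
    return sum(1.0 / i for i in range(1, w + 1)) * (k / alpha) ** 2

print(prob_fewer_than_w(T=60, w=1, k=15, alpha=4.1))
print(expected_iterations(w=4, k=15, alpha=4.1))
```

For w = 1 the tail probability reduces to the single-swap case (1 − α²/k²)^T, and the harmonic sum grows like log w, which is why the dependency on w is close to constant.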
Expected time complexity
$$\hat{t}(N, k) \approx \alpha N \cdot \frac{k^2}{\alpha^2} \log w = \frac{N k^2 \log w}{\alpha}$$
1. Linear dependency on N
2. Quadratic dependency on k (with a large number of clusters, it can be too slow)
3. Logarithmic dependency on w (close to constant)
4. Inverse dependency on α (the higher the dimensionality, the faster the method)
Linear dependency on N (Birch2): N<100.000, k=100, d=2, N/k=1000, α=3.1
[Figure: iterations (400–1800) versus data size (thousands, 0–100); annotations 75 %, 25 % and N=100.000]
Quadratic dependency on k (Birch2): N<100.000, k=100, d=2, N/k=1000, α=3.1
[Figure: iterations (0–1400) versus number of clusters (0–100) with a quadratic fit; annotations 75 % and 25 %]
Logarithmic dependency on w
Theory vs. reality
Neighborhood size
How much is α?
[Figure (S1): Voronoi neighbors and neighbors by distance]
2-dim: $2(3k-6)/k = 6 - 12/k$
D-dim: $2k^{D/2}/k = O(2k^{D/2-1})$
Upper limit: k
Observed number of neighbors (data set S2)
[Histogram: frequency (0–45 %) by number of neighbours (1–9); Average = 3.9]
Estimate α
• Five iterations of random swap clustering
• Each pair of prototypes A and B:
  1. Calculate the half point HP = (A+B)/2
  2. Find the nearest prototype C for HP
  3. If C=A or C=B they are potential neighbors.
• Analyze potential neighbors:
  1. Calculate all vector distances across A and B
  2. Select the nearest pair (a,b)
  3. If d(a,b) < min( d(a,C(a)), d(b,C(b)) ) then Accept
• α = Number of pairs found / k
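The estimation procedure can be sketched as follows, assuming a clustering (C, P) is already available, for instance after a few iterations of random swap. The function name is hypothetical, C(a) is read as the centroid of the cluster that vector a belongs to, and empty clusters are skipped (an assumption).

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def estimate_alpha(X, C, P):
    """Count accepted neighbor pairs of prototypes and divide by k."""
    k = len(C)
    clusters = [[x for x, p in zip(X, P) if p == j] for j in range(k)]
    pairs = 0
    for ia, ib in combinations(range(k), 2):
        # Half point between the two prototypes and its nearest prototype.
        hp = tuple((u + v) / 2 for u, v in zip(C[ia], C[ib]))
        nearest = min(range(k), key=lambda j: dist(hp, C[j]))
        if nearest not in (ia, ib):
            continue                      # another prototype lies in between
        if not clusters[ia] or not clusters[ib]:
            continue                      # assumption: skip empty clusters
        # Nearest pair of vectors across the two clusters.
        a, b = min(((a, b) for a in clusters[ia] for b in clusters[ib]),
                   key=lambda ab: dist(ab[0], ab[1]))
        if dist(a, b) < min(dist(a, C[ia]), dist(b, C[ib])):
            pairs += 1                    # accept as neighbors
    return pairs / k

X = [(0.0,), (4.0,), (5.0,), (9.0,), (100.0,), (104.0,)]
C = [(2.0,), (7.0,), (102.0,)]
P = [0, 0, 1, 1, 2, 2]
print(estimate_alpha(X, C, P))
```

In the example only the first two clusters touch each other, so one pair is accepted out of three prototypes.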
Observed values of α
Optimality
Multiple optima (= plateaus)
• Very similar result (<0.3% diff. in MSE)
• CI-values significantly high (9%)
• Finds one of the near-optimal solutions
Experiments
Time-versus-distortion (Bridge): N=4096, k=256, d=16, N/k=16, α=5.4
[Figure: MSE (160–190) versus time (log scale) for Random Swap, repeated k-means (10.000 repeats) and single k-means]
Time-versus-distortion (Missa): N=6480, k=256, d=16, N/k=25, α=17.1
[Figure: MSE (5.00–6.50) versus time (log scale) for Random Swap, repeated k-means (10.000 repeats) and single k-means]
Time-versus-distortion (Birch1): N=100.000, k=100, d=2, N/k=1000, α=5.8
[Figure: MSE (4.50–7.00) versus time (log scale) for Random Swap, repeated k-means (2500 repeats) and single k-means]
Time-versus-distortion (Birch2): N=100.000, k=100, d=2, N/k=1000, α=3.1
[Figure: MSE (0–10) versus time (log scale) for Random Swap, repeated k-means (2500 repeats) and single k-means]
Time-versus-distortion (Europe): N=169.673, k=256, d=2, N/k=663, α=6.3
[Figure: MSE (0–16) versus time (log scale) for Random Swap, repeated k-means (500 repeats) and single k-means]
Time-versus-distortion (Unbalance): N=6500, k=8, d=2, N/k=821, α=2.3
[Figure: MSE (0–140) versus time (log scale from 0.01 to 1000) for Random Swap, repeated k-means and single k-means]
Variation of results (Bridge)
[Histogram: share of runs (0–15 %) by MSE (160–185). Random swap: 164.69; Repeated k-means: 179.82]
Variation of results (Birch1)
[Histogram: share of runs (0–15 %) by MSE (4.0–6.5). Random swap: 4.64; Repeated k-means: 5.48]
Comparison of algorithms
• k-means (KM)
• k-means++
• repeated k-means (RKM)
• x-means
• agglomerative clustering (AC)
• random swap (RS)
• global k-means
• genetic algorithm
D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA’07), New Orleans, LA, 1027-1035, January, 2007.
D. Pelleg and A. Moore, "X-means: Extending k-means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning, (ICML’00), Stanford, CA, USA, June 2000.
P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), 773-777, May 2000.
A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition 36, 451-461, 2003.
P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pat. Rec. Let., 21 (1), 61-68, January 2000.
Processing time
Clustering quality
Conclusions
What did we learn?
1. Random swap is an efficient algorithm
2. It does not converge to a sub-optimal result
3. Expected processing time has the following dependencies:
   • Linear O(N) on the size of data
   • Quadratic O(k²) on the number of clusters
   • Inverse O(1/α) on the neighborhood size
   • Logarithmic O(log w) on the number of swaps
References
• P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018.
• P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
• P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
• P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR’08), Tampa, FL, Dec 2008.
• Pseudo code: http://cs.uef.fi/pages/franti/research/rs.txt
Supporting material
Implementations available (C, Matlab, Java, Javascript, R and Python): http://www.uef.fi/web/machine-learning/software
Interactive animation: http://cs.uef.fi/sipu/animator/
Clusterator: http://cs.uef.fi/sipu/clusterator