
www.data61.csiro.au

k-variates++: more pluses in the k-means++
Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen
DATA61 | ANU | TECHNION | ECOLE POLYTECHNIQUE | UNSW | SONY CS LABS, INC. (formerly NICTA)

ICML 2016, Poster #29, Mon. 3-7pm

In this talk

❖ A generalization of the popular k-means++ seeding
❖ Two theorems on k-variates++:
  ❖ guarantees on approximation of the global optimum
  ❖ likelihood ratio bound between neighbouring instances
❖ Applications: "reductions" between clustering algorithms + approximation bounds for new clustering algorithms, privacy

❖ And more! (see the poster and the paper)

Motivation

❖ k-means++ seeding = a gold standard in clustering:
  ❖ utterly simple to implement (iteratively pick centers with probability proportional to the squared distance to the previous centers)
  ❖ assumption-free (expected) approximation guarantee wrt the k-means global optimum: $\mathbb{E}_C[\mathrm{potential}] \leq (2 + \log k) \cdot 8\,\Phi_{\mathrm{opt}}$ (Arthur & Vassilvitskii, SODA 2007)
  ❖ inspired many variants (tensor clustering, distributed, data stream, on-line, parallel clustering, clustering without centroids in closed form, etc.)

[Diagram: k-means++ at the centre, with spokes to its distributed, on-line, streamed, tensor, no-closed-form-centroid and more-potentials variants]

❖ These approaches are spawns of k-means++:
  ❖ modify the algorithm (e.g. other potentials)
  ❖ use it as a building block
❖ Our objective:
  ❖ all in the same "bag": a generalisation of k-means++ from which such approaches would be just "instantiations" ⇒ reductions
❖ Because general ⇒ new applications

[Diagram: k-variates++ generalising k-means++ and its distributed, on-line, streamed, no-closed-form-centroid and more-potentials variants ⇒ more applications]

k-means++ (Arthur & Vassilvitskii, SODA'07)

Input: data $A \subset \mathbb{R}^d$ with $|A| = m$, $k \in \mathbb{N}^*$;
Step 1: Initialise centers $C \leftarrow \emptyset$;
Step 2: for $t = 1, 2, \ldots, k$:
  2.1: randomly sample $a \sim_{q_t} A$, with $q_1 := u_m$ (uniform) and, for $t > 1$, $q_t(a) := D_t(a) \cdot \big(\sum_{a' \in A} D_t(a')\big)^{-1}$, where $D_t(a) := \min_{x \in C} \|a - x\|_2^2$;
  2.2: $x \leftarrow a$;
  2.3: $C \leftarrow C \cup \{x\}$;
Output: $C$;

k-variates++

Input: data $A \subset \mathbb{R}^d$ with $|A| = m$, $k \in \mathbb{N}^*$, random variables $\{X_a, a \in A\}$, probe functions $\wp_t : A \to \mathbb{R}^d$ ($t \geq 1$);
Step 1: Initialise centers $C \leftarrow \emptyset$;
Step 2: for $t = 1, 2, \ldots, k$:
  2.1: randomly sample $a \sim_{q_t} A$, with $q_1 := u_m$ (uniform) and, for $t > 1$, $q_t(a) := D_t(a) \cdot \big(\sum_{a' \in A} D_t(a')\big)^{-1}$, where $D_t(a) := \min_{x \in C} \|\wp_t(a) - x\|_2^2$;
  2.2: randomly sample $x \sim X_a$;
  2.3: $C \leftarrow C \cup \{x\}$;
Output: $C$;
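A sketch of the generalisation with illustrative names: `probes[t]` plays the role of $\wp_t$ and `sample_X(a, rng)` draws from $X_a$; with identity probes and Dirac samplers it reduces to the k-means++ sketch above.

```python
import numpy as np

def k_variates_pp(A, k, probes, sample_X, rng=None):
    """k-variates++ seeding sketch.

    A        : (m, d) array of data points.
    probes   : list of k callables; probes[t](A) -> (m, d) array (the probe wp_t).
    sample_X : callable; sample_X(a, rng) -> point of R^d drawn from X_a.
    """
    rng = np.random.default_rng(rng)
    m = A.shape[0]
    a = A[rng.integers(m)]                    # first pick: uniform over A (q_1 = u_m)
    centers = [sample_X(a, rng)]              # ...but the center is drawn from X_a
    for t in range(1, k):
        P = probes[t](A)                      # probe the data: wp_t(A)
        C = np.asarray(centers)
        # D_t(a) = squared distance of wp_t(a) to its closest current center
        d2 = np.min(((P[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        q = d2 / d2.sum()                     # q_t(a) proportional to D_t(a)
        a = A[rng.choice(m, p=q)]
        centers.append(sample_X(a, rng))      # the new center is a draw from X_a
    return np.asarray(centers)

# k-means++ is the special case: identity probes and Dirac X_a (center = a itself):
# C = k_variates_pp(A, k, [lambda Z: Z] * k, lambda a, rng: a)
```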


Two theorems & applications

Theorem 1

❖ k-means potential for $C$: $\Phi(A; C) := \sum_{a \in A} \|a - c(a)\|_2^2$, with $c(a) := \mathrm{argmin}_{c \in C} \|a - c\|_2^2$
❖ Suppose $\{\wp_t\}$ is $\eta$-stretching ($\eta \geq 0$): for any optimal cluster $A$ with size > 1 and any $a' \in A$,
  $\frac{\Phi(A; C)}{\Phi(A; \{a'\})} \leq (1 + \eta) \cdot \frac{\Phi(\wp_t(A); C)}{\Phi(\wp_t(A); \{\wp_t(a')\})}, \quad \forall t$
❖ Then $\mathbb{E}_{C \sim \text{k-variates++}}[\Phi(A; C)] \leq (2 + \log k) \cdot \Phi$ (approximation of the global optimum), with
  $\Phi := (6 + 4\eta)\,\Phi_{\mathrm{opt}} + 2\,\Phi_{\mathrm{bias}} + 2\,\Phi_{\mathrm{var}}$, where
  $\Phi_{\mathrm{opt}} := \sum_{a \in A} \|a - c_{\mathrm{opt}}(a)\|_2^2$, $\quad \Phi_{\mathrm{bias}} := \sum_{a \in A} \|\mathbb{E}[X_a] - c_{\mathrm{opt}}(a)\|_2^2$, $\quad \Phi_{\mathrm{var}} := \sum_{a \in A} \mathrm{tr}(\mathrm{cov}[X_a])$

❖ k-means++ is the special case: probe $\wp_t$ = identity and $X_a$ = Dirac at $a$; then $\Phi_{\mathrm{bias}} = \Phi_{\mathrm{opt}}$, $\Phi_{\mathrm{var}} = 0$, $\eta = 0$, so $\Phi = 8\,\Phi_{\mathrm{opt}}$ (the Arthur & Vassilvitskii bound)
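As a quick sanity check of the k-means++ specialisation, using only the definitions in Theorem 1: with identity probes and Dirac $X_a$ (all the mass at $a$),

$$\mathbb{E}[X_a] = a \;\Rightarrow\; \Phi_{\mathrm{bias}} = \sum_{a \in A} \|a - c_{\mathrm{opt}}(a)\|_2^2 = \Phi_{\mathrm{opt}}, \qquad \mathrm{cov}[X_a] = 0 \;\Rightarrow\; \Phi_{\mathrm{var}} = 0, \qquad \eta = 0,$$

$$\text{so } \Phi = (6 + 4 \cdot 0)\,\Phi_{\mathrm{opt}} + 2\,\Phi_{\mathrm{opt}} + 2 \cdot 0 = 8\,\Phi_{\mathrm{opt}}, \quad \text{i.e. } \mathbb{E}_C[\Phi(A; C)] \leq (2 + \log k) \cdot 8\,\Phi_{\mathrm{opt}}.$$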

Remarks

❖ The guarantee approaches the statistical lower bound (Fréchet-Cramér-Rao-Darmois)
❖ It can be better than the Arthur-Vassilvitskii bound, in particular if $\Phi_{\mathrm{bias}} < \Phi_{\mathrm{opt}}$: $\Phi_{\mathrm{bias}}$ is a knob through which background / domain knowledge may improve the general bound

Applications

❖ Reductions from k-variates++ ⇒ approximability ratios:
  ❖ pick a clustering algorithm $L$
  ❖ show that the expected output of $L$ = that of k-variates++ for particular choices of $\wp_t$ and $X_a$ (note: no computational constraint, just need existence)
  ❖ get an approximability ratio for $L$

Summary (poster, paper)

Setting     | Algorithm     | Probe functions                                    | Densities
Batch       | k-means++     | Identity                                           | Diracs
Distributed | d-k-means++   | Identity                                           | Uniform, support = subsets
Distributed | p+d-k-means++ | Identity                                           | Non uniform, compact support
Streaming   | s-k-means++   | synopses                                           | Diracs
On-line     | ol-k-means++  | point (batch not hit) / closest center (batch hit) | Diracs

Distributed clustering

❖ Setting: {data nodes = Forgy nodes} & a special sampling node $N^*$ (or a Forgy node)

[Diagram: Forgy nodes $(F_1, A_1), \ldots, (F_5, A_5)$ hold the data ($\cup_i A_i = A$) and sample uniformly; the sampling node $N^*$ holds no data and samples non-uniformly; in total only k data points are communicated, no other data leaves the nodes. Example: hybrid, server-assisted P2P networks.]

Algorithm + Theorem: d-k-means++

❖ Algorithm: iterate for $t = 1, 2, \ldots, k$:
  ❖ $N^*$ chooses (non-uniformly, $\sim D^*_t$) a Forgy node, say $F_i$
  ❖ $F_i$ samples (uniformly) a point $a_t \in A_i$ and sends it to $F_j$, $\forall j$
  ❖ each $F_j$ computes $d_j \in \mathbb{R}_+$ and sends it to $N^*$, which updates $D^*_t$
❖ Theorem: $\mathbb{E}[\Phi(A, C)] \leq (2 + \log k) \cdot \Phi$, with $\Phi := 10\,\Phi_{\mathrm{opt}} + 6\,\Phi_{Fs}$ and $\Phi_{Fs} := \sum_{i \in [n]} \sum_{a \in A_i} \|c(A_i) - a\|_2^2$ the spread of the Forgy nodes
❖ Remarks: $\Phi_{\mathrm{opt}}$ is the global optimum on the total data; the bound gets all the better as Forgy nodes aggregate "local" data. A sketch of the protocol is given below.
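A sketch of how the protocol could look in code; the precise form of $D^*_t$ (here, node $j$ weighted by its local mass $d_j$) and the handling of the first pick are assumptions, not read off the slides.

```python
import numpy as np

def d_kmeans_pp(node_data, k, rng=None):
    """Sketch of one possible instantiation of the d-k-means++ protocol.

    node_data : list of (m_i, d) arrays, one per Forgy node (their union is A).
    Assumed here: D*_t weights node j by d_j = sum_{a in A_j} min_{x in C} ||a - x||^2,
    and the first center is a uniform point of A (node picked proportionally to |A_i|).
    Only the k chosen points ever leave the Forgy nodes.
    """
    rng = np.random.default_rng(rng)
    sizes = np.array([X.shape[0] for X in node_data], dtype=float)

    # t = 1: uniform over A = node proportional to its size, then a uniform point.
    i = rng.choice(len(node_data), p=sizes / sizes.sum())
    centers = [node_data[i][rng.integers(node_data[i].shape[0])]]

    for _ in range(1, k):
        C = np.asarray(centers)
        # Each Forgy node j reports a single scalar d_j to the sampling node N*.
        d = np.array([np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1).sum()
                      for X in node_data])
        i = rng.choice(len(node_data), p=d / d.sum())    # N* picks a node (~ D*_t)
        j = rng.integers(node_data[i].shape[0])          # that node picks a point uniformly
        centers.append(node_data[i][j])                  # ...which becomes the next center
    return np.asarray(centers)
```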

Theorem 2 (likelihood ratio bound for neighbour samples)

❖ Assumption: with $\Omega := \mathrm{Ball}(L_2, R)$, all the $X_a$ satisfy $(\mathrm{d}p_{X_{a'}} / \mathrm{d}p_{X_a})(x) \leq \varrho(R)$, $\forall a, a' \in A$, $\forall x \in \Omega$ (see e.g. differential privacy)

❖ Fix $\wp_t = \mathrm{Id}(\cdot)$. For any neighbour $A' \approx A$ (differing on one point),
  $\frac{P_{C \sim \text{k-variates++}}[C \mid A']}{P_{C \sim \text{k-variates++}}[C \mid A]} \leq (1 + \tau_w)^{k-1} + f(k) \cdot \tau_w \cdot (1 + \tau_s)^{k-1} \cdot \varrho(R)$
❖ $\tau_w, \tau_s$ (with $0 < \tau_w, \tau_s \ll 1$) are spread and monotonicity parameters
❖ They can be estimated / computed from data
❖ In general, they $\to 0$ with $m$ (formal definition in poster / paper)

❖ Conditions under which the first term $\to 1$ and the second term $\to 0$?

❖ If the densities of all the $X_a$ take values in $[\epsilon_m, \epsilon_M] \not\ni 0$, then with probability $\geq 1 - \delta$, as long as $k \leq \frac{\epsilon_m \delta^2}{4 \epsilon_M} \cdot \sqrt{m}$,
  $\frac{P[C \mid A']}{P[C \mid A]} \leq 1 + \underbrace{\left(\frac{\epsilon_M}{\epsilon_m}\right)^k \cdot \frac{4}{m^{\frac{1}{4} + \frac{1}{d+1}}} + \left(\frac{64}{k^2 d}\right)^k \cdot \frac{\varrho(2R)}{m}}_{o(1)}$
❖ No $\tau_w, \tau_s$ in this bound (the proof exhibits small values whp, and experiments display such values). Application in differential privacy (sublinear noise!)

Experiments

❖ k-variates++ (d-k-means++) vs k-means++ & k-means|| (Bahmani & al. 2012), simulated data, d = 50; sample peers with $\mathbb{E}[|A_i|] = 500$ until $\sum_i |A_i| \approx 20000$. For each peer, (a) data uniformly sampled in a hyper-rectangle + (b) p% of the points given to a random peer (increases $\Phi_{Fs}$, making the problem more "difficult"). A sketch of this generator follows below.

[Plot: $\Phi_{Fs}(p) / \Phi_{Fs}(0)$ as a function of p, for p from 0 to 50]
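A sketch of how such data could be generated; the unit hyper-rectangle, the Poisson peer sizes and the per-point reassignment rule are assumptions, the slide only fixes d, $\mathbb{E}[|A_i|]$, the total size and the p% move.

```python
import numpy as np

def make_peers(p, d=50, mean_size=500, total=20000, rng=None):
    """Simulated peers along the lines of the experimental setup above (sketch)."""
    rng = np.random.default_rng(rng)
    sizes = []
    while sum(sizes) < total:                             # grow peers until ~20000 points
        sizes.append(max(1, rng.poisson(mean_size)))      # E[|A_i|] = 500
    n = len(sizes)
    peers = [rng.uniform(0.0, 1.0, (m, d)) for m in sizes]  # uniform in a hyper-rectangle
    A = np.vstack(peers)
    owner = np.repeat(np.arange(n), sizes)
    # Give p% of the points to a random peer (this raises the spread Phi_Fs).
    move = rng.random(len(A)) < p / 100.0
    owner[move] = rng.integers(n, size=move.sum())
    return [A[owner == i] for i in range(n)]
```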

Experiments

❖ k-variates++ (d-k-means++) vs k-means++ & k-means|| (Bahmani & al. 2012) (used with their best parameters), comparing the relative potential
  $\rho_\Phi(H) := \frac{\Phi(\text{d-k-means++}) - \Phi(H)}{\Phi(H)} \cdot 100$

[Plots: $\rho_\Phi(\text{k-means++})$ and $\rho_\Phi(\text{k-means||})$ as functions of k (4 to 10) and p (0 to 50); k-variates++ (d-k-means++) beats k-means++]
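For reference, the comparison metric above as a small (hypothetical) helper:

```python
def rho_phi(phi_dkmeanspp, phi_H):
    """Relative potential of d-k-means++ against a baseline H, in percent.
    Negative values mean d-k-means++ achieved the smaller (better) potential."""
    return (phi_dkmeanspp - phi_H) / phi_H * 100.0
```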

Conclusions

❖ We provide a generalisation of k-means++ with a guaranteed approximation of the global optimum
❖ k-variates++ can be used as is (e.g. privacy, k-means++) or to prove approximation properties of other algorithms via "reductions" between clustering algorithms
❖ Come see the poster for more examples!
❖ Future: use the Theorems to address stability, generalisation and smoothed analysis

Thank you !


Questions ?