an impossibility theorem for clustering by jon kleinberg

Post on 28-Dec-2015


An Impossibility Theorem for Clustering

By Jon Kleinberg

Definitions

Clustering function: operates on a set S of n ≥ 2 points and the distances among them: f(S, d) = Γ, where Γ is a partition of S

Distance function: d : S × S → R; the distance is 0 only for d(i,i). Does not require the triangle inequality.

Many different clustering criteria

k-center, k-median, k-means, Inter-Intra, etc.

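k-center

Minimize the maximum distance from a point to its nearest center

k-median

Minimize the average distance from a point to its nearest center

k-means

Minimize the sum of squared distances from each point to its nearest center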
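As a minimal sketch of how these three center-based objectives differ, the snippet below scores one fixed choice of centers on a line. The sample points, the centers, and all function names are illustrative assumptions, not taken from the slides.

```python
def nearest_center_distances(points, centers):
    """Distance from each 1-D point to its closest center."""
    return [min(abs(p - c) for c in centers) for p in points]

def k_center_cost(points, centers):
    # k-center: the worst-case (maximum) distance to the nearest center.
    return max(nearest_center_distances(points, centers))

def k_median_cost(points, centers):
    # k-median: the average distance to the nearest center.
    dists = nearest_center_distances(points, centers)
    return sum(dists) / len(dists)

def k_means_cost(points, centers):
    # k-means: the sum of squared distances to the nearest center.
    return sum(x * x for x in nearest_center_distances(points, centers))

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]   # illustrative data
centers = [1, 10, 19]                        # illustrative centers

print(k_center_cost(points, centers))   # 4
print(k_median_cost(points, centers))
print(k_means_cost(points, centers))    # 32
```

The same centers receive three different scores, which is why the optimal centers can differ from one criterion to the next.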

Inter-Intra

[Figure: a clustering C with T(C) and D(C) marked]

Maximize D(C) – T(C)

Motivation

Each criterion optimizes different features

Is there one clustering criterion with phenomenal cosmic powers?

Method

Give three intuitive axioms that any criterion should satisfy

Surprise: Not possible to satisfy all three

Reminiscent of Arrow’s Impossibility Theorem from social choice: no ranking rule can satisfy a small set of reasonable axioms all at once

Axiom 1 – Scale-Invariance

For any distance function d and any β > 0 we have that f(S,d) = f(S,βd)

Axiom 2 – Richness

Range(f) is equal to the set of all partitions of S

i.e., every possible clustering can be generated given the right distances
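Richness quantifies over every partition of S, and that collection grows quickly with the number of points. The following sketch enumerates all partitions of a small set; the function name `all_partitions` is an illustrative choice, not something from the slides.

```python
def all_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in all_partitions(rest):
        # Put `first` into each existing block in turn...
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition

for partition in all_partitions([1, 2, 3]):
    print(partition)

print(len(list(all_partitions([1, 2, 3, 4]))))   # 15
```

A clustering function on three points satisfies Richness only if each of the five printed partitions is the output for some distance function.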

Axiom 3 – Consistency

Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster is at most what it is in d and the distance between any two points in different clusters is at least what it is in d, then f(d’) = Γ

[Figure: within a cluster, d’(i,j) ≤ d(i,j); between clusters, d’(i,j) ≥ d(i,j)]
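The precondition of Consistency can be checked mechanically: a new distance function qualifies if it never stretches a within-cluster distance and never shrinks a between-cluster distance. The sketch below assumes distances are stored as a symmetric list-of-lists matrix indexed by point number; the function name and the sample matrices are illustrative.

```python
def is_gamma_transformation(d, d_new, partition):
    """True if d_new keeps or shrinks every within-cluster distance and
    keeps or stretches every between-cluster distance of `partition`."""
    cluster_of = {p: idx for idx, block in enumerate(partition) for p in block}
    points = sorted(cluster_of)
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            i, j = points[a], points[b]
            same_cluster = cluster_of[i] == cluster_of[j]
            if same_cluster and d_new[i][j] > d[i][j]:
                return False
            if not same_cluster and d_new[i][j] < d[i][j]:
                return False
    return True

partition = [{0, 1}, {2}]
d = [[0, 2, 5], [2, 0, 4], [5, 4, 0]]
d_ok = [[0, 1, 6], [1, 0, 4], [6, 4, 0]]    # cluster {0,1} tighter, point 2 farther
d_bad = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]   # cluster {0,1} pulled apart

print(is_gamma_transformation(d, d_ok, partition))    # True
print(is_gamma_transformation(d, d_bad, partition))   # False
```

Consistency then says that whenever this check passes for the partition f(d), the function must return the same partition on the new distances.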

Definition

Anti-chain: A collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other

An anti-chain cannot contain every partition of S, so a clustering function whose range is an anti-chain cannot satisfy Richness
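To make the definition concrete, here is a small sketch that tests refinement between two partitions and checks whether a collection of partitions is an anti-chain. The helper names and the sample collections are illustrative assumptions.

```python
from itertools import combinations

def normalize(partition):
    """Hashable form of a partition, so duplicates collapse."""
    return frozenset(frozenset(block) for block in partition)

def is_refinement(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)

def is_antichain(partitions):
    """True if no two distinct partitions are related by refinement."""
    distinct = {normalize(p) for p in partitions}
    for p, q in combinations(distinct, 2):
        if is_refinement(p, q) or is_refinement(q, p):
            return False
    return True

two_block = [[{0, 1}, {2}], [{0, 2}, {1}], [{1, 2}, {0}]]
extremes = [[{0}, {1}, {2}], [{0, 1, 2}]]

print(is_antichain(two_block))   # True
print(is_antichain(extremes))    # False
```

The second collection fails because the all-singletons partition refines the single-cluster partition. Both of those belong to the set of all partitions, which is why that set is never an anti-chain.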

Main Result

For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency

Implied by the proof that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain
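The step from the anti-chain property to the impossibility result can be written out as follows. This is a sketch of the implication only, in notation chosen here for illustration; it does not reproduce the proof of the anti-chain property itself.

```latex
\[
\text{Scale-Invariance} \;\wedge\; \text{Consistency}
\;\Longrightarrow\;
\mathrm{Range}(f) \text{ is an anti-chain.}
\]
\[
\text{For } n \ge 2:\quad
\bigl\{\{1\},\dots,\{n\}\bigr\} \text{ is a refinement of } \bigl\{\{1,\dots,n\}\bigr\},
\]
\[
\text{so the set of all partitions of } S \text{ is not an anti-chain, hence }
\mathrm{Range}(f) \neq \{\text{all partitions of } S\}.
\]
```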

Reminder of Axioms

Scale-Invariance: For any distance function d and any β > 0 we have that f(d) = f(βd)

Richness: Range(f) is equal to the set of all partitions of S

Consistency: Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster is at most what it is in d and the distance between any two points in different clusters is at least what it is in d, then f(d’) = Γ

Single Linkage

Cluster by repeatedly linking the two closest points that are not yet in the same cluster

Example points on a line: 0 1 4 9 10 12 15 19 20
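Running single linkage on the nine points from the slide shows the order in which clusters form. This is a minimal sketch for distinct 1-D points; the function name and the output format are illustrative choices.

```python
from itertools import combinations

def single_linkage_merges(points):
    """Return the merges single linkage performs on distinct 1-D points.

    Each entry is (distance, merged_cluster), in the order the merges happen.
    """
    cluster_of = {p: [p] for p in points}            # each point starts alone
    edges = sorted((abs(a - b), a, b) for a, b in combinations(points, 2))
    merges = []
    for dist, a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:                                  # already linked
            continue
        merged = sorted(ca + cb)
        for p in merged:                              # all members share one list
            cluster_of[p] = merged
        merges.append((dist, merged))
    return merges

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
merges = single_linkage_merges(points)
for dist, cluster in merges:
    print(dist, cluster)
```

The merge distances come out as 1, 1, 1, 2, 3, 3, 4, 5. Where to stop along this sequence is exactly what a stopping condition decides.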

Any two axioms

For every pair of axioms, there is a stopping condition for single linkage that satisfies both

Consistency + Richness: only link if the distance is at most r

Consistency + SI: stop when you have k connected components

Richness + SI: if x is the diameter of the graph, only add edges with weight at most βx, for a fixed β < 1
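The three stopping conditions can be sketched on top of one single-linkage routine. The code assumes a symmetric distance matrix, links edges of weight at most the threshold, and uses illustrative function names (`k_cluster`, `distance_r`, `scale_beta`) that do not come from the slides.

```python
from itertools import combinations

def single_linkage(d, k=None, max_weight=None):
    """Link points in order of increasing distance; return clusters of indices.

    Stops once k components remain (if k is given) and never adds an edge
    heavier than max_weight (if max_weight is given).
    """
    n = len(d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    components = n
    for w, i, j in sorted((d[i][j], i, j) for i, j in combinations(range(n), 2)):
        if k is not None and components <= k:
            break
        if max_weight is not None and w > max_weight:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def k_cluster(d, k):            # Consistency + Scale-Invariance
    return single_linkage(d, k=k)

def distance_r(d, r):           # Consistency + Richness
    return single_linkage(d, max_weight=r)

def scale_beta(d, beta):        # Richness + Scale-Invariance
    diameter = max(max(row) for row in d)
    return single_linkage(d, max_weight=beta * diameter)

xs = [0, 1, 4, 9, 10, 12, 15, 19, 20]
d = [[abs(a - b) for b in xs] for a in xs]
d10 = [[10 * w for w in row] for row in d]   # the same data, rescaled

print(k_cluster(d, 3))          # [[0, 1, 2], [3, 4, 5, 6], [7, 8]]
print(distance_r(d, 2))         # [[0, 1], [2], [3, 4, 5], [6], [7, 8]]
print(distance_r(d10, 2))       # every point alone: the threshold ignores scale
print(scale_beta(d10, 0.125) == scale_beta(d, 0.125))   # True
```

Rescaling changes the output of the distance-r rule, which is how it fails Scale-Invariance, while the other two rules return the same clusters on `d` and `d10`.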

Centroid-Based Clustering

(k,g)-centroid clustering function: Choose T, a set of k centroid points, such that

$$\sum_{i \in S} g(d(i, T))$$

is minimized, where d(i,T) is the distance from i to its nearest centroid in T

If g is the identity, we get k-median, etc.

Result: For every k ≥ 2, every function g, and n sufficiently large relative to k, the (k,g)-centroid clustering function does not satisfy Consistency.
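A brute-force sketch of a (k,g)-centroid clustering function is given below. It assumes the centroids are chosen among the points of S, takes d(i,T) to be the distance from i to its nearest centroid, and breaks ties by enumeration order; the function name and the six-point example are illustrative.

```python
from itertools import combinations

def centroid_clustering(d, k, g=lambda x: x):
    """Pick the k centroids (as point indices) minimizing sum_i g(d(i, T)),
    then assign every point to its nearest centroid."""
    n = len(d)

    def cost(T):
        return sum(g(min(d[i][t] for t in T)) for i in range(n))

    best = min(combinations(range(n), k), key=cost)   # exhaustive search
    clusters = {t: [] for t in best}
    for i in range(n):
        nearest = min(best, key=lambda t: d[i][t])
        clusters[nearest].append(i)
    return best, sorted(clusters.values())

xs = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in xs] for a in xs]

print(centroid_clustering(d, 2))                       # g = identity (k-median)
print(centroid_clustering(d, 2, g=lambda x: x * x))    # g = square
```

On this example both choices of g pick the middle point of each group, indices 1 and 4, and return the clusters [0, 1, 2] and [3, 4, 5]. The exhaustive search is only practical for small n.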

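Proof: A contradiction

[Figure: X (size m), pairwise distance r within X; Y (size λm), pairwise distance ε within Y; distance r+δ between X and Y]

With one centroid in X and one in Y:

$$\sum_{i \in S} g(d(i, T)) \approx m\,g(r) + \lambda m\,g(\varepsilon)$$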

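A new distance function

[Figure: X split into X0 (size m/2) and X1 (size m/2); pairwise distance r’ < r within X0 and within X1, r between X0 and X1; Y (size λm), pairwise distance ε within Y; distance r+δ between X and Y]

With one centroid in X0 and one in X1:

$$\sum_{i \in S} g(d(i, T)) \approx m\,g(r') + \lambda m\,g(r+\delta)$$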

Wrapping Up

If we pick λ, r, r’, ε and δ right then we can have:

$$m\,g(r') + \lambda m\,g(r+\delta) < m\,g(r) + \lambda m\,g(\varepsilon)$$

But then our new centers are in X0 and X1

But our new distance followed consistency, so it should give us X and Y.

This covers the case where k is 2.
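The k = 2 argument can be checked numerically on a small instance. The sketch below is one reading of the construction, with assumed values m = 20, |Y| = 2, r = 10, ε = 1, δ = 1, r’ = 1 and g the identity. It also assumes that centroids are chosen among the points and that ties are broken by enumeration order.

```python
from itertools import combinations

def centroid_partition(d, k, g=lambda x: x):
    """Brute-force (k,g)-centroid clustering; returns (clusters, optimal cost)."""
    size = len(d)

    def cost(T):
        return sum(g(min(d[i][t] for t in T)) for i in range(size))

    T = min(combinations(range(size), k), key=cost)
    clusters = {t: [] for t in T}
    for i in range(size):
        clusters[min(T, key=lambda t: d[i][t])].append(i)
    return sorted(clusters.values()), cost(T)

m, y_size = 20, 2                      # X = points 0..19, Y = points 20..21
r, eps, delta, r_new = 10, 1, 1, 1     # assumed values, with r_new < r
n, half = m + y_size, m // 2           # X0 = 0..9, X1 = 10..19

def build(within_half):
    """Distances: within_half inside X0 and inside X1, r between X0 and X1,
    eps inside Y, r + delta between X and Y."""
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):    # i < j; Y holds the largest indices
        if i >= m:                            # both in Y
            w = eps
        elif j >= m:                          # i in X, j in Y
            w = r + delta
        elif (i < half) == (j < half):        # same half of X
            w = within_half
        else:                                 # X0 versus X1
            w = r
        d[i][j] = d[j][i] = w
    return d

d_old = build(r)        # original distances: all of X at distance r
d_new = build(r_new)    # only distances inside X0 and inside X1 shrink

before, cost_before = centroid_partition(d_old, 2)
after, cost_after = centroid_partition(d_new, 2)
print(before, cost_before)   # clusters X and Y, cost 191
print(after, cost_after)     # X0 and X1 end up in different clusters, cost 40
```

Under the original distances the clusters are X and Y. The new distances only shrink distances inside X, so Consistency would require the same output, yet the optimal centroids move into X0 and X1 and the partition changes.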

Discussion: Relaxing Axioms

Refinement-consistency: if d’ is an f(d)-transformation of d, then f(d’) is a refinement of f(d)

Near-Richness: all partitions except the trivial one can be obtained

With these two relaxations in place of Consistency and Richness, a clustering function satisfying all three axioms exists.

What other relaxations could we have?

Discussion

Does this mean there is a law of continuous employment for clustering criterion creators?

Is the clustering function properly defined?

Allow overlaps

Allow outliers

Are these the right axioms?

All partitions possible vs. power set

Axioms for graph clustering?

Questions?
