reza bosagh zadeh (carnegie mellon) shai ben-david (waterloo) uai 09, montréal, june 2009 a...
TRANSCRIPT
![Page 1: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/1.jpg)
Reza Bosagh Zadeh (Carnegie Mellon)
Shai Ben-David (Waterloo)
UAI 09, Montréal, June 2009
A UNIQUENESS THEOREMFOR CLUSTERING
![Page 2: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/2.jpg)
TALK OUTLINE
Questions being addressed
Introduce Axioms
Introduce Properties
Uniqueness Theorem for Single-Linkage
Taxonomy of Partitioning Functions
Bosagh Zadeh, Ben-David, UAI 2009
![Page 3: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/3.jpg)
WHAT IS CLUSTERING?
Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.
40 45 50 55
74
76
78
80
82
84
Bosagh Zadeh, Ben-David, UAI 2009
![Page 4: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/4.jpg)
SOME BASIC UNANSWERED QUESTIONS
Are there principles governing all clustering paradigms?
Which clustering paradigm should I use for a given task?
Bosagh Zadeh, Ben-David, UAI 2009
![Page 5: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/5.jpg)
“Clustering” is an ill-defined problem
There are many different clustering tasks, leading to different clustering paradigms:
THERE ARE MANY CLUSTERING TASKS
Bosagh Zadeh, Ben-David, UAI 2009
![Page 6: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/6.jpg)
“Clustering” is an ill-defined problem
There are many different clustering tasks, leading to different clustering paradigms:
THERE ARE MANY CLUSTERING TASKS
Bosagh Zadeh, Ben-David, UAI 2009
![Page 7: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/7.jpg)
WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING
Independently of any particular algorithm, particular objective function, or particular generative data model
Bosagh Zadeh, Ben-David, UAI 2009
![Page 8: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/8.jpg)
WHAT FOR?
Choosing a suitable algorithm for a given task.
Axioms: to capture intuition about clustering in general.
Expected to be satisfied by all clustering functions
Properties: to capture differences between different clustering paradigms
Bosagh Zadeh, Ben-David, UAI 2009
![Page 9: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/9.jpg)
TIMELINE
Jardine, Sibson 1971 Considered only hierarchical functions
Kleinberg 2003 Presented an impossibility result
This paper Presents a uniqueness result for Single-Linkage
Bosagh Zadeh, Ben-David, UAI 2009
![Page 10: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/10.jpg)
THE BASIC SETTING
For a finite domain set SS, a distance function is a symmetric mapping
d:SxSd:SxS → RR++ such that d(x,y)=0d(x,y)=0 iff x=yx=y.
A partitioning function takes a dissimilarity function on SS and returns a partition of SS.
We wish to define the axioms that distinguish clustering functions, from any other functions that output domain partitions.
Bosagh Zadeh, Ben-David, UAI 2009
![Page 11: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/11.jpg)
KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive
λλ.
RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings
ConsistencyIf d’d’ equals dd except for shrinking
distances within clusters of F(d)F(d) or stretching between-cluster distances,
then F(d) = F(d’).F(d) = F(d’).
Bosagh Zadeh, Ben-David, UAI 2009
![Page 12: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/12.jpg)
KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive
λλ.
RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings
ConsistencyIf d’d’ equals dd except for shrinking
distances within clusters of F(d)F(d) or stretching between-cluster distances,
then F(d) = F(d’).F(d) = F(d’).Inconsistent! No algorithm can satisfy all 3 of these.
Bosagh Zadeh, Ben-David, UAI 2009
![Page 13: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/13.jpg)
CONSISTENT AXIOMS: Scale Invariance F(F(λλd, k)=F(d, k)d, k)=F(d, k) for all d d and all strictly positive
λλ.
k-RichnessThe range of F(d, k)F(d, k) over all dd is the set of all possible k-k-partitionings
ConsistencyIf d’d’ equals dd except for shrinking distances
within clusters of F(d, k)F(d, k) or stretching between-cluster distances,
then F(d, k)=F(d’, k).F(d, k)=F(d’, k).Consistent! (And satisfied by Single-Linkage, Min-Sum, …)
Fix kk
Bosagh Zadeh, Ben-David, UAI 2009
![Page 14: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/14.jpg)
Definition. Call any partitioning function which satisfies
a Clustering Function
Scale Invariancek-RichnessConsistency
CLUSTERING FUNCTIONS
Bosagh Zadeh, Ben-David, UAI 2009
![Page 15: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/15.jpg)
TWO CLUSTERING FUNCTIONS
Single-Linkage
1. Start with with all points in their own cluster
2. While there are more than k clusters Merge the two most
similar clusters
Similarity between two clusters is the similarity of the most similar two
points from differing clusters
Min-Sum k-Clustering
Find the k-partitioning Γ which minimizes
(Is NP-Hard to optimize)
Scale Invariancek-RichnessConsistency
Both Functions satisfy:
Hierarchical
Not Hierarchical
Proofs in paper.
Bosagh Zadeh, Ben-David, UAI 2009
![Page 16: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/16.jpg)
CLUSTERING FUNCTIONS
Single-Linkage and Min-Sum are both Clustering functions.
How to distinguish between them in an Axiomatic framework? Use Properties
Not all properties are desired in every clustering situation: pick and choose properties for your task
Bosagh Zadeh, Ben-David, UAI 2009
![Page 17: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/17.jpg)
PROPERTIES - ORDER-CONSISTENCY
Order-ConsistencyIf two datasets dd and d’d’ have the same
ordering of the distances, then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
Bosagh Zadeh, Ben-David, UAI 2009
o In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points.
o Satisfied by Single-Linkage, Max-Linkage, Average-Linkage…
o NOT satisfied by most objective functions (Min-Sum, k-means, …)
![Page 18: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/18.jpg)
PATH-DISTANCE
In other words, we find the path from x to y, which has the smallest longest jump in it.
1 2
7
14
e.g.
Pd( , ) = 2
Since the path from above has a jump of distance 2
Undrawn edges are large
Bosagh Zadeh, Ben-David, UAI 2009
![Page 19: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/19.jpg)
PATH-DISTANCE
Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks.
Being human, we are restricted in how far we can jump from island to island.
Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
Bosagh Zadeh, Ben-David, UAI 2009
![Page 20: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/20.jpg)
PROPERTIES – PATH-DISTANCE COHERENCE
Path-Distance CoherenceIf two datasets dd and d’d’ have the same
induced path distance then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
Bosagh Zadeh, Ben-David, UAI 2009
![Page 21: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/21.jpg)
UNIQUENESS THEOREM
Theorem (This work) Single-Linkage is the only clustering function
satisfying Order-Consistency and Path Distance-Coherence
Bosagh Zadeh, Ben-David, UAI 2009
![Page 22: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/22.jpg)
UNIQUENESS THEOREM
Theorem (This work) Single-Linkage is the only clustering function satisfying
Order-Consistency and Path-Distance-Coherence
Is Path-Distance-Coherence doing all the work? No. Consistency is necessary for uniqueness k-Richness is necessary
“X is Necessary”: All other axioms/properties satisfied, just X missing, still not enough to get uniqueness
Bosagh Zadeh, Ben-David, UAI 2009
![Page 23: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/23.jpg)
PRACTICAL CONSIDERATIONS
Single-Linkage is not always the right function to use. Because Path-Distance-Coherence is not always
desirable.
It’s not always immediately obvious when we want a function to focus on the Path Distance
Introduce a different formulation involving Minimum Spanning Trees
Bosagh Zadeh, Ben-David, UAI 2009
![Page 24: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/24.jpg)
2020
PROPERTIES - MST-COHERENCE
F
If
Then
MST-CoherenceIf two datasets dd and d’d’ have the same
Minimum Spanning Tree then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
, 2
F, 220
20 , 2
Bosagh Zadeh, Ben-David, UAI 2009
![Page 25: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/25.jpg)
A TAXONOMY OF CLUSTERING FUNCTIONS
Min-Sum satisfies neither MST-Coherence nor Order-Consistency
Future work: Characterize other clustering functions
Bosagh Zadeh, Ben-David, UAI 2009
![Page 26: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING](https://reader035.vdocument.in/reader035/viewer/2022062519/5697bfb61a28abf838c9e60a/html5/thumbnails/26.jpg)
THANKS FOR YOUR ATTENTION!
Bosagh Zadeh, Ben-David, UAI 2009