Page 1

4. Ad-hoc I: Hierarchical clustering

Hierarchical versus Flat

Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time.

Hierarchical methods generate a hierarchy of partitions, i.e.

• a partition P1 into 1 cluster (the entire collection)

• a partition P2 into 2 clusters

• …

• a partition Pn into n clusters (each object forms its own cluster)

It is then up to the user to decide which of the partitions reflects actual sub-populations in the data.
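As an illustration in R (a minimal sketch; the data and variable names are my own, not from the slides): a single call builds the whole hierarchy, and any of the partitions P1, …, Pn can be read off from it afterwards.

    set.seed(1)
    x  <- matrix(rnorm(40), ncol = 2)   # 20 illustrative observations in 2D
    hc <- hclust(dist(x))               # builds the whole hierarchy at once
    P3 <- cutree(hc, k = 3)             # read off the partition into 3 clusters
    table(P3)                           # cluster sizes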

Page 2

Note: A sequence of partitions is called "hierarchical" if each cluster in a given partition is the union of clusters in the next finer partition (the partition with one more cluster).

[Figure: partitions P4, P3, P2, P1. Top: hierarchical sequence of partitions. Bottom: non-hierarchical sequence.]

Page 3

Hierarchical methods again come in two varieties, agglomerative and divisive.

Agglomerative methods:

• Start with partition Pn, where each object forms its own cluster.

• Merge the two closest clusters, obtaining Pn-1.

• Repeat the merging step until only one cluster is left.

Divisive methods:

• Start with P1.

• Split the collection into two clusters that are as homogeneous (and as different from each other) as possible.

• Apply splitting procedure recursively to the clusters.
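The agglomerative recipe above can be written out generically. A minimal R sketch (my own illustrative code, not from the slides; x is a matrix of observations and cluster_dist is any distance between groups, such as the d1 to d5 defined in Section 4.1):

    agglomerate <- function(x, cluster_dist) {
      clusters <- as.list(seq_len(nrow(x)))   # P_n: every object its own cluster
      merges <- list()
      while (length(clusters) > 1) {
        best <- c(1, 2); best_d <- Inf
        for (i in seq_along(clusters)) {
          for (j in seq_len(i - 1)) {         # find the two closest clusters
            d <- cluster_dist(x[clusters[[i]], , drop = FALSE],
                              x[clusters[[j]], , drop = FALSE])
            if (d < best_d) { best_d <- d; best <- c(j, i) }
          }
        }
        merges[[length(merges) + 1]] <- clusters[best]   # record the merge
        clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
        clusters[[best[2]]] <- NULL           # drop the absorbed cluster
      }
      merges                                  # merge history from P_n down to P_1
    }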

Page 4

Note:

Agglomerative methods require a rule to decide which clusters to merge. Typically one defines a distance between clusters and then merges the two clusters that are closest.

Divisive methods require a rule for splitting a cluster.

Page 5

4.1 Hierarchical agglomerative clustering

Need to define a distance d(P,Q) between groups, given a distance measure d(x,y) between observations.

Commonly used distance measures:

1. $d_1(P,Q) = \min_{x \in P,\, y \in Q} d(x,y)$ (single linkage)

2. $d_2(P,Q) = \operatorname{ave}_{x \in P,\, y \in Q} d(x,y)$ (average linkage)

3. $d_3(P,Q) = \max_{x \in P,\, y \in Q} d(x,y)$ (complete linkage)

4. $d_4(P,Q) = \lVert \bar{x}_P - \bar{x}_Q \rVert$ (centroid method)

5. $d_5(P,Q) = \dfrac{|P|\,|Q|}{|P|+|Q|}\, \lVert \bar{x}_P - \bar{x}_Q \rVert^2$ (Ward's method)

$d_5$ is called Ward's distance.
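These definitions translate directly into code. A minimal R sketch (illustrative helper names of my own; P and Q are matrices whose rows are the observations in each cluster):

    cross_dists <- function(P, Q)             # all pairwise distances d(x,y)
      outer(seq_len(nrow(P)), seq_len(nrow(Q)),
            Vectorize(function(i, j) sqrt(sum((P[i, ] - Q[j, ])^2))))

    d1 <- function(P, Q) min(cross_dists(P, Q))    # single linkage
    d2 <- function(P, Q) mean(cross_dists(P, Q))   # average linkage
    d3 <- function(P, Q) max(cross_dists(P, Q))    # complete linkage
    d4 <- function(P, Q) sqrt(sum((colMeans(P) - colMeans(Q))^2))  # centroid
    d5 <- function(P, Q) {                         # Ward's distance
      nP <- nrow(P); nQ <- nrow(Q)
      (nP * nQ) / (nP + nQ) * sum((colMeans(P) - colMeans(Q))^2)
    }

These plug into the agglomerate() sketch above, e.g. agglomerate(x, d1) for single linkage; in practice one would call the built-in hclust(dist(x), method = "single") (or "average", "complete", "ward.D2"), which implements the same merge rules efficiently.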

Page 6

Motivation for Ward’s distance:

• Let $\mathcal{P}^k = \{P_1, \ldots, P_k\}$ be a partition of the observations into k groups.

• Measure goodness of a partition by the sum of squared distances of observations from their cluster means:

$$\mathrm{RSS}(\mathcal{P}^k) = \sum_{i=1}^{k} \sum_{x_j \in P_i} \lVert x_j - \bar{x}_{P_i} \rVert^2$$

• Consider all possible (k-1)-partitions obtainable from $\mathcal{P}^k$ by a merge.

• Merging the two clusters with the smallest Ward's distance optimizes the goodness of the new partition (see the identity below).
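The reasoning behind the last bullet (a standard decomposition, stated here to fill in the step): if $\mathcal{P}^{k-1}$ is obtained from $\mathcal{P}^{k}$ by merging $P_i$ and $P_j$, then

$$\mathrm{RSS}(\mathcal{P}^{k-1}) = \mathrm{RSS}(\mathcal{P}^{k}) + \frac{|P_i|\,|P_j|}{|P_i|+|P_j|}\, \lVert \bar{x}_{P_i} - \bar{x}_{P_j} \rVert^2 = \mathrm{RSS}(\mathcal{P}^{k}) + d_5(P_i, P_j),$$

so the merge with the smallest Ward's distance yields the smallest RSS among all partitions reachable by a single merge.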

Page 7

4.2 Hierarchical divisive clustering

There are divisive versions of single linkage, average linkage, and Ward's method.

Divisive version of single linkage:

• Compute the minimal spanning tree (the graph connecting all the objects with the smallest total edge length).

• Break longest edge to obtain 2 subtrees, and a corresponding partition of the objects.

• Apply process recursively to the subtrees.

Agglomerative and divisive versions of single linkage give identical results (more later).
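A sketch of the MST-based split in base R (illustrative code of my own; Prim's algorithm is written out so no extra packages are needed):

    mst_edges <- function(D) {                # D: symmetric distance matrix
      n <- nrow(D); in_tree <- 1; edges <- NULL
      while (length(in_tree) < n) {           # grow the tree one edge at a time
        out <- setdiff(seq_len(n), in_tree)
        sub <- D[in_tree, out, drop = FALSE]
        k   <- which(sub == min(sub), arr.ind = TRUE)[1, ]
        edges   <- rbind(edges, c(in_tree[k[1]], out[k[2]], min(sub)))
        in_tree <- c(in_tree, out[k[2]])
      }
      edges                                   # one row per edge: from, to, length
    }

    split_longest <- function(edges, n) {
      keep <- edges[-which.max(edges[, 3]), , drop = FALSE]  # break longest edge
      comp <- seq_len(n)                      # naive connected-components labelling
      for (r in seq_len(nrow(keep))) {
        a <- comp[keep[r, 1]]; b <- comp[keep[r, 2]]
        comp[comp == b] <- a
      }
      match(comp, unique(comp))               # cluster labels 1 and 2
    }

Usage, with x as before: cl <- split_longest(mst_edges(as.matrix(dist(x))), nrow(x)); the same procedure is then applied recursively within each of the two subtrees.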

Page 8

Divisive version of Ward’s method.

Given cluster R.

Need to find split of R into 2 groups P,Q to minimize

$$\mathrm{RSS}(P,Q) = \sum_{x_i \in P} \lVert x_i - \bar{x}_P \rVert^2 + \sum_{x_j \in Q} \lVert x_j - \bar{x}_Q \rVert^2$$

or, equivalently, to maximize Ward’s distance between P and Q.

Note: There is no computationally feasible method for finding the optimal P, Q when |R| is large; an approximation has to be used.

Page 9

Iterative algorithm to search for the optimal Ward’s split

Project the observations in R onto the largest principal component.

Split at the median to obtain initial clusters P, Q.

Repeat {

Assign each observation to the cluster with the closest mean.

Re-compute the cluster means.

} until convergence.

Note:

• Each step reduces RSS(P, Q)

• No guarantee of finding the optimal partition.
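A sketch of this search in R (illustrative code of my own; degenerate cases such as a cluster becoming empty are not handled):

    ward_split <- function(R) {               # R: matrix of observations (rows)
      s  <- prcomp(R)$x[, 1]                  # scores on the largest principal component
      cl <- ifelse(s <= median(s), 1L, 2L)    # initial split at the median
      repeat {
        m1 <- colMeans(R[cl == 1, , drop = FALSE])
        m2 <- colMeans(R[cl == 2, , drop = FALSE])
        new_cl <- ifelse(rowSums(sweep(R, 2, m1)^2) <=   # closest cluster mean
                         rowSums(sweep(R, 2, m2)^2), 1L, 2L)
        if (all(new_cl == cl)) break          # converged; each pass reduced RSS(P, Q)
        cl <- new_cl
      }
      cl                                      # labels of the 2-cluster split of R
    }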

Page 10

Divisive version of average linkage

Algorithm DIANA (Struyf, Hubert, and Rousseeuw, p. 22).
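In R this algorithm is available as diana() in the cluster package by the authors cited above; a brief usage sketch (with x as before):

    library(cluster)                 # Struyf, Hubert & Rousseeuw's package
    dv <- diana(dist(x))             # divisive analysis clustering
    cl <- cutree(as.hclust(dv), k = 3)   # extract a 3-cluster partition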

Page 11

4.3 Dendrograms

The result of a hierarchical clustering can be represented as a binary tree:

• Root of tree represents entire collection

• Terminal nodes represent observations

• Each interior node represents a cluster

• Each subtree represents a partition

Note: The tree defines many more partitions than the n-2 nontrivial ones constructed during the merge (or split) process.

Note: For HAC methods, the merge order defines a sequence of n subtrees of the full tree. For HDC methods a sequence of subtrees can be defined if there is a figure of merit for each split.

Page 12

If the distance between daughter clusters is monotonically increasing as we move up the tree, we can draw a dendrogram: the y-coordinate of a vertex is the distance between its daughter clusters.

[Figure: point set of four observations (axes x[,1] and x[,2], each ranging 0.0 to 3.0) and the corresponding single linkage dendrogram (heights 0.5 to 2.5).]

Page 13

Standard method to extract clusters from a dendrogram:

• Pick the number of clusters k.

• Cut the dendrogram at a level that results in k subtrees.
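In R this is the cutree function referenced in the experiment below; a brief sketch (hc built as earlier):

    hc <- hclust(dist(x), method = "average")
    cl <- cutree(hc, k = 4)          # cut so that k = 4 subtrees remain
    # alternatively, cut at an explicit height: cutree(hc, h = 1.5)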

Page 14

4.4 Experiment

Try hierarchical methods on unimodal 2D datasets.

Experiments suggest:

• Except in completely clear-cut situations, tree cutting ("cutree") is useless for extracting clusters from a dendrogram.

• Complete linkage fails completely for elongated clusters (see the sketch below).
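A sketch that reproduces the elongated-cluster comparison (illustrative data of my own construction: two long, parallel strips):

    set.seed(2)
    x <- rbind(cbind(runif(100, 0, 10), rnorm(100, 0, 0.2)),   # strip 1
               cbind(runif(100, 0, 10), rnorm(100, 2, 0.2)))   # strip 2
    truth <- rep(1:2, each = 100)
    table(truth, cutree(hclust(dist(x), method = "single"),   k = 2))
    table(truth, cutree(hclust(dist(x), method = "complete"), k = 2))
    # single linkage recovers the two strips; complete linkage typically
    # chops each strip in half instead, mixing the true clusters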

Page 15

Needed:

• Diagnostics to decide whether the daughters of a dendrogram node really correspond to spatially separated clusters.

• Automatic and manual methods for dendrogram pruning.

• Methods for assigning observations in pruned subtrees to clusters.