Hierarchical clustering
David M. Blei
COS 424, Princeton University
February 28, 2008
D. Blei Clustering 02 1 / 21
Hierarchical clustering
• Hierarchical clustering is a widely used data analysis tool.
• The idea is to build a binary tree of the data that successively merges similar groups of points
• Visualizing this tree provides a useful summary of the data
D. Blei Clustering 02 2 / 21
Hierarchical clustering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k
• An initial assignment of data to clusters
• A distance measure between data d(x_n, x_m)
• Hierarchical clustering only requires a measure of similarity between groups of data points.
D. Blei Clustering 02 3 / 21
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group
2 Repeat: iteratively merge the two closest groups
3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
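The three steps above can be sketched in plain Python. This is a toy implementation (using single linkage to define "closest"; all names and the example data are illustrative, not from the slides):

```python
from itertools import combinations

def euclidean(p, q):
    # Straight-line distance between two points
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def agglomerate(points):
    """Start with singleton groups, repeatedly merge the two closest
    groups, and return the merges as (group, group, distance) triples."""
    groups = [[i] for i in range(len(points))]
    merges = []
    while len(groups) > 1:
        def link(g, h):
            # Single-linkage distance between groups g and h
            return min(euclidean(points[i], points[j])
                       for i in groups[g] for j in groups[h])
        g, h = min(combinations(range(len(groups)), 2),
                   key=lambda gh: link(*gh))
        merges.append((groups[g], groups[h], link(g, h)))
        merged = groups[g] + groups[h]
        groups = [grp for k, grp in enumerate(groups) if k not in (g, h)]
        groups.append(merged)
    return merges

# Two tight pairs: the cheap merges happen first, the expensive one last
merges = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)])
print([m[2] for m in merges])
```

Note the O(n^3)-ish cost of this naive version; production implementations maintain a distance matrix and update it after each merge.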
Example

[Figure: scatter plot of the example data (two features, V1 and V2, roughly spanning 0 to 80 and −20 to 80), followed by snapshots of the agglomerative merging at iterations 001 through 024.]

D. Blei Clustering 02 5 / 21
Agglomerative clustering
• Each level of the resulting tree is a segmentation of the data
• The algorithm results in a sequence of groupings
• It is up to the user to choose a "natural" clustering from this sequence
D. Blei Clustering 02 6 / 21
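One common heuristic for choosing that "natural" level (my own illustration, not a method from the slides) is to cut the tree just before the largest jump in merge heights:

```python
def largest_gap_level(heights):
    """Given the sequence of merge heights (one per merge, in order),
    return the index of the merge just before the biggest jump."""
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    return max(range(len(gaps)), key=gaps.__getitem__)

# Two tight pairs merged cheaply, then one expensive final merge:
print(largest_gap_level([1.0, 1.2, 14.0]))  # → 1: cut before the last merge
```

Cutting there leaves the groups that existed after merge 1, i.e. the two tight pairs.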
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasing with the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering is popular
D. Blei Clustering 02 7 / 21
Dendrogram of example data

[Figure: cluster dendrogram of the example data, produced by hclust(*, "complete") on dist(x); the vertical axis is merge height.]
Groups that merge at high values relative to the merger values of their subgroups are candidates for natural clusters. (Tibshirani et al., 2001)
D. Blei Clustering 02 8 / 21
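The slide's dendrogram comes from R's hclust with complete linkage on dist(x). A rough Python equivalent, assuming scipy is available (the data below is a made-up stand-in, since the slide's values aren't given):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy two-cluster data standing in for the slide's example
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 5, (10, 2)), rng.normal(50, 5, (10, 2))])

Z = linkage(x, method="complete")  # (n-1) x 4 merge table, like hclust
# Column 2 holds the merge heights; complete linkage keeps them nondecreasing,
# which is the monotonicity that makes the dendrogram well defined
assert all(Z[i, 2] <= Z[i + 1, 2] for i in range(len(Z) - 1))
# dendrogram(Z) would draw the tree (rendering requires matplotlib)
```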
Group similarity
• Given a distance measure between points, the user has many choices for how to define intergroup similarity.
• Three most popular choices:
• Single linkage: the similarity of the closest pair

  d_SL(G, H) = min_{i ∈ G, j ∈ H} d_{i,j}

• Complete linkage: the similarity of the furthest pair

  d_CL(G, H) = max_{i ∈ G, j ∈ H} d_{i,j}

• Group average: the average similarity between groups

  d_GA(G, H) = (1 / (N_G N_H)) Σ_{i ∈ G} Σ_{j ∈ H} d_{i,j}
D. Blei Clustering 02 9 / 21
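The three rules differ only in how they reduce the set of cross-group pairwise distances. A minimal sketch (function names are mine):

```python
def pairwise(G, H, d):
    # All distances between a point in G and a point in H
    return [d(x, y) for x in G for y in H]

def d_single(G, H, d):    # distance of the closest pair
    return min(pairwise(G, H, d))

def d_complete(G, H, d):  # distance of the furthest pair
    return max(pairwise(G, H, d))

def d_average(G, H, d):   # mean over all N_G * N_H cross pairs
    ds = pairwise(G, H, d)
    return sum(ds) / len(ds)

# 1-D example: pairs are |0-4|=4, |0-6|=6, |1-4|=3, |1-6|=5
dist = lambda a, b: abs(a - b)
G, H = [0.0, 1.0], [4.0, 6.0]
print(d_single(G, H, dist), d_complete(G, H, dist), d_average(G, H, dist))
# → 3.0 6.0 4.5
```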
Properties of intergroup similarity
• Single linkage can produce "chaining," where a sequence of close observations in different groups causes early merges of those groups
• Complete linkage has the opposite problem: it might not merge close groups because of outlier members that are far apart.
• Group average represents a natural compromise, but it depends on the scale of the similarities; applying a monotone transformation to the similarities can change the results.
D. Blei Clustering 02 10 / 21
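The chaining effect is easy to see on toy data (my own example): along a chain of evenly spaced points, every split looks "close" to single linkage, while complete linkage sees the full span.

```python
def d(a, b):
    return abs(a - b)

chain = [0.0, 1.0, 2.0, 3.0, 4.0]   # each neighbor is distance 1 apart
left, right = chain[:3], chain[3:]  # one candidate split of the chain

d_sl = min(d(x, y) for x in left for y in right)  # single linkage: 1.0
d_cl = max(d(x, y) for x in left for y in right)  # complete linkage: 4.0
print(d_sl, d_cl)
```

Single linkage would happily keep merging along the chain (every step costs 1.0); complete linkage charges the 4.0 span and so resists treating the chain as one tight cluster.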
Caveats
• Hierarchical clustering should be treated with caution.
• Different decisions about group similarities can lead to vastly different dendrograms.
• The algorithm imposes a hierarchical structure on the data, even data for which such structure is not appropriate.
D. Blei Clustering 02 11 / 21
Examples
• "Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets" (Sorlie et al., 2003)
• Hierarchical clustering of gene expression data led to new theories
• Later, the theories were tested in the lab.
D. Blei Clustering 02 12 / 21
D. Blei Clustering 02 13 / 21
Examples
• "The Balance of Roger de Piles" (Studdert-Kennedy and Davenport, 1974)
• Roger de Piles rated 57 paintings along different dimensions.
• These authors cluster them using different methods, including hierarchical clustering
• They discuss the different clusters. (They are art critics.)
D. Blei Clustering 02 14 / 21
Good: They are cautious. "The value of this analysis...will depend on any interesting speculation it may provoke."
D. Blei Clustering 02 15 / 21
Examples
• "Similarity Grouping of Australian Universities" (Stanley and Reynolds, 1994)
• Use hierarchical clustering on Australian universities
• Use features such as
• # of staff in different departments
• entry scores
• funding
• evaluations
D. Blei Clustering 02 16 / 21
One dendrogram
D. Blei Clustering 02 17 / 21
Another dendrogram—notice that it’s different
D. Blei Clustering 02 18 / 21
• Split values: They notice that there is no kink and conclude that there is no cluster structure in Australian universities.
• Good: Cautious interpretation of the clustering; analysis based on multiple subsets of the features.
• Bad: Their conclusion that we can't cluster Australian universities ignores all the algorithmic choices that were made.
D. Blei Clustering 02 19 / 21
Examples
• "Comovement of International Equity Markets: A Taxonomic Approach" (Panton et al., 1976)
• Data: weekly rates of return for stocks in twelve countries
• Run agglomerative clustering year by year
• Interpret the structure and examine stability over different time periods
D. Blei Clustering 02 20 / 21
Examples
Good: Cautious. "This study is only descriptive...A logical subsequent research area is to explain observed structural properties and the causes of structural change."
D. Blei Clustering 02 21 / 21