Lecture 6: Hierarchical Clustering; Spectral Clustering
TRANSCRIPT
-
Lecture 6: Hierarchical Clustering; Spectral Clustering
Stats 306B: Unsupervised Learning
Lester Mackey
April 16, 2014
-
Blackboard discussion: see lecture notes
-
Average linkage agglomerative clustering: example behavior in 2D. Courtesy: Dave Blei
-
[Figure sequence: a 2D scatter plot of the data (axes V1 and V2; x from 0 to 80, y from −20 to 80), followed by 24 snapshots showing the state of average-linkage agglomerative clustering at iterations 001 through 024. Source: D. Blei, Clustering 02, slide 5/21.]
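The merge-by-merge behavior shown in these snapshots can be reproduced with SciPy; here is a minimal sketch on synthetic 2D data (the blob locations and sizes are illustrative assumptions, not the original dataset):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# 25 synthetic 2D points in two loose groups (stand-in for the plotted data)
X = np.vstack([rng.normal([10, 0], 5, (12, 2)),
               rng.normal([60, 60], 5, (13, 2))])

# Average-linkage agglomerative clustering: each row of Z records one merge,
# so 25 points yield 24 merges, one per iteration as in the snapshots above.
Z = linkage(pdist(X), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at 2 clusters
print(Z.shape[0])  # → 24
```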
-
Clustering human tumor microarray data
Dendrogram from agglomerative hierarchical clustering with average linkage (Source: ESL). 6830 gene expression values from 64 tumors of 12 types.
FIGURE 14.12. Dendrogram from agglomerative hierarchical clustering with average linkage applied to the human tumor microarray data. (Leaf labels are tumor types: BREAST, CNS, NSCLC, RENAL, MELANOMA, PROSTATE, OVARIAN, LEUKEMIA, COLON, UNKNOWN, and cell-line replicates such as K562 and MCF7.)
…hierarchical structure produced by the algorithm. Hierarchical methods impose hierarchical structure whether or not such structure actually exists in the data.

The extent to which the hierarchical structure produced by a dendrogram actually represents the data itself can be judged by the cophenetic correlation coefficient. This is the correlation between the N(N−1)/2 pairwise observation dissimilarities dii′ input to the algorithm and their corresponding cophenetic dissimilarities Cii′ derived from the dendrogram. The cophenetic dissimilarity Cii′ between two observations (i, i′) is the intergroup dissimilarity at which observations i and i′ are first joined together in the same cluster.

The cophenetic dissimilarity is a very restrictive dissimilarity measure. First, the Cii′ over the observations must contain many ties, since only N − 1 of the total N(N − 1)/2 values can be distinct. Also these dissimilarities obey the ultrametric inequality

Cii′ ≤ max{Cik, Ci′k}   (14.40)
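The cophenetic correlation described above can be computed directly with SciPy; a small sketch on synthetic data (the two-blob dataset is an illustrative assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two well-separated blobs, so the dendrogram should represent d well
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])

d = pdist(X)                      # the N(N-1)/2 input dissimilarities d_ii'
Z = linkage(d, method="average")  # average-linkage dendrogram
c, C = cophenet(Z, d)             # c: cophenetic correlation; C: the C_ii'
print(round(c, 3))                # close to 1 for strongly clustered data
```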
-
Clustering human tumor microarray data
Average Linkage | Complete Linkage | Single Linkage
FIGURE 14.13. Dendrograms from agglomerative hierarchical clustering of human tumor microarray data.
…observations within them are relatively close together (small dissimilarities) as compared with observations in different clusters. To the extent this is not the case, results will differ.

Single linkage (14.41) only requires that a single dissimilarity dii′, i ∈ G and i′ ∈ H, be small for two groups G and H to be considered close together, irrespective of the other observation dissimilarities between the groups. It will therefore have a tendency to combine, at relatively low thresholds, observations linked by a series of close intermediate observations. This phenomenon, referred to as chaining, is often considered a defect of the method. The clusters produced by single linkage can violate the "compactness" property that all observations within each cluster tend to be similar to one another, based on the supplied observation dissimilarities {dii′}. If we define the diameter DG of a group of observations as the largest dissimilarity among its members

DG = max_{i∈G, i′∈G} dii′,   (14.44)

then single linkage can produce clusters with very large diameters. Complete linkage (14.42) represents the opposite extreme. Two groups G and H are considered close only if all of the observations in their union are relatively similar. It will tend to produce compact clusters with small diameters (14.44). However, it can produce clusters that violate the "closeness" property. That is, observations assigned to a cluster can be much…

Source: ESL
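The chaining/diameter contrast can be checked numerically; here is a sketch using a one-dimensional "chain" of points with slowly growing gaps (an illustrative construction, not from ESL):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# A 1-D chain: consecutive gaps grow slowly, so single linkage absorbs
# points one at a time (chaining) instead of forming two compact halves.
x = np.cumsum(1 + 0.02 * np.arange(40))[:, None].astype(float)
D = squareform(pdist(x))

def max_diameter(method, k=2):
    """Largest cluster diameter D_G after cutting the tree at k clusters."""
    labels = fcluster(linkage(pdist(x), method), t=k, criterion="maxclust")
    return max(D[np.ix_(labels == c, labels == c)].max()
               for c in np.unique(labels))

# Single linkage leaves one huge cluster (diameter near the full range);
# complete linkage splits the chain into two compact halves.
print(max_diameter("single"), max_diameter("complete"))
```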
-
Clustering human tumor microarray data
• Can also cluster genes (instead of tumors) based on similar expression patterns across tumors
• Heatmap columns have been reordered based on clustering; the ordering is not unique
• In R's 'hclust', subtrees are ordered based on cluster tightness: the daughter cluster with smaller internal dissimilarity is ordered first
FIGURE 14.14. DNA microarray data: average linkage hierarchical clustering has been applied independently to the rows (genes) and columns (samples), determining the ordering of the rows and columns (see text). The colors range from bright green (negative, under-expressed) to bright red (positive, over-expressed).
Source: ESL
-
Choosing k (Source: Tibshirani et al., 2001)
• Microarray data, average linkage; gap statistic used to select the truncation level / number of clusters
• Cautionary tale?
• Approximate local maximum at k = 2
• Gap rises again after k = 6
• Reflects smaller clusters within large separated clusters
Milligan and Cooper (1985) carried out a comprehensive simulation comparison of 30 different procedures. Among the global methods performing the best was the index due to Calinski and Harabasz (1974):

CH(k) = [B(k)/(k − 1)] / [W(k)/(n − k)]

where B(k) and W(k) are the between- and within-cluster sums of squares, with k clusters. The idea is to maximize CH(k) over the number of clusters k. CH(1) is not defined; even if it…

Figure: Dendrogram from the deoxyribonucleic acid (DNA) microarray data; the dotted line cuts the tree, leaving two clusters.
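The Calinski–Harabasz index is available in scikit-learn; a sketch on three synthetic blobs (an illustrative dataset, not the microarray data from the slide):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs, so CH(k) should peak at k = 3
X = np.vstack([rng.normal(c, 0.5, (40, 2))
               for c in ([0, 0], [6, 0], [3, 6])])

# Maximize CH(k) over candidate numbers of clusters k
ch = {k: calinski_harabasz_score(
          X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
      for k in range(2, 7)}
print(max(ch, key=ch.get))  # the k maximizing CH(k)
```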
-
In the wild: "Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets" (Sorlie et al., 2003)
• Evidence of multiple disease subtypes based on separate clustering results on several datasets
• Identified highly expressed genes per subtype
• Generated testable hypotheses
-
Hierarchical clustering in the wild: "The Statistical Analysis of Aesthetic Judgment: An Exploration" (Davenport and Studdert-Kennedy, 1972)
• Clustered 57 paintings rated for composition, drawing, color, and expression
• Results "at odds with conventional expectation"
• "Exploration suggests that there could be productive applications in the comparative analysis of subjective judgment"
• Good: they are cautious. "The value of this analysis...will depend on any interesting speculation it may provoke."
-
Practicalities
• Model selection (truncation level) is still necessary to achieve a single clustering
• No single satisfying solution, but many of the methods discussed in the k-means setting also apply here
• Interpretation of dendrograms is difficult for large datasets
• One solution: label each interior node with a prototype datapoint
• Choose the point with minimal maximum dissimilarity to any other point in the cluster (Bien & Tibshirani, 2011: Hierarchical Clustering with Prototypes via Minimax Linkage)
• Use the minimal maximum dissimilarity as the cluster dissimilarity measure: minimax linkage
• Yields an interpretable cluster summary at every level
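The prototype rule of Bien & Tibshirani can be sketched directly (minimax linkage itself is not in SciPy; the helper name and toy cluster below are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def minimax_prototype(X):
    """Index of the point with minimal maximum dissimilarity to any
    other point in the cluster (its minimax radius is the minimax
    linkage value when merging)."""
    D = squareform(pdist(X))
    return int(np.argmin(D.max(axis=1)))

# On five collinear points, the middle point is the prototype
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
print(minimax_prototype(X))  # → 2
```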
-
Extensions
• Could use alternative measures of cluster dissimilarity, even those that do not arise from pairwise observation dissimilarity
• We have discussed model-free approaches to hierarchical clustering (akin to k-means), but probabilistic, model-based approaches (closer in spirit to mixture modeling) also exist
-
Spectral clustering
• Motivation
• Methods like k-means are well-suited for spherical or elliptical clusters but often fail to capture non-convex clusters
• Example: points in concentric circles
• Spectral clustering is designed for such situations, where clusters are connected but perhaps not compact
[Figure: two panels of points arranged in concentric circles (both axes from −4 to 4). Left, "k-means, 2 clusters": k-means with K = 2 splits the data incorrectly. Right, "Spectral clustering, 2 clusters": spectral clustering recovers the two rings. Courtesy: Sriram Sankararaman.]
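The two-panel comparison can be reproduced with scikit-learn; a sketch on synthetic concentric circles (parameter choices such as the neighborhood size are illustrative assumptions):

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric circles: connected but non-compact clusters
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

# k-means cuts across both rings; spectral clustering recovers them
print(adjusted_rand_score(y, km), adjusted_rand_score(y, sc))
```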
-
Blackboard discussion: see lecture notes