Lecture 6: Hierarchical Clustering; Spectral Clustering
Stats 306B: Unsupervised Learning
Lester Mackey, April 16, 2014



  • Blackboard discussion – see lecture notes

  • Average linkage agglomerative clustering – example behavior in 2D (courtesy: Dave Blei)

  • Example: [24 figure frames omitted (iterations 001 through 024): an average-linkage agglomerative clustering run shown step by step on a 2D dataset, axes V1 and V2 ranging roughly from −20 to 80, one merge per iteration. Source: D. Blei, Clustering 02, slide 5/21.]
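The merge-by-merge behavior animated in those frames can be reproduced with SciPy; a minimal sketch on synthetic stand-in data (the slides' dataset is not available here):

```python
# Minimal sketch of average-linkage agglomerative clustering with SciPy,
# on synthetic stand-in data (two Gaussian blobs), not the slides' dataset.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),     # blob 1
               rng.normal(8, 1, (20, 2))])    # blob 2

# Average linkage: dissimilarity between clusters = mean pairwise distance
Z = linkage(X, method="average")              # (n-1) x 4 merge table
# Each row of Z records one merge, like one animation frame: the two
# clusters joined, the merge height, and the new cluster's size.
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at 2 clusters
print(Z.shape, len(set(labels)))              # -> (39, 4) 2
```

Cutting the same merge table at different heights with `fcluster` recovers the clustering at any intermediate iteration.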

  • Clustering human tumor microarray data

    Dendrogram from agglomerative hierarchical clustering with average linkage (Source: ESL). 6830 gene expression values from 64 tumors of 12 types.

    [Figure omitted. Caption: FIGURE 14.12. Dendrogram from agglomerative hierarchical clustering with average linkage to the human tumor microarray data. Leaf labels are tumor types: CNS, RENAL, BREAST, NSCLC, OVARIAN, PROSTATE, MELANOMA, COLON, LEUKEMIA, K562 A/B-repro, MCF7 A/D-repro, UNKNOWN.]

    […] hierarchical structure produced by the algorithm. Hierarchical methods impose hierarchical structure whether or not such structure actually exists in the data.

    The extent to which the hierarchical structure produced by a dendrogram actually represents the data itself can be judged by the cophenetic correlation coefficient. This is the correlation between the N(N−1)/2 pairwise observation dissimilarities d_ii′ input to the algorithm and their corresponding cophenetic dissimilarities C_ii′ derived from the dendrogram. The cophenetic dissimilarity C_ii′ between two observations (i, i′) is the intergroup dissimilarity at which observations i and i′ are first joined together in the same cluster.

    The cophenetic dissimilarity is a very restrictive dissimilarity measure. First, the C_ii′ over the observations must contain many ties, since only N−1 of the total N(N−1)/2 values can be distinct. Also, these dissimilarities obey the ultrametric inequality

    C_ii′ ≤ max{C_ik, C_i′k}    (14.40)
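These properties can be checked numerically; a hedged sketch using SciPy's `cophenet` on small synthetic data (an illustrative assumption, not the microarray data):

```python
# Sketch: cophenetic correlation, the tie structure, and the ultrametric
# inequality (14.40), verified on small synthetic data.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, cophenet

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))
d = pdist(X)                    # the N(N-1)/2 input dissimilarities d_ii'
Z = linkage(d, method="average")
c, C = cophenet(Z, d)           # correlation c, cophenetic dissimilarities C_ii'

# Many ties: at most N-1 of the N(N-1)/2 cophenetic values are distinct
print(len(set(np.round(C, 10))) <= len(X) - 1)   # -> True

# Ultrametric inequality (14.40): C_ii' <= max{C_ik, C_i'k} for every k
Cm = squareform(C)
n = len(X)
print(all(Cm[i, j] <= max(Cm[i, k], Cm[j, k]) + 1e-12
          for i in range(n) for j in range(n) for k in range(n)))  # -> True
```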

  • Clustering human tumor microarray data

    [Figure omitted. Panel titles: Average Linkage, Complete Linkage, Single Linkage. Caption: FIGURE 14.13. Dendrograms from agglomerative hierarchical clustering of human tumor microarray data.]

    […] observations within them are relatively close together (small dissimilarities) as compared with observations in different clusters. To the extent this is not the case, results will differ.

    Single linkage (14.41) only requires that a single dissimilarity d_ii′, i ∈ G and i′ ∈ H, be small for two groups G and H to be considered close together, irrespective of the other observation dissimilarities between the groups. It will therefore have a tendency to combine, at relatively low thresholds, observations linked by a series of close intermediate observations. This phenomenon, referred to as chaining, is often considered a defect of the method. The clusters produced by single linkage can violate the "compactness" property that all observations within each cluster tend to be similar to one another, based on the supplied observation dissimilarities {d_ii′}. If we define the diameter D_G of a group of observations as the largest dissimilarity among its members,

    D_G = max_{i ∈ G, i′ ∈ G} d_ii′,    (14.44)

    then single linkage can produce clusters with very large diameters.

    Complete linkage (14.42) represents the opposite extreme. Two groups G and H are considered close only if all of the observations in their union are relatively similar. It will tend to produce compact clusters with small diameters (14.44). However, it can produce clusters that violate the "closeness" property. That is, observations assigned to a cluster can be much closer to members of other clusters than they are to some members of their own cluster.

    Source: ESL
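The chaining behavior and the diameter contrast show up even on a toy example; a sketch using a synthetic chain of 30 equally spaced collinear points (an assumption for illustration):

```python
# Sketch: chaining in single linkage vs. diameter control in complete
# linkage, on a synthetic chain of 30 equally spaced collinear points.
import numpy as np
from scipy.cluster.hierarchy import linkage

chain = np.column_stack([np.arange(30.0), np.zeros(30)])

Z_single = linkage(chain, method="single")
Z_complete = linkage(chain, method="complete")

# Chaining: every single-linkage merge happens at height 1.0 (each point is
# distance 1 from its neighbor), yet the final cluster's diameter D_G is 29.
print(Z_single[:, 2].max())            # -> 1.0
# Complete-linkage merge heights grow with the cluster diameter instead
print(Z_complete[:, 2].max() > 10.0)   # -> True
```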

  • Clustering human tumor microarray data – can also cluster genes (instead of tumors) based on similar expression patterns across tumors

    – Heatmap columns have been reordered based on clustering
      • Ordering not unique
      • In R's 'hclust', subtrees are ordered based on cluster tightness

    – Daughter cluster with smaller internal dissimilarity ordered first
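The non-uniqueness of the displayed ordering can be illustrated with SciPy, whose default subtree ordering differs from R's tightness-based convention (synthetic data):

```python
# Sketch: dendrogram leaf order is not unique. SciPy's dendrogram() reports
# one valid left-to-right order; optimal_ordering=True picks another by
# minimizing distances between adjacent leaves. (Synthetic data; R's hclust
# uses yet another, tightness-based convention.)
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 4))

order = dendrogram(linkage(X, method="average"),
                   no_plot=True)["leaves"]
order_opt = dendrogram(linkage(X, method="average", optimal_ordering=True),
                       no_plot=True)["leaves"]

# Both are valid permutations of the 8 observations; flipping any subtree's
# daughters yields another equally valid ordering.
print(sorted(order) == list(range(8)), sorted(order_opt) == list(range(8)))  # -> True True
```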

    [Figure omitted (ESL, Section 14.3, Cluster Analysis, p. 527). Caption: FIGURE 14.14. DNA microarray data: average linkage hierarchical clustering has been applied independently to the rows (genes) and columns (samples), determining the ordering of the rows and columns (see text). The colors range from bright green (negative, under-expressed) to bright red (positive, over-expressed).]

    Source: ESL

  • Choosing k (Source: Tibshirani et al., 2001)

    – Microarray data
    – Average linkage
    – Gap statistic used to select truncation level / number of clusters

    – Cautionary tale?
      • Approximate local maximum at k = 2
      • Gap rises again after k = 6
      • Reflects smaller clusters within large separated clusters
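The gap statistic itself fits in a few lines; a simplified sketch (plain k-means, uniform reference over the data's bounding box, no standard-error correction, synthetic data) of the idea from Tibshirani et al. (2001):

```python
# Simplified gap-statistic sketch (Tibshirani et al., 2001): compare log W_k
# on the data against its expectation under a uniform reference distribution.
# Assumptions: plain k-means, bounding-box reference, no standard-error term.
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # two well-separated blobs,
               rng.normal(5, 0.5, (50, 2))])  # so k = 2 should win

def log_Wk(X, k, iters=25):
    # Plain k-means; W_k = total within-cluster sum of squared distances
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return np.log(((X - C[lab]) ** 2).sum())

def gap(X, k, B=10):
    # Gap(k) = E*[log W_k] under the reference minus log W_k on the data
    lo, hi = X.min(0), X.max(0)
    ref = [log_Wk(rng.uniform(lo, hi, X.shape), k) for _ in range(B)]
    return float(np.mean(ref) - log_Wk(X, k))

gaps = {k: gap(X, k) for k in (1, 2, 3)}
print(gaps[2] > gaps[1])   # -> True: a clear gap increase at k = 2
```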

    !"##"$%& %&' ())*+, -./012 3%,,"+' )45 % 3)6*,+7+&8"9+ 8"64#%5")& 3)6*%,"8)& ): ;<'"::+,+&5 *,)3+'4,+8= >6)&$ 57+ $#)?%# 6+57)'8 *+,:),6"&$ 57+ ?+85 @%8 57+ "&'+A '4+ 5)(%#"&8B" %&' C%,%?%8D -./EF2G

    (C!!" # "!!"!!!$ ."#!!"!!$$ !" !H"

    @7+,+ "!!" %&' #!!" %,+ 57+ ?+5@++&I %&' @"57"&I3#485+, 8468 ): 8J4%,+8K @"57 ! 3#485+,8=L7+ "'+% "8 5) 6%A"6"D+ (C!!" )9+, 57+ &46?+, ): 3#485+,8 != (C!." "8 &)5 '+M&+'N +9+& ": "5

    !"# $%"%&'%&( !"#

    !"#$ %$ $%&'()*(+, -(), ./% '%)01(23)&456%25 +52' 7$89: ,25()+((+1 '+.+; ./% ')..%' 62&% 54.< ./% .(%%= 6%+>2&*.?) 564

  • In the wild: "Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets" (Sorlie et al., 2003)

    – Evidence of multiple disease subtypes based on separate clustering results on several datasets

    – Identified highly expressed genes per subtype

    – Generated testable hypotheses

  • Hierarchical clustering in the wild: "The Statistical Analysis of Aesthetic Judgment: An Exploration" (Davenport and Studdert-Kennedy, 1972)

    – Clustered 57 paintings rated for composition, drawing, color, & expression

    – Results "at odds with conventional expectation"

    – "Exploration suggests that there could be productive applications in the comparative analysis of subjective judgment"

    – "The value of this analysis...will depend on any interesting speculation it may provoke."

    Good: they are cautious.

  • Practicalities

    – Model selection (truncation level) is still necessary to achieve a single clustering
      • No single satisfying solution, but many of the methods discussed in the k-means setting also apply here

    – Interpretation of dendrograms is difficult for large datasets
      • One solution: label each interior node with a prototype datapoint
        – Choose the point with minimal maximum dissimilarity to any other point in the cluster (Bien & Tibshirani, 2011: Hierarchical Clustering with Prototypes via Minimax Linkage)

        – Use the minimal maximum dissimilarity as the cluster dissimilarity measure: minimax linkage

        – Yields an interpretable cluster summary at every level
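The prototype rule can be sketched directly; a minimal version following Bien & Tibshirani's definition, on synthetic data:

```python
# Sketch of the minimax prototype (Bien & Tibshirani, 2011): the prototype of
# a cluster G is the member minimizing its maximum dissimilarity to G, and
# that minimal maximum is the cluster's minimax radius. Synthetic data.
import numpy as np
from scipy.spatial.distance import cdist

def prototype(X, idx):
    """Return (prototype index, minimax radius) for the cluster members idx."""
    D = cdist(X[idx], X[idx])     # pairwise dissimilarities within G
    worst = D.max(axis=1)         # each member's max dissimilarity to G
    best = int(np.argmin(worst))  # the minimax point
    return int(idx[best]), float(worst[best])

rng = np.random.default_rng(5)
X = rng.normal(size=(12, 2))
proto, radius = prototype(X, np.arange(12))

# Every cluster member lies within `radius` of the prototype
print(cdist(X[[proto]], X).max() <= radius + 1e-12)   # -> True
```

Applied to each interior node of a dendrogram, this labels every cluster with one representative datapoint, as the slide describes.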

  • Extensions

    – Could use alternative measures of cluster dissimilarity, even those that do not arise from pairwise observation dissimilarities

    – We have discussed model-free approaches to hierarchical clustering (akin to k-means), but probabilistic, model-based approaches (closer in spirit to mixture modeling) also exist

  • Spectral clustering

    – Motivation
      • Methods like k-means are well-suited for spherical or elliptical clusters but often fail to capture non-convex clusters
        – Example: points in concentric circles

      • Spectral clustering is designed for such situations, where clusters are connected but perhaps not compact

    [Figures omitted (courtesy Sriram Sankararaman): two panels over concentric-circle data, axes from −4 to 4. Left: k-means, 2 clusters. Right: spectral clustering, 2 clusters.]
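A minimal sketch of the pipeline behind the spectral-clustering panel (Gaussian affinity, normalized graph Laplacian, k-means in the eigenvector embedding); the bandwidth, noise level, and k-means seeds here are illustrative assumptions:

```python
# Hedged sketch of spectral clustering on concentric circles: build a
# Gaussian affinity, form the normalized graph Laplacian, embed the points
# with its bottom eigenvectors, then run k-means in that embedding.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(6)
t = rng.uniform(0, 2 * np.pi, 100)
inner = np.column_stack([np.cos(t), np.sin(t)])       # circle of radius 1
outer = 4 * np.column_stack([np.cos(t), np.sin(t)])   # circle of radius 4
X = np.vstack([inner, outer]) + rng.normal(0, 0.05, (200, 2))

W = np.exp(-cdist(X, X) ** 2 / (2 * 0.5 ** 2))        # Gaussian affinity
d = W.sum(1)
L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))      # normalized Laplacian
vals, vecs = np.linalg.eigh(L)
U = vecs[:, :2]                                       # 2 smallest eigenvectors
U /= np.linalg.norm(U, axis=1, keepdims=True)         # row-normalize

# Tiny k-means (k = 2) in the spectral embedding, seeded one per circle
C = U[[0, 100]]
for _ in range(20):
    lab = cdist(U, C).argmin(1)
    C = np.array([U[lab == j].mean(0) for j in range(2)])

# Each circle should land in its own cluster
print(len(set(lab[:100])) == 1 and len(set(lab[100:])) == 1)  # True when the circles separate cleanly
```

Because the affinity between the circles is negligible at this bandwidth, the graph is nearly two connected components, and the bottom eigenvectors separate them even though the clusters are non-convex.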

  • Blackboard discussion – see lecture notes