
Stairstep-like dendrogram cut: a permutation test approach

Dario Bruzzese (dbruzzes@unina.it)
Department of Preventive Medical Sciences, University of Naples, Italy

Domenico Vistocco (vistocco@unicas.it)
Department of Economics, University of Cassino, Italy

All computations and graphics were done using the R system (packages: cluster, clusterGeneration, ggplot2).
The slides have been composed using LaTeX (beamer class) and the Sweave tool.

D. Bruzzese, D. Vistocco | Stairstep-like dendrogram cut | UseR 2009 | 1 / 22


(a not necessarily regular cut for a dendrogram)

Motivation

The rep1HighNoise dataset (Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology, 2003, 4:R34)

n = 200, p = 20

It is a synthetic data set with error distributions derived from real array data.


Horizontal cut
k = 2 (red clusters)
k = 3 (green clusters)
k = 4 (blue clusters)
. . .
k = 7 (brown clusters)

α = 0.01: 5 clusters
An alternative cut: k = 5 (rainbow clusters)

The reference framework

Tools
Statistics: hierarchical clustering, permutation tests
R: hclust, plot.hclust {stats}; genRandomClust {clusterGeneration}; qplot, ggplot {ggplot2}

La Carte

1 A (? simple ?) idea

2 A (? not so ?) simple procedure

3 Some results

4 The Wishlist


The (? not so ?) simple idea - notation

Let:
$n$ the number of objects to classify;
$C_L^k$ and $C_R^k$ the two classes merged at level $k$ ($k = 1, \dots, n-1$);
$h(C_L^k \cup C_R^k)$ the height necessary to merge $C_L^k$ and $C_R^k$;
$h(C_j^k)$ the height at which $C_j^k$ has been obtained ($j \in \{L, R\}$).
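To make the notation concrete, here is a minimal Python sketch (the authors worked in R; this is not their code). It assumes a dendrogram stored in the merge/height layout returned by R's hclust: row s of merge joins two entries, where a negative entry -i is object i and a positive entry t is the class formed at the earlier step t, and height[s] is the height of that step. Steps are numbered bottom-up, so, assuming levels are numbered top-down by decreasing height, level k is step n - k. The function names and the convention that a single object is obtained at height 0 are illustrative assumptions.

```python
# Sketch: read h(C_L), h(C_R) and h(C_L ∪ C_R) off an hclust-style dendrogram.
# merge[s] = (a, b): a negative entry -i is object i, a positive entry t is the
# class built at (1-based) step t; height[s] is the height of step s + 1.
# Assumption (not from the slides): a single object is obtained at height 0.


def merge_heights(merge, height):
    """Return (h(C_L), h(C_R), h(C_L ∪ C_R)) for every merge step, bottom-up."""
    out = []
    for (a, b), h_union in zip(merge, height):
        h_left = 0.0 if a < 0 else height[a - 1]
        h_right = 0.0 if b < 0 else height[b - 1]
        out.append((h_left, h_right, h_union))
    return out


def level(k, merge, height):
    """Top-down level k (k = 1 is the root), assuming steps sorted by height."""
    n = len(merge) + 1  # number of objects
    return merge_heights(merge, height)[(n - k) - 1]


# Four objects: {1,2} at 1.0, {3,4} at 2.0, then everything at 5.0.
merge = [(-1, -2), (-3, -4), (1, 2)]
height = [1.0, 2.0, 5.0]
print(level(1, merge, height))  # (1.0, 2.0, 5.0)
```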

[Figure: dendrogram highlighting, for k = 1, 2, 3, the classes $C_L^k$ and $C_R^k$, the merge height $h(C_L^k \cup C_R^k)$ and the heights $h(C_L^k)$ and $h(C_R^k)$.]

The (? simple ?) idea

Input: a dataset and its related dendrogram
Output: a partition of the dataset

initialization:
    aggregationLevelsToVisit ← $h(C_L^1 \cup C_R^1)$
    permClusters ← [ ]
    i ← 1
repeat
    if $C_L^i \equiv C_R^i$ (i.e. $H_0$ is not rejected) then
        add $C_L^i \cup C_R^i$ to permClusters
    else
        add $h(C_L^i)$ and $h(C_R^i)$ to aggregationLevelsToVisit
        sort aggregationLevelsToVisit in descending order
    end
    remove the first element from aggregationLevelsToVisit
    i ← i + 1
until aggregationLevelsToVisit is empty
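A minimal Python sketch of the procedure above, under assumptions that are not on the slides: the dendrogram uses the hclust-style merge/height layout (negative entries are objects, positive entries earlier steps, the last step is the root); a max-heap on the heights (Python's heapq with negated keys) stands in for the sorted aggregationLevelsToVisit list, so taking the highest level and removing it happen in one pop; the permutation test is abstracted as a hypothetical callback same_cluster(step) that returns True when $H_0$ is not rejected; and a single object reached during the descent is kept as a cluster of its own.

```python
import heapq


def members(node, merge):
    """Objects (1-based) under a node: negative = single object, positive = step."""
    if node < 0:
        return [-node]
    a, b = merge[node - 1]
    return members(a, merge) + members(b, merge)


def stairstep_cut(merge, height, same_cluster):
    """Top-down descent: stop splitting wherever same_cluster(step) is True."""
    root = len(merge)  # in hclust ordering the last step is the root
    # aggregationLevelsToVisit as a max-heap keyed on the (negated) height
    to_visit = [(-height[root - 1], root)]
    perm_clusters = []
    while to_visit:
        _, step = heapq.heappop(to_visit)  # highest remaining aggregation level
        if same_cluster(step):  # H0: C_L ≡ C_R not rejected
            perm_clusters.append(sorted(members(step, merge)))
        else:  # rejected: the two children's levels are visited later
            for child in merge[step - 1]:
                if child < 0:
                    perm_clusters.append([-child])  # assumption: keep singletons
                else:
                    heapq.heappush(to_visit, (-height[child - 1], child))
    return perm_clusters


merge = [(-1, -2), (-3, -4), (-5, -6), (1, 2), (3, 4)]
height = [1.0, 1.5, 2.0, 4.0, 10.0]
# A height threshold reproduces a horizontal cut ...
print(stairstep_cut(merge, height, lambda s: height[s - 1] < 3))
# [[5, 6], [3, 4], [1, 2]]
# ... while a per-merge decision can stop at different heights.
print(stairstep_cut(merge, height, lambda s: s in {1, 2, 4}))
# [[1, 2, 3, 4], [5], [6]]
```

With a plain height threshold as the callback the result is an ordinary horizontal cut; letting the decision vary from merge to merge is what produces a stairstep-like cut.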


The (? not so ?) simple idea in action

[Figure: the example dendrogram, with the classes compared at each iteration highlighted.]

Iteration i ← 1
aggregationLevelsToVisit: $h(C_L^1 \cup C_R^1)$
permClusters: empty
clusters to compare: $C_L^1$, $C_R^1$; $H_0: C_L^1 \equiv C_R^1$ → reject

Iteration i ← 2
aggregationLevelsToVisit: $h(C_R^1)$, $h(C_L^1)$
clusters to compare: $C_L^2$, $C_R^2$ (the two classes merged into $C_R^1$); $H_0: C_L^2 \equiv C_R^2$ → reject

Iteration i ← 3
aggregationLevelsToVisit: $h(C_L^1)$, $h(C_R^2)$, $h(C_L^2)$
clusters to compare: $C_L^3$, $C_R^3$ (merged into $C_L^1$); $H_0: C_L^3 \equiv C_R^3$ → reject

Iteration i ← 4
aggregationLevelsToVisit: $h(C_R^3)$, $h(C_R^2)$, $h(C_L^2)$, $h(C_L^3)$
clusters to compare: $C_L^4$, $C_R^4$ (merged into $C_R^3$); $H_0: C_L^4 \equiv C_R^4$ → accept
permClusters: $C_L^4 \cup C_R^4 \Leftrightarrow C_R^3$

. . .

Iteration i ← 9
aggregationLevelsToVisit: empty
permClusters: $C_L^3$, $C_R^3$, $C_L^2$, $C_L^4$, $C_R^4$


The (? not so ?) simple procedure

[Figure: the dendrogram with, for k = 3, the heights $\max_j h(C_j^3)$ and $\min_j h(C_j^3)$ marked.]

For each $k$, the difference between $\max_{j \in \{L,R\}} h(C_j^k)$ and $\min_{j \in \{L,R\}} h(C_j^k)$ can be considered as the minimum cost necessary to merge the two classes.


The difference between $h(C_L^k \cup C_R^k)$ and $\max_{j \in \{L,R\}} h(C_j^k)$ can instead be considered as the cost actually incurred for merging $C_L^k$ and $C_R^k$.


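The ratio between these two costs,

$$\mathrm{cost}\left(C_L^k \cup C_R^k\right) = \frac{\max_{j \in \{L,R\}} h\left(C_j^k\right) - \min_{j \in \{L,R\}} h\left(C_j^k\right)}{h\left(C_L^k \cup C_R^k\right) - \max_{j \in \{L,R\}} h\left(C_j^k\right)},$$

is thus a measure that characterizes the aggregation process resulting in the new class $C_L^k \cup C_R^k$.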
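The ratio can be computed directly from the three heights. The Python sketch below is illustrative: the function name merge_cost is hypothetical, and returning infinity when the denominator is not positive is an assumption, since the slides do not cover that case.

```python
import math


def merge_cost(h_left, h_right, h_union):
    """Ratio of the minimum cost to the actual cost of merging two classes."""
    h_max, h_min = max(h_left, h_right), min(h_left, h_right)
    minimum_cost = h_max - h_min   # max_j h(C_j) - min_j h(C_j)
    actual_cost = h_union - h_max  # h(C_L ∪ C_R) - max_j h(C_j)
    if actual_cost <= 0:
        # Assumption (not on the slides): the ratio is undefined here,
        # so it is reported as infinite.
        return math.inf
    return minimum_cost / actual_cost


print(merge_cost(1.0, 2.0, 5.0))  # 0.3333333333333333
print(merge_cost(1.0, 2.0, 2.5))  # 2.0
```

For the same two child heights, a merge that happens much higher up (5.0 rather than 2.5) gives a smaller ratio.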


The (? not so ?) simple procedure: detail


The algorithm retraces the tree downward, starting from the root of the dendrogram, where all objects are classified in a single cluster.


For each $k$, a permutation test is designed to test the null hypothesis that the two classes $C_L^k$ and $C_R^k$ really belong to the same cluster, i.e.:

$$H_0: C_L^k \equiv C_R^k$$


Under $H_0$, mixing up (permuting) the statistical units of $C_L^k$ and $C_R^k$ should not alter the aggregation process that results in their merger.

[Figure: the classes $C_L^1$ and $C_R^1$ and the permuted classes $mC_L^1$ and $mC_R^1$.]

Let $mC_L^k$ and $mC_R^k$ be the two new classes obtained by permuting the elements of $C_L^k$ and $C_R^k$.


For each of the two permuted classes, a new dendrogram is generated.


The heights at which each of the two classes is built up again clearly correspond to the heights of the root nodes of the corresponding dendrograms, $h(mC^k_L)$ and $h(mC^k_R)$.
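The root height of a rebuilt dendrogram can be sketched as follows, assuming Ward aggregation on Euclidean distances (the setting used in the results) with merge height $\sqrt{2|A||B|/(|A|+|B|)}\,\lVert \bar{x}_A - \bar{x}_B \rVert$ between clusters $A$ and $B$, which is, to our understanding, the convention of SciPy's `linkage(method="ward")`. Other implementations of Ward's method may scale heights differently, so absolute values need not match the authors'. This is a naive illustration, roughly cubic in the number of units, and `ward_root_height` is a hypothetical name.

```python
import math
from typing import Sequence


def ward_root_height(points: Sequence[Sequence[float]]) -> float:
    """Height of the root node of a Ward dendrogram built on `points`.

    Merge height between clusters A and B (assumed convention):
    sqrt(2*|A|*|B|/(|A|+|B|)) * ||centroid(A) - centroid(B)||.
    Returns 0.0 for a single point.
    """
    # Each cluster is stored as (size, centroid).
    clusters = [(1, [float(v) for v in p]) for p in points]
    if not clusters:
        raise ValueError("need at least one point")
    height = 0.0
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest Ward merge height.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                h = math.sqrt(2.0 * na * nb / (na + nb)) * math.dist(ca, cb)
                if best is None or h < best[0]:
                    best = (h, i, j)
        height, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb, [(na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)])
        del clusters[j]          # j > i, so index i is still valid
        clusters[i] = merged
    # Ward heights are monotone, so the last merge is the root node.
    return height
```

Calling it on the units of $mC^k_L$ and of $mC^k_R$ would give the two heights $h(mC^k_L)$ and $h(mC^k_R)$.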


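The ratio

$$\mathrm{cost}\left(mC^k_L \cup mC^k_R\right) = \frac{\max_{j \in \{L,R\}} h\left(mC^k_j\right) - \min_{j \in \{L,R\}} h\left(mC^k_j\right)}{h\left(C^k_L \cup C^k_R\right) - \max_{j \in \{L,R\}} h\left(mC^k_j\right)}$$

is thus a measure that characterizes the aggregation process resulting in the new (potential) class $mC^k_L \cup mC^k_R$.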
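The ratio translates directly into code. In this sketch (hypothetical name `cost`), `h_children` holds the two root heights $h(mC^k_L)$ and $h(mC^k_R)$, and `h_parent` is $h(C^k_L \cup C^k_R)$; we assume that $\mathrm{cost}(C^k_L \cup C^k_R)$ is obtained with the same expression using $h(C^k_L)$ and $h(C^k_R)$, which the slide implies but does not write out.

```python
from typing import Sequence


def cost(h_children: Sequence[float], h_parent: float) -> float:
    """Ratio from the slide:

    (max_j h(mC_j) - min_j h(mC_j)) / (h(C_L U C_R) - max_j h(mC_j)),  j in {L, R}.

    h_children: root heights of the two rebuilt classes (permuted or original).
    h_parent:   height of the node where C_L and C_R merge in the original tree.
    """
    h_max, h_min = max(h_children), min(h_children)
    # Applied exactly as written; a zero denominator raises ZeroDivisionError.
    return (h_max - h_min) / (h_parent - h_max)
```

The numerator measures how unbalanced the two rebuilt classes are; the denominator measures the gap between the taller of them and the original merging height.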


Under $H_0$, the aggregation process resulting in the new cluster $C^k_L \cup C^k_R$ should be very similar to the one that would potentially produce $mC^k_L \cup mC^k_R$; thus the two values $\mathrm{cost}\left(mC^k_L \cup mC^k_R\right)$ and $\mathrm{cost}\left(C^k_L \cup C^k_R\right)$ should be close to each other.
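A worked illustration with made-up heights (not results from the slides): suppose $h(C^k_L) = 2$, $h(C^k_R) = 3$ and $h(C^k_L \cup C^k_R) = 10$, and suppose one permutation rebuilds the mixed classes at $h(mC^k_L) = 8$ and $h(mC^k_R) = 9$. Assuming $\mathrm{cost}(C^k_L \cup C^k_R)$ uses the same ratio as above with the original children heights:

```latex
\mathrm{cost}\left(C^k_L \cup C^k_R\right) = \frac{3 - 2}{10 - 3} = \frac{1}{7} \approx 0.14,
\qquad
\mathrm{cost}\left(mC^k_L \cup mC^k_R\right) = \frac{9 - 8}{10 - 9} = 1
```

Here the permuted value is much larger than the observed one, which is the situation in which $H_0$ becomes implausible.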


The permutation procedure is repeated M times, and each time a new couple $mC^k_L$, $mC^k_R$ is obtained. The Monte Carlo p-value is thus computed as:

$$p = \frac{\#\left\{\mathrm{cost}\left(mC^k_L \cup mC^k_R\right) \le \mathrm{cost}\left(C^k_L \cup C^k_R\right)\right\} + 1}{M + 1}$$
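The Monte Carlo p-value is a simple count; a minimal sketch (hypothetical name `monte_carlo_p_value`) taking the observed cost and the M permuted costs:

```python
from typing import Iterable


def monte_carlo_p_value(observed_cost: float, permuted_costs: Iterable[float]) -> float:
    """p = (#{cost(mC_L U mC_R) <= cost(C_L U C_R)} + 1) / (M + 1)."""
    costs = list(permuted_costs)   # one value per permutation, so M = len(costs)
    n_as_small = sum(1 for c in costs if c <= observed_cost)
    return (n_as_small + 1) / (len(costs) + 1)
```

Because of the "+ 1" terms the p-value is never zero: with M = 999, as in the results, the smallest attainable value is 1/1000 = 0.001, below every α level considered (0.1, 0.05, 0.01).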

La Carte

1 A (? simple ?) idea

2 A (? not so ?) simple procedure

3 Some results

4 The Wishlist


Some results

The yeast galactose dataset

Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner RE, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systemically perturbed metabolic network. Science, 2001, 292:929-934.

n = 205, p = 80

It is a subset of 205 genes that reflect four functional categories in the Gene Ontology listings.


Settings: distanceMethod = euclidean; aggregationMethod = Ward; α = 0.05; M = 999


Some results

The diabetes dataset

Banfield JD, Raftery AE: Model-based Gaussian and Non-Gaussian Clustering. Biometrics, 1993, 49:803-821.

n = 145, p = 3

It contains 145 subjects divided into three groups (normal, chemical diabetes, overt diabetes) on the basis of their oral glucose tolerance, described by three variables.


Settings: distanceMethod = euclidean; aggregationMethod = Ward; α = 0.05; M = 999


Some results... for 5 variables

genRandomCluster: numClust = 2:7; numNonNoisy = 5; sepVal = 0.01

Settings: distanceMethod = euclidean; aggregationMethod = Ward; M = 999; α = 0.1, 0.05, 0.01

Some results... for 5 variables (100 replications)


Some results... for 10 variables

genRandomCluster: numClust = 2:7; numNonNoisy = 10; sepVal = 0.01

Settings: distanceMethod = euclidean; aggregationMethod = Ward; M = 999; α = 0.1, 0.05, 0.01

Some results... for 10 variables (100 replications)


Some results... for 15 variables

genRandomCluster: numClust = 2:7; numNonNoisy = 15; sepVal = 0.01

Settings: distanceMethod = euclidean; aggregationMethod = Ward; M = 999; α = 0.1, 0.05, 0.01

Some results... for 15 variables (100 replications)



The wishlist

Statistical issues
- Quality measures of the obtained partition
- Use of different types of clusters:
  - different cardinalities of clusters
  - different types of cluster generation
- Study of the stability of the results with respect to the number of Monte Carlo replications
- Computational complexity

R issues
- Profiling and optimizing the R code
- Use of compiled code
- Use of S3/S4 methods
- Deploying a package
