visualizing cluster results using package flexclust and ... · visualizing cluster results using...

49
Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes, 10.7.2009

Upload: others

Post on 25-Mar-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Visualizing Cluster Results

Using Package FlexClust and Friendsd

Friedrich Leisch

University of Munich

useR!, Rennes, 10.7.2009

Page 2: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Acknowledgements & Apology

• Sara Dolnicar (University of Wollongong)

• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)

• Paul Murrell, Deepayan Sarkar (R Core)

Friedrich Leisch: Cluster Visualization, 10.7.2009 1

Page 3: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Acknowledgements & Apology

• Sara Dolnicar (University of Wollongong)

• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)

• Paul Murrell, Deepayan Sarkar (R Core)

Apology: Microarray data only in Theresa’s talk (try time-shift back to

Wednesday).

Friedrich Leisch: Cluster Visualization, 10.7.2009 1

Page 4: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Partitioning Clustering – KCCA

K-centroid cluster algorithms:

• data set XN = {x1, . . . ,xN}, set of centroids CK = {c1, . . . , cK}• distance measure d(x,y)

• centroid c closest to x:

c(x) = argminc∈CK

d(x, c)

• Most KCCA algorithms try to find a set of centroids CK for fixed K

such that the average distance

D(XN , CK) =1

N

N∑n=1

d(xn, c(xn))→ minCK

,

of each point to the closest centroid is minimal.

• Optimization algorithm not important for rest of talk

Friedrich Leisch: Cluster Visualization, 10.7.2009 2

Page 5: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Example: Australian Volunteers

Survey among 1415 Australian adults about which organziations they

would consider to volunteer for, main motivation to volunteer, actual

volunteering, image of organizations, . . .

We use a block of 19 binary questions (“applies”, “does not apply”)

about motivations to volunteer: “I want to meet people”, “I have no

one else”, “I want to set an example”, . . .

Organziations investigated: Red Cross, Surf Life Savers, Rural Fire

Service, Parents Association, . . .

Our analyses show that there is both competition between organizations

with similar profiles as well as complimentary effects (idividuals volun-

teering for more than one arganization, in most cases with very different

profiles).

Friedrich Leisch: Cluster Visualization, 10.7.2009 3

Page 6: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

8 Volunteer Clusters

Cl.1 Cl.2 Cl.3 Cl.4 Cl.5 Cl.6 Cl.7 Cl.8 Totalmeet.people 9.24 15.77 28.77 97.18 82.17 80.97 90.81 34.23 49.47no.one.else 8.21 8.34 6.16 27.72 10.51 12.47 16.75 7.23 11.38example 12.80 14.68 63.56 93.30 35.84 73.33 80.32 45.30 47.63socialise 14.12 10.68 3.28 88.74 52.79 54.17 83.39 6.35 35.83help.others 0.00 100.00 92.93 95.17 56.95 89.18 86.98 86.12 66.78give.back 21.68 29.18 87.73 96.24 43.41 87.60 96.18 89.77 63.75career 12.04 5.54 10.46 71.34 35.01 17.49 18.29 11.54 20.57lonely 4.55 8.71 2.30 56.55 17.16 9.76 18.93 0.75 13.14active 17.17 15.93 23.38 93.82 81.61 53.97 77.00 23.98 44.88community 16.17 9.64 66.66 93.75 14.67 87.80 90.34 77.21 52.72cause 20.49 12.07 79.66 96.91 47.11 83.31 85.12 79.82 58.66faith 10.12 7.24 22.84 66.89 10.52 42.10 27.83 19.47 24.03services 7.00 7.10 11.63 78.15 23.99 43.72 44.98 14.64 25.23children 6.88 11.76 11.88 28.58 14.86 16.20 14.64 8.31 13.00good.job 19.14 23.81 100.00 94.61 49.18 85.35 75.38 0.00 51.80benefited 10.49 15.09 14.26 74.37 15.04 100.00 0.00 12.68 26.29network 10.75 8.47 6.29 85.86 43.59 22.83 38.98 10.28 25.94recognition 10.30 8.03 11.29 79.59 12.80 19.56 21.40 3.49 18.73mind.off 8.56 10.95 12.53 87.96 39.55 24.55 47.43 4.31 26.57

Friedrich Leisch: Cluster Visualization, 10.7.2009 4

Page 7: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

8 Volunteer Clusters

Cl.1 Cl.2 Cl.3 Cl.4 Cl.5 Cl.6 Cl.7 Cl.8 Totalmeet.people 9 16 29 97 82 81 91 34 49no.one.else 8 8 6 28 11 12 17 7 11example 13 15 64 93 36 73 80 45 48socialise 14 11 3 89 53 54 83 6 36help.others 0 100 93 95 57 89 87 86 67give.back 22 29 88 96 43 88 96 90 64career 12 6 10 71 35 17 18 12 21lonely 5 9 2 57 17 10 19 1 13active 17 16 23 94 82 54 77 24 45community 16 10 67 94 15 88 90 77 53cause 20 12 80 97 47 83 85 80 59faith 10 7 23 67 11 42 28 19 24services 7 7 12 78 24 44 45 15 25children 7 12 12 29 15 16 15 8 13good.job 19 24 100 95 49 85 75 0 52benefited 10 15 14 74 15 100 0 13 26network 11 8 6 86 44 23 39 10 26recognition 10 8 11 80 13 20 21 3 19mind.off 9 11 13 88 40 25 47 4 27

Friedrich Leisch: Cluster Visualization, 10.7.2009 5

Page 8: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

8 Volunteer Clusters

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

Cluster 1: 322 (23%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 2: 140 (10%) Cluster 3: 166 (12%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 4: 136 (10%)

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 5: 160 (11%) Cluster 6: 147 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 7: 178 (13%) Cluster 8: 166 (12%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 6

Page 9: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

8 Volunteer Clusters

We cannot easily test for differences between clusters, because they were

constructed to be different.

Friedrich Leisch: Cluster Visualization, 10.7.2009 7

Page 10: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

8 Volunteer Clusters

We cannot easily test for differences between clusters, because they were

constructed to be different.

Improved presentation of results (following advice that is only around for

a few decades):

• Add reference lines/points

• Sort variables by content:

1. sort clusters by mean

2. sort variables by hierarchical clustering

• Highlight important points

Friedrich Leisch: Cluster Visualization, 10.7.2009 7

Page 11: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Add reference points

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

Cluster 1: 322 (23%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 2: 140 (10%)

Cluster 3: 166 (12%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 4: 136 (10%)

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 7: 178 (13%)

Cluster 8: 166 (12%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 8

Page 12: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Sort Clusters

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

Cluster 1: 322 (23%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 2: 140 (10%)

Cluster 3: 136 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 4: 166 (12%)

mind.offrecognition

networkbenefitedgood.jobchildrenservices

faithcause

communityactivelonelycareer

give.backhelp.others

socialiseexample

no.one.elsemeet.people

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 7: 178 (13%)

Cluster 8: 166 (12%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 9

Page 13: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Sort Variableshe

lp.o

ther

s

give

.bac

k

com

mun

ity

caus

e

exam

ple

good

.job

activ

e

mee

t.peo

ple

soci

alis

e

min

d.of

f

netw

ork

care

er

lone

ly

reco

gniti

on

faith

no.o

ne.e

lse

child

ren

serv

ices

bene

fited

Friedrich Leisch: Cluster Visualization, 10.7.2009 10

Page 14: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Sort Variables

help.othersgive.back

communitycause

examplegood.job

activemeet.people

socialisemind.offnetwork

careerlonely

recognitionfaith

no.one.elsechildrenservices

benefited

Cluster 1: 322 (23%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 2: 140 (10%)

Cluster 3: 136 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 4: 166 (12%)

help.othersgive.back

communitycause

examplegood.job

activemeet.people

socialisemind.offnetwork

careerlonely

recognitionfaith

no.one.elsechildrenservices

benefited

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 7: 178 (13%)

Cluster 8: 166 (12%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 11

Page 15: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Highlight Important Points

help.othersgive.back

communitycause

examplegood.job

activemeet.people

socialisemind.offnetwork

careerlonely

recognitionfaith

no.one.elsechildrenservices

benefited

Cluster 1: 322 (23%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 2: 140 (10%)

Cluster 3: 136 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 4: 166 (12%)

help.othersgive.back

communitycause

examplegood.job

activemeet.people

socialisemind.offnetwork

careerlonely

recognitionfaith

no.one.elsechildrenservices

benefited

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

0.0 0.2 0.4 0.6 0.8 1.0

Cluster 7: 178 (13%)

Cluster 8: 166 (12%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 12

Page 16: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Example: Car Data

A German manufacturer of premium cars asked recent customers about

main motivation to buy one of their cars. Binary data, respondents were

asked to check properties like sporty, power, interior, safety, quality,

resale value, etc.

Exploration of the data shows no natural groups, a segmentation of the

market imposes a partition on the data (which is absolutely fine for the

purpose).

Here: hierarchical clustering of variables, partition with 4 clusters from

neural gas algorithm for customers.

Friedrich Leisch: Cluster Visualization, 10.7.2009 13

Page 17: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Example: Car Data

driv

ing_

prop

ertie

s

pow

er

spor

ty

econ

omy

cons

umpt

ion re

sale

_val

ue

serv

ice

clar

ity

char

acte

r

spac

e

hand

ling

conc

ept

inte

rior

styl

ing

mod

el_c

ontin

uity

repu

tatio

n

safe

ty

qual

ity

relia

bilit

y

tech

nolo

gy

com

fort

Friedrich Leisch: Cluster Visualization, 10.7.2009 14

Page 18: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Example: Car Data

driving_propertiespowersporty

economyconsumptionresale_value

serviceclarity

characterspace

handlingconceptinteriorstyling

model_continuityreputation

safetyquality

reliabilitytechnology

comfort

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Cluster 1: 237 (30%)

0.0 0.2 0.4 0.6 0.8 1.0

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Cluster 2: 280 (35%)

driving_propertiespowersporty

economyconsumptionresale_value

serviceclarity

characterspace

handlingconceptinteriorstyling

model_continuityreputation

safetyquality

reliabilitytechnology

comfort

0.0 0.2 0.4 0.6 0.8 1.0

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Cluster 3: 205 (26%)

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Cluster 4: 71 (9%)

Friedrich Leisch: Cluster Visualization, 10.7.2009 15

Page 19: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●clarity

economy

driving_properties

serviceinterior

quality

technology

model_continuity

comfortreliability

handling reputation

concept

character

power

resale_value

styling

safety

sporty

consumption space

Friedrich Leisch: Cluster Visualization, 10.7.2009 16

Page 20: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

● ●

●●

● ●

quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 17

Page 21: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

12

3

4quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 18

Page 22: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

12

3

4quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 19

Page 23: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

12

3

4quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 20

Page 24: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

12

3

4quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 21

Page 25: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

PCA Projection

12

3

4quality

technology comfort

power

resale_valueconsumption

Friedrich Leisch: Cluster Visualization, 10.7.2009 22

Page 26: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Convex Cluster Hulls

The main problem for 2-dimensional generalizations of boxplots is that

there is no total ordering of R2 (or higher).

For data partitioned using a centroid-based cluster algorithm there is a

natural total ordering for each point in a cluster: The distance d(x, c(x))

of the point to its respective cluster centroid. Let Ak be the set of

points in cluster k, and

mk = median{d(xn, ck)|xn ∈ Ak}

inner area: all data points where d(xn, ck) ≤ mk

outer area: all data points where d(xn, ck) ≤ 2.5mk

Friedrich Leisch: Cluster Visualization, 10.7.2009 23

Page 27: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Shadow Values

Second-closest centroid to x:

c̃(x) = argminc∈CK\{c(x)}

d(x, c)

Shadow value:

s(x) =2d(x, c(x))

d(x, c(x)) + d(x, c̃(x))∈ [0, 1]

s(x) = 0: centroid

s(x) = 1: on border of clusters

Friedrich Leisch: Cluster Visualization, 10.7.2009 24

Page 28: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Neighborhood Graph

Let

Aij ={xn | c(xn) = ci, c̃(xn) = cj

}be the set of all points where ci is the closest centroid and cj is second-

closest.

Cluster similarity:

sij =

{|Ai|−1 ∑

x∈|Aij| s(x), Aij 6= ∅0, Aij = ∅

Use sij + sji for thickness of line connecting ci and cj.

Friedrich Leisch: Cluster Visualization, 10.7.2009 25

Page 29: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Neighborhood Graph

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●●

●●●

●●

●●

1

2

3

45 6

7

8

Friedrich Leisch: Cluster Visualization, 10.7.2009 26

Page 30: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Neighborhood Graph

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

1

2

3

45

6

7

89

10

11

1213

14

15

16

Friedrich Leisch: Cluster Visualization, 10.7.2009 27

Page 31: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Volunteer PCA-Projection

1

2

3 4

5

67

8

example

socialise

help.othersgive.back

active

communitycause

network

Friedrich Leisch: Cluster Visualization, 10.7.2009 28

Page 32: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Volunteer PCA-Projection

●●

●●

●1

2

34

5

6

7

8

Friedrich Leisch: Cluster Visualization, 10.7.2009 29

Page 33: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Volunteer PCA-Projection

●●

●●

●1

2

34

5

6

7

8

Friedrich Leisch: Cluster Visualization, 10.7.2009 30

Page 34: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Nonlinear Arrangement

k1

k2

k3

k4

k5

k6

k7

k8

Friedrich Leisch: Cluster Visualization, 10.7.2009 31

Page 35: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Cluster Size

k1

k2

k3

k4

k5

k6

k7

k8

Friedrich Leisch: Cluster Visualization, 10.7.2009 32

Page 36: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Age Distribution

●●●●●●●●●●

●●

●●●●

●●●

●●

Friedrich Leisch: Cluster Visualization, 10.7.2009 33

Page 37: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Gender Distribution

Friedrich Leisch: Cluster Visualization, 10.7.2009 34

Page 38: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Nice to Look at

Friedrich Leisch: Cluster Visualization, 10.7.2009 35

Page 39: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Model-based Ordinal Clustering

Survey data for two fastfood chains (McDonald’s, Subway) from 715

respondents on 10 items (yummy, fattening, greasy, fast, . . . ).

We are interested in capturing scale usage heterogeneity under the

assumption that different involvement with a brand may provoke different

scale usage.

Finite mixture model: For each item and group we estimate mean and

standard deviation of a latent Gaussian. Assumption of independence

between items given group membership, estimation by EM.

For a 3-component model we get 30 means and 30 standard deviations.

Friedrich Leisch: Cluster Visualization, 10.7.2009 36

Page 40: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Model-based Ordinal Clustering

−2 −1 0 1 2

0.0

0.1

0.2

0.3

0.4

x

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

−2 −1 0 1 2

0.0

0.1

0.2

0.3

0.4

x

dnor

m(x

, mea

n =

m[n

], sd

= s

[n])

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

−2 −1 0 1 2

0.0

0.1

0.2

0.3

0.4

x

dnor

m(x

, mea

n =

m[n

], sd

= s

[n])

−3 −2 −1 0 1 2 30.

00.

10.

20.

30.

4

Friedrich Leisch: Cluster Visualization, 10.7.2009 37

Page 41: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Subway: 2 Componentspr

ob

0.0

0.2

0.4

0.6

0.8

yummy1

fattening1

greasy1

fast1

cheap1

tasty1

healthy1

disgusting1

convenient1

spicy1

0 1 2 3 4 5 6

yummy2

0 1 2 3 4 5 6

fattening2

0 1 2 3 4 5 6

greasy2

0 1 2 3 4 5 6

fast2

0 1 2 3 4 5 6

cheap2

0 1 2 3 4 5 6

tasty2

0 1 2 3 4 5 6

healthy2

0 1 2 3 4 5 6

disgusting2

0 1 2 3 4 5 6

convenient2

0 1 2 3 4 5 6

0.0

0.2

0.4

0.6

0.8

spicy2

Friedrich Leisch: Cluster Visualization, 10.7.2009 38

Page 42: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Subway: 2 Components

prob

spicy

convenient

disgusting

healthy

tasty

cheap

fast

greasy

fattening

yummy

0.0 0.2 0.4 0.6 0.8 1.0

1

0.0 0.2 0.4 0.6 0.8 1.0

2

Friedrich Leisch: Cluster Visualization, 10.7.2009 39

Page 43: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Subway: 2 Components

Overall Scale Usage

prob

0.0

0.1

0.2

0.3

0.4

0 1 2 3 4 5 6

1

0 1 2 3 4 5 6

2

Friedrich Leisch: Cluster Visualization, 10.7.2009 40

Page 44: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Subway: 2 Components

Scale Usage Corrected

prob

rat

io 0

1

2

3

4

5

6yummy

1fattening

1greasy

1fast1

cheap1

tasty1

healthy1

disgusting1

convenient1

spicy1

0 1 2 3 4 5 6

yummy2

0 1 2 3 4 5 6

fattening2

0 1 2 3 4 5 6

greasy2

0 1 2 3 4 5 6

fast2

0 1 2 3 4 5 6

cheap2

0 1 2 3 4 5 6

tasty2

0 1 2 3 4 5 6

healthy2

0 1 2 3 4 5 6

disgusting2

0 1 2 3 4 5 6

convenient2

0 1 2 3 4 5 6

0

1

2

3

4

5

6spicy

2

Friedrich Leisch: Cluster Visualization, 10.7.2009 41

Page 45: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

McDonald’s: 3 Components

prob

spicy

convenient

disgusting

healthy

tasty

cheap

fast

greasy

fattening

yummy

0.0 0.2 0.4 0.6 0.8 1.0

1

0.0 0.2 0.4 0.6 0.8 1.0

2

0.0 0.2 0.4 0.6 0.8 1.0

3

Friedrich Leisch: Cluster Visualization, 10.7.2009 42

Page 46: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

McDonald’s: 3 Components

Overall Scale Usage

prob

0.0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5 6

1

0 1 2 3 4 5 6

2

0 1 2 3 4 5 6

3

Friedrich Leisch: Cluster Visualization, 10.7.2009 43

Page 47: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

McDonald’s: 3 Components

Scale Usage Corrected

prob

rat

io

0

1

2

3

4

5yummy

1fattening

1greasy

1fast1

cheap1

tasty1

healthy1

disgusting1

convenient1

spicy1

yummy2

fattening2

greasy2

fast2

cheap2

tasty2

healthy2

disgusting2

convenient2

0

1

2

3

4

5spicy

2

0

1

2

3

4

5

0 1 2 3 4 5 6

yummy3

0 1 2 3 4 5 6

fattening3

0 1 2 3 4 5 6

greasy3

0 1 2 3 4 5 6

fast3

0 1 2 3 4 5 6

cheap3

0 1 2 3 4 5 6

tasty3

0 1 2 3 4 5 6

healthy3

0 1 2 3 4 5 6

disgusting3

0 1 2 3 4 5 6

convenient3

0 1 2 3 4 5 6

spicy3

Friedrich Leisch: Cluster Visualization, 10.7.2009 44

Page 48: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Crosstabulation of Clusters

−2.93

−2.00

0.00

2.00

4.00

5.87

Pearsonresiduals:

p−value =7.7721e−11

32

1

1 2

Only a very small group

has extreme response style

for both brands, otherwise

almost independence.

Friedrich Leisch: Cluster Visualization, 10.7.2009 45

Page 49: Visualizing Cluster Results Using Package FlexClust and ... · Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes,

Software, Papers, Outlook

Packages are available from CRAN and/or R-Forge:

flexclust: KCCA clustering for arbitrary distances, shaded barcharts,projections, convex hulls, static neighborhood graphs, . . .

gcExplorer: Interactive neighborhood graphs with links to Gene Ontol-ogy.

symbols: Grid versions of boxplot(), symbols(), stars(), . . .flexmix: Finite mixture models. Model based clustering for various

distributions and mixtures of generalized linear models.

Public availability of features shown in this talk depends on releaseversion of packages.

Papers available at

http://www.statistik.lmu.de/~leisch

Big 2do: Redo most with iPlots Extreme . . .

Friedrich Leisch: Cluster Visualization, 10.7.2009 46