Seven (plus or minus two) Clusters, A Monte Carlo Study. Larry Hoyle, Policy Research Institute, The University of Kansas

Post on 22-Dec-2015

TRANSCRIPT

  • Slide 1
  • Seven (plus or minus two) Clusters, A Monte Carlo Study. Larry Hoyle, Policy Research Institute, The University of Kansas
  • Slide 2
  • 1972 Kansas Statistical Abstract
  • Slide 3
  • Shading by Overprinting
  • Slide 4
  • Shading by Line Spacing
  • Slide 5
  • Line Shading Detail
  • Slide 6
  • What did they have in common? Neither method is continuous, so both methods required grouping or classes. Fixed number of combinations; characters on a fixed grid; integer number of lines in the polygon; lines are relatively coarse.
  • Slide 7
  • How to Group for Shading: equal intervals; equal numbers (quantiles); by clusters; don't group (unclassed)
  • Slide 8
  • Population Density - 7 Equal Intervals: 100 counties fall into the bottom class
  • Slide 9
  • Population Density - Equal Numbers: 15 counties in each class, a very different picture
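
The contrast between the equal-interval and equal-number maps can be sketched in a few lines of Python. This is only an illustration: the function names and the `densities` list are hypothetical, and the values are made up to be skewed the way population density tends to be, not taken from the Kansas data.

```python
from bisect import bisect_right

def equal_interval_breaks(values, n_classes):
    """Boundaries that cut [min, max] into n_classes equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]

def quantile_breaks(values, n_classes):
    """Boundaries chosen so each class holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_classes] for i in range(1, n_classes)]

def classify(value, breaks):
    """Index of the class a value falls into (0 is the lowest class)."""
    return bisect_right(breaks, value)

# Hypothetical, strongly skewed values (NOT the Kansas county data).
densities = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 40, 120, 700]

eq = [classify(v, equal_interval_breaks(densities, 7)) for v in densities]
qu = [classify(v, quantile_breaks(densities, 7)) for v in densities]

print("equal intervals:", eq)  # most values pile into the bottom class
print("equal numbers:  ", qu)  # two values land in each class
```

With skewed data the equal-interval scheme leaves most classes nearly empty, while the quantile scheme spreads the observations evenly but hides how far apart the largest values are.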
  • Slide 10
  • Population Density - Cluster Means: group around the 7 values that best represent the data
  • Slide 11
  • Population Density - Unclassed: no classes, just shade in proportion to value
  • Slide 12
  • Clustering: tries for the best grouping. Each member of a cluster can be represented by the mean of its group.
  • Slide 13
  • Proc Fastclus: you specify the number of clusters. Minimizes the within-cluster sum of squared distances (i.e., minimum within-cluster variance). Inspired by k-means (MacQueen) and the leader algorithm (Hartigan).
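
To make the idea of minimizing within-cluster squared distance concrete, here is a minimal one-dimensional k-means (Lloyd's iteration) in Python. It is a stand-in for illustration, not PROC FASTCLUS itself: the function name `kmeans_1d` and the seeding rule (evenly spaced order statistics) are assumptions made for this sketch, and FASTCLUS selects its initial seeds differently.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's k-means on one variable; returns (means, groups)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed seeding for this sketch: k evenly spaced order statistics.
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest mean.
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - means[j]))
            groups[nearest].append(v)
        # Update step: each mean moves to the average of its members.
        new_means = [sum(g) / len(g) if g else means[j]
                     for j, g in enumerate(groups)]
        if new_means == means:  # assignments can no longer change
            break
        means = new_means
    return means, groups

means, groups = kmeans_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
print(means)   # one mean per cluster
print(groups)  # the values assigned to each cluster
```

Each pass can only lower (or leave unchanged) the within-cluster sum of squares, which is why every member of a cluster ends up reasonably represented by its cluster mean.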
  • Slide 14
  • Example clustering - data
  • Slide 15
  • 4 clusters [plot: cluster and data (y) against x, x axis 0 to 90] R-squared = .9912
  • Slide 16
  • 4 clusters [plot; axis label: data] Correlation = .9956, R-squared = .9912
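
The two figures on this slide agree arithmetically: .9956 squared is about .9912. When each observation is replaced by its cluster mean, R-squared is the share of total variance that survives, 1 minus the within-cluster sum of squares over the total sum of squares, and it equals the squared correlation between the observations and their cluster means. A small sketch of the calculation follows; the function name `r_squared` and the toy groups are illustrative, not the slide's data.

```python
def r_squared(groups):
    """Share of total variance explained by replacing values with group means."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = 0.0
    for g in groups:
        if g:  # skip empty clusters
            m = sum(g) / len(g)
            ss_within += sum((v - m) ** 2 for v in g)
    return 1 - ss_within / ss_total

# Toy clusters (hypothetical values).
toy = [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
print(round(r_squared(toy), 4))

# One cluster for everything explains none of the variance.
print(r_squared([[1, 2, 3]]))
```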
  • Slide 17
  • 3 clusters [plot: cluster and data (y) against x, x axis 0 to 90] R-squared = .9609
  • Slide 18
  • How many clusters is enough?
  • Slide 19
  • Plot R-squared by number of clusters: sample of 300 observations, uniform distribution, 11 cluster analyses
  • Slide 20
  • What happens if there really aren't any clusters? Let's try 500 samples
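
The Monte Carlo design (many samples from a distribution with no real clusters, several clusterings per sample, R-squared recorded each time) might be sketched as follows. Several things here are assumptions rather than the study's actual setup: the k-means routine is a plain Lloyd's iteration standing in for PROC FASTCLUS, "11 clusterings" is read as k = 1 through 11, and the demo uses far fewer and smaller samples than the slides so that pure Python finishes quickly.

```python
import random

def kmeans_r2(values, k, max_iter=100):
    """R-squared of a 1-D Lloyd's k-means clustering (FASTCLUS stand-in)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed seeding: k evenly spaced order statistics.
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in ordered:
            groups[min(range(k), key=lambda j: abs(v - means[j]))].append(v)
        new_means = [sum(g) / len(g) if g else means[j]
                     for j, g in enumerate(groups)]
        if new_means == means:
            break
        means = new_means
    grand = sum(ordered) / n
    ss_total = sum((v - grand) ** 2 for v in ordered)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return 1 - ss_within / ss_total

def min_r2_by_k(draw, n_obs, n_samples, ks, seed=0):
    """Worst (minimum) R-squared seen for each k across simulated samples."""
    rng = random.Random(seed)
    worst = {k: 1.0 for k in ks}
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n_obs)]
        for k in ks:
            worst[k] = min(worst[k], kmeans_r2(sample, k))
    return worst

ks = range(1, 12)  # ASSUMPTION: the 11 clusterings are k = 1..11
# Scaled-down demo: 20 samples of 100 observations (slides use 500 of 300/1000).
uniform = min_r2_by_k(lambda r: r.random(), n_obs=100, n_samples=20, ks=ks)
for k in ks:
    print(k, round(uniform[k], 4))
```

Swapping the `draw` argument for `lambda r: r.gauss(0, 1)` or `lambda r: r.expovariate(1.0)` gives normal and exponential variants of the same experiment.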
  • Slide 21
  • Uniform, 300 obs. per sample; 500 samples, 11 clusterings each
  • Slide 22
  • Uniform, 1000 obs. per sample; 500 samples, 11 clusterings each
  • Slide 23
  • Normal, 300 obs. per sample; 500 samples, 11 clusterings each
  • Slide 24
  • Normal, 1000 obs. per sample; 500 samples, 11 clusterings each
  • Slide 25
  • Exponential, 300 obs. per sample; 500 samples, 11 clusterings each
  • Slide 26
  • Distribution of worst sample
  • Slide 27
  • Exponential, 1000 obs. per sample; 500 samples, 11 clusterings each
  • Slide 28
  • So what's with 7 ± 2?
  • Slide 29
  • Uniform, 7 ± 2; 500 samples, 11 clusterings each
  • Slide 30
  • Normal, 7 ± 2; 500 samples, 11 clusterings each
  • Slide 31
  • Exponential, 7 ± 2; 500 samples, 11 clusterings each
  • Slide 32
  • Minimum R-squared by sample size and distribution: at least 95% of the variance for all
  • Slide 33
  • Histograms: equal intervals; number of observations in each interval
  • Slide 34
  • Needle Plot of Cluster Means
  • Slide 35
  • Bar chart needs more bars
  • Slide 36
  • "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information." George A. Miller, The Psychological Review, 1956, vol. 63, pp. 81-97
  • Slide 37
  • Limits on Categories for Absolute Judgments: pitch, 6; loudness, 5; visual position, 9; size of a square, 5; hue, 8. Name the colors in this slide.
  • Slide 38
  • And finally, what about the magical number seven? George A. Miller
  • Slide 39
  • Miller Quote 1: What about the seven wonders of the world, seven seas, seven deadly sins, seven daughters of Atlas in the Pleiades, seven ages of man, seven levels of hell, seven primary colors, seven notes of the musical scale, seven days of the week?
  • Slide 40
  • Miller Quote 2: What about the seven-point rating scale, seven categories for absolute judgment, seven objects in the span of attention, seven digits in the span of immediate memory?
  • Slide 41
  • Miller Quote 3: Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it.
  • Slide 42
  • Miller - close: But I suspect that it is only a pernicious, Pythagorean coincidence.
  • Slide 43
  • Coincidence or Nature's Parsimony? Does our capacity match what's needed for 95% of the variance? 95%? Hmmmm: confidence intervals; an A; 19 fingers and toes; 970,000 web pages. Larry Hoyle, Policy Research Institute, University of Kansas, [email protected]