UNCLASSIFIED Statistical Clustering: k-means, Gaussian Mixtures, Variational Inference 22-FEB-2012

Upload: tushar-tank

Post on 11-Feb-2017

TRANSCRIPT

Page 1: Statistical Clustering

UNCLASSIFIED

Statistical Clustering: k-means, Gaussian Mixtures, Variational Inference

22-FEB-2012

Page 2: Statistical Clustering

What is Clustering?

22FEB12

Notice: Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this document.

Design Considerations
• Features
• Dimension
• Model: Distance / Cost
• Bias / Variance

Page 3: Statistical Clustering

Why do we care?

Page 4: Statistical Clustering

Scope of Talk – Main Take Away Point

It’s all About the Posterior

K-means
• How does it work
• Math behind it
• Issues

GMM
• How does it work
• Math behind it
• Issues

Variational
• Just the facts

• Variational Inference
• GMM, EM, (Graph Cuts, Spectral Clustering)
• K-means, vector quantization

Page 5: Statistical Clustering

Scope of Talk

Main Take Away Point
It’s all Just Posterior Estimation
• Variational / MCMC
• GMM
• K-means / vector quantization

K-means
• How does it work
• Math behind it
• Issues

GMM
• How does it work
• Math behind it
• Issues

Variational
• Just the facts

Please interrupt and ask questions

Page 6: Statistical Clustering

K-means – How it works

Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype

Iterative two-step process:
• E-step: assign each data point to the nearest prototype
• M-step: update each prototype to be the mean of its cluster

Simple version: Euclidean distance, requires whitening
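The two-step loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's implementation: the function names `assign` and `kmeans`, the random initialization from data points, and the stopping rule are all assumptions made for the example, and the input is assumed to be already whitened.

```python
import numpy as np

def assign(X, centers):
    """E-step: index of the nearest prototype (Euclidean) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on an (n_points, n_features) array X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # M-step: move each prototype to the mean of its assigned points
        # (an empty cluster keeps its previous prototype).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # prototypes stopped moving
        centers = new_centers
    return centers, assign(X, centers)
```

Calling `kmeans(X, 2)` on two well-separated blobs should return one prototype per blob; on harder data the result depends on the initialization.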

Design Considerations
• Features
• Dimension
• Model: Distance / Cost
• Bias / Variance

Page 7: Statistical Clustering

Page 8: Statistical Clustering

Page 9: Statistical Clustering

Page 10: Statistical Clustering

Page 11: Statistical Clustering

Page 12: Statistical Clustering

Page 13: Statistical Clustering

Page 14: Statistical Clustering

Page 15: Statistical Clustering

Converged

Page 16: Statistical Clustering

k-means - Math

Responsibilities – assign data to cluster

Cost Function

example
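The formula on this slide did not survive transcription. Assuming the usual notation, with data points $x_n$, prototypes $\mu_k$, and binary responsibilities $r_{nk}$ (symbols chosen here, not taken from the slide), the k-means cost is commonly written as the sum of squared distances from each point to its assigned prototype:

```latex
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2,
\qquad r_{nk} \in \{0, 1\}, \quad \sum_{k=1}^{K} r_{nk} = 1 .
```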

Page 17: Statistical Clustering

Minimizing the Cost Function
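A sketch of the usual argument, assuming the cost $J = \sum_n \sum_k r_{nk} \lVert x_n - \mu_k \rVert^2$ with binary responsibilities $r_{nk}$ (notation assumed here, since the slide's equations were lost). With the prototypes held fixed, $J$ is minimized by assigning each point to its nearest prototype; with the assignments held fixed, setting the gradient with respect to $\mu_k$ to zero gives the cluster mean:

```latex
\begin{aligned}
r_{nk} &=
\begin{cases}
1 & \text{if } k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\
0 & \text{otherwise}
\end{cases} \\[4pt]
\frac{\partial J}{\partial \mu_k}
&= -2 \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k) = 0
\;\Longrightarrow\;
\mu_k = \frac{\sum_{n} r_{nk} \, x_n}{\sum_{n} r_{nk}}
\end{aligned}
```

Neither step can increase $J$, which is why the iteration converges, though possibly only to a local minimum.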

Page 18: Statistical Clustering

What can go wrong?

Page 19: Statistical Clustering

What can go wrong? A great deal.
• How do we choose K? (gap statistic / prediction strength)
• How do we initialize? (k-means++ seems to be the best)
• Local minima – run hundreds of times with different initializations
• Are we overfitting? Probably.
• But hey – it is simple to understand and does not cost too many cycles
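Two of the remedies listed above, k-means++ seeding and repeated restarts, can be sketched together. This is an illustrative NumPy version under stated assumptions: the helper names (`sq_dists`, `kmeans_pp_init`, `lloyd`, `best_of_restarts`) and the default of 20 restarts are made up for the example, and the data is assumed to contain at least two distinct points.

```python
import numpy as np

def sq_dists(X, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: later centers are drawn far from earlier ones."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its closest chosen center.
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        # Sample the next center with probability proportional to d2.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, centers, n_iter=100):
    """Run the assign/update loop and return (centers, labels, cost)."""
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = sq_dists(X, centers)
    return centers, d2.argmin(axis=1), float(d2.min(axis=1).sum())

def best_of_restarts(X, k, n_restarts=20, seed=0):
    """Keep the run with the lowest cost over several random seedings."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(X, kmeans_pp_init(X, k, rng)) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])
```

Restarts only guard against poor local minima for a fixed K; choosing K itself still needs a separate criterion such as the gap statistic mentioned on the slide.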

Page 20: Statistical Clustering

Quick word on distances (k-medoids)

Mahalanobis
• Not dependent on scale of measurement
• Tuning parameter

Manhattan / City Block
• Dampens outliers

Euclidean
• Need to whiten
• Outliers are an issue
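For concreteness, the three distances can be written as small NumPy functions. This is a sketch with hypothetical function names; the covariance matrix passed to `mahalanobis` is assumed to be supplied by the user (for example, estimated from the data) and to be invertible.

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance; sensitive to feature scale and outliers."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    """City-block distance; large coordinate differences are not squared."""
    return float(np.sum(np.abs(x - y)))

def mahalanobis(x, y, cov):
    """Distance measured in units of the covariance `cov`."""
    diff = x - y
    # Solve cov @ v = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean one. Rescaling a feature and its covariance together leaves the Mahalanobis distance unchanged, which is the scale independence noted above.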

Page 21: Statistical Clustering

Quicker word on flavors

• Exclusive Clustering: k-means, weighted k-means
• Overlapping Clustering: fuzzy c-means
• Nonlinear Clustering: kernel k-means (spectral clustering, normalized cuts)
• Hierarchical Clustering: Hierarchical

Page 22: Statistical Clustering

Probabilistic Clustering
• Represent the probability distribution of the data as a mixture model
• Captures uncertainty in cluster assignments
• Gives a model for the data distribution
• Bayesian mixture – we can figure out K more easily

Consider a mixture of Gaussians

Page 23: Statistical Clustering

Multivariate Gaussian Distribution Review
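The density on this slide was lost in transcription. For reference, the standard $D$-dimensional Gaussian with mean $\mu$ and covariance $\Sigma$ (symbols assumed here) is:

```latex
\mathcal{N}(x \mid \mu, \Sigma)
= \frac{1}{(2\pi)^{D/2} \, \lvert \Sigma \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2} \, (x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu) \right)
```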

Page 24: Statistical Clustering

Likelihood Function

Maximum Likelihood
• What is the best fit to my data
• Approximation of Posterior!
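As an illustration of "best fit", the log-likelihood of a data set under a single Gaussian can be evaluated directly. The sketch below assumes independent samples stored row-wise in `X`; the function name `gaussian_log_likelihood` is hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(X, mean, cov):
    """Sum over rows of X of log N(x | mean, cov), assuming i.i.d. samples."""
    n, d = X.shape
    diff = X - mean
    # Squared Mahalanobis distance of every row from the mean.
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    # slogdet returns (sign, log|cov|); it avoids overflow in the determinant.
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (maha.sum() + n * logdet + n * d * np.log(2 * np.pi)))
```

Maximum likelihood then means choosing the mean and covariance that make this number as large as possible for the observed data.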

Page 25: Statistical Clustering

Maximum Likelihood Solution for One Gaussian

Sample mean

Sample covariance
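The two estimators named above can be computed as follows. This is a minimal sketch with a hypothetical function name; note that the maximum likelihood covariance divides by N rather than N - 1.

```python
import numpy as np

def fit_gaussian_ml(X):
    """Maximum likelihood mean and covariance for rows of X."""
    mu = X.mean(axis=0)                     # sample mean
    centered = X - mu
    sigma = centered.T @ centered / len(X)  # ML covariance: divide by N
    return mu, sigma
```

The result matches `np.cov(X, rowvar=False, bias=True)`, which uses the same N normalization.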

Page 26: Statistical Clustering

Gaussian Mixtures
• Linear super-position of Gaussians
• Normalization and positivity require the mixing coefficients $\pi_k$ to satisfy $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$
• Can interpret mixing coefficients as prior probabilities

[Aside] We can sample from this. Given the mixing coefficients, means, and variances, we get a sample from p(x) – our dataset.
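The sampling process in the aside can be sketched as a two-stage draw: first pick a component using the mixing coefficients as prior probabilities, then draw from that component's Gaussian. The function name `sample_gmm` and its argument layout are assumptions for the example.

```python
import numpy as np

def sample_gmm(weights, means, covs, n, seed=0):
    """Draw n samples from a Gaussian mixture; returns (samples, component ids)."""
    rng = np.random.default_rng(seed)
    # Stage 1: choose a component for every sample with probability weights[k].
    z = rng.choice(len(weights), size=n, p=weights)
    # Stage 2: draw each sample from the Gaussian that was chosen for it.
    X = np.array([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return X, z
```

In a real clustering problem only `X` is observed; the component ids `z` are exactly the labels that are missing.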

Page 27: Statistical Clustering

Fitting the Gaussian Mixture
• We wish to invert this sampling process: given the data, find the corresponding parameters (as we did for the single Gaussian case): mixing coefficients, means, covariances
• If we knew which data point “belonged” to, or was the responsibility of, which Gaussian, then we could use our single-Gaussian ML solution
• Problem: we don’t have labels, and this complicates things.
• Solution: create a latent or hidden variable (z) that tells us which data point goes with which Gaussian

Page 28: Statistical Clustering

Posterior of latent variable

Prior: the probability that the data point $x$ was generated by Gaussian $k$, with no prior knowledge of $x$.

Posterior: the probability that the data point $x$ was generated by Gaussian $k$ after observing $x$.

Also called responsibilities
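The symbols on this slide were lost in transcription. In the usual notation (assumed here), with $z_k = 1$ meaning "generated by Gaussian $k$", the prior is the mixing coefficient and the posterior, or responsibility, follows from Bayes' rule:

```latex
p(z_k = 1) = \pi_k,
\qquad
\gamma(z_k) \equiv p(z_k = 1 \mid x)
= \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```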

Page 29: Statistical Clustering

Maximum Likelihood for GMM
• The log likelihood takes this form
• Notice the sum inside the log: there is no closed-form solution.
• Solve by expectation-maximization (EM) algorithm
• Derivative w.r.t
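The equations for this slide did not survive transcription. As a reference sketch in assumed notation (mixing coefficients $\pi_k$, means $\mu_k$, covariances $\Sigma_k$, responsibilities $\gamma(z_{nk})$), the standard GMM log likelihood is shown below, with the stationarity condition for a mean taken as an illustrative choice of variable:

```latex
\begin{aligned}
\ln p(X \mid \pi, \mu, \Sigma)
&= \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \\[4pt]
\frac{\partial}{\partial \mu_k} \ln p(X \mid \pi, \mu, \Sigma)
&= \sum_{n=1}^{N} \gamma(z_{nk}) \, \Sigma_k^{-1} (x_n - \mu_k) = 0
\;\Longrightarrow\;
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n,
\quad N_k = \sum_{n=1}^{N} \gamma(z_{nk})
\end{aligned}
```

The responsibilities themselves depend on $\mu_k$, so this is not a closed-form solution; it is a fixed-point condition that EM iterates.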

Page 30: Statistical Clustering

EM – notice each one of these is dependent on responsibilities

Do the same for covariance

Use Lagrange multiplier for mixing coefficients
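Putting the updates together, an EM loop for a Gaussian mixture might look like the sketch below. Several choices are assumptions made for the example rather than anything stated on the slides: the function names, starting from hard assignments to caller-supplied initial means, a fixed iteration count instead of a convergence test, and a small ridge `reg` added to each covariance to keep it invertible. It also assumes every component receives at least one point initially.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def em_gmm(X, init_means, n_iter=50, reg=1e-6):
    """EM for a Gaussian mixture; returns (weights, means, covs, responsibilities)."""
    n, d = X.shape
    k = len(init_means)
    # Assumed start: hard-assign every point to its nearest initial mean.
    d2 = ((X[:, None, :] - init_means[None, :, :]) ** 2).sum(axis=2)
    resp = np.eye(k)[d2.argmin(axis=1)]
    for _ in range(n_iter):
        # M-step: every update is a responsibility-weighted average.
        nk = resp.sum(axis=0)
        weights = nk / n                      # mixing coefficients
        means = (resp.T @ X) / nk[:, None]    # weighted means
        covs = []
        for j in range(k):
            diff = X - means[j]
            covs.append((resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d))
        # E-step: recompute responsibilities with Bayes' rule.
        dens = np.stack(
            [weights[j] * gaussian_pdf(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
    return weights, means, np.array(covs), resp
```

For data far from every component the densities can underflow to zero; a production version would work with log densities instead.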

Page 31: Statistical Clustering

Page 32: Statistical Clustering

Page 33: Statistical Clustering

Page 34: Statistical Clustering

Page 35: Statistical Clustering

Page 36: Statistical Clustering

Page 37: Statistical Clustering

Relation to k-means

Page 38: Statistical Clustering

Fast food example

http://nutrition.mcdonalds.com/nutritionexchange/nutritionfacts.pdf

Page 39: Statistical Clustering

Dessert Cluster

• Caramel Mocha
• Frappe Caramel
• Iced Hazelnut Latte
• Iced Coffee
• Strawberry Triple Thick Shake
• Snack Size McFlurry
• Hot Caramel Sundae
• Baked Hot Apple Pie
• Cinnamon Melts
• Kiddie Cone
• Strawberry Sundae

Page 40: Statistical Clustering

Burger-like cluster

• Hamburger
• Cheeseburger
• Filet-O-Fish
• Quarter Pounder with Cheese
• Premium Grilled Chicken Club Sandwich
• Ranch Snack Wrap
• Premium Asian Salad with Crispy Chicken
• Butter Garlic Croutons
• Sausage McMuffin
• Sausage McGriddles

Page 41: Statistical Clustering

Salad Cluster

• Premium Southwest Salad with Grilled Chicken
• Premium Caesar Salad with Grilled Chicken
• Side Salad
• Premium Asian Salad without Chicken
• Premium Bacon Ranch Salad without Chicken

Page 42: Statistical Clustering

Sauces Cluster 2 /6

• Hot Mustard Sauce
• Spicy Buffalo Sauce
• Newman’s Own Low Fat Balsamic Vinaigrette
• Ketchup Packet
• Barbeque Sauce
• Chipotle Barbeque Sauce

Page 43: Statistical Clustering

Creamy Sauces

• Creamy Ranch Sauce
• Newman’s Own Creamy Caesar Dressing
• Coffee Cream
• Iced Coffee with Sugar Free Vanilla Syrup

Page 44: Statistical Clustering

Oatmeal and Apples on their own

Page 45: Statistical Clustering

Breakfast artery clogging cluster

• Sausage McMuffin with Egg
• Sausage Burrito
• Egg McMuffin
• Bacon, Egg & Cheese Biscuit
• McSkillet Burrito with Sausage
• Big Breakfast with Hotcakes