digging into the dirichlet distribution by max sklar

109
Digging into the Dirichlet Max Sklar @maxsklar New York Machine Learning Meetup December 19th, 2013

Upload: hakka-labs

Post on 10-May-2015

4.971 views

Category:

Technology


22 download

DESCRIPTION

When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example if there are 2 possible user-generated ratings (like and dislike), then each item is represented as a vector of 2 counts. In a higher dimensional case, each document may be expressed as a count of words, and the vector size is large enough to encompass all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data. The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.

TRANSCRIPT

Page 1: Digging into the Dirichlet Distribution by Max Sklar

Digging into the DirichletMax Sklar

@maxsklar

New York Machine Learning MeetupDecember 19th, 2013

Page 2: Digging into the Dirichlet Distribution by Max Sklar

Dedication

Meyer Marks1925 - 2013

Page 3: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Page 4: Digging into the Dirichlet Distribution by Max Sklar

Let’s start with something simpler

Page 5: Digging into the Dirichlet Distribution by Max Sklar

Let’s start with something simpler

A Pie Chart!

Page 6: Digging into the Dirichlet Distribution by Max Sklar

Let’s start with something simpler

A Pie Chart!

AKADiscrete DistributionMultinomial Distribution

Page 7: Digging into the Dirichlet Distribution by Max Sklar

Let’s start with something simpler

A Pie Chart!

K = The number of categories.

K = 5

Page 8: Digging into the Dirichlet Distribution by Max Sklar

Examples of Multinomial Distributions

Page 9: Digging into the Dirichlet Distribution by Max Sklar

Examples of Multinomial Distributions

Page 10: Digging into the Dirichlet Distribution by Max Sklar

Examples of Multinomial Distributions

Page 11: Digging into the Dirichlet Distribution by Max Sklar

Examples of Multinomial Distributions

Page 12: Digging into the Dirichlet Distribution by Max Sklar

Examples of Multinomial Distributions

Page 13: Digging into the Dirichlet Distribution by Max Sklar

What does the raw data look like?

Page 14: Digging into the Dirichlet Distribution by Max Sklar

What does the raw data look like?id # likes # dislikes

1 231 23

2 81 40

3 67 9

4 121 14

5 9 31

6 18 0

7 1 1

Counts!

Page 15: Digging into the Dirichlet Distribution by Max Sklar

What does the raw data look like?id # likes # dislikes

1 231 23

2 81 40

3 67 9

4 121 14

5 9 31

6 18 0

7 1 1

More specifically:- K columns of counts- N rows of data

Page 16: Digging into the Dirichlet Distribution by Max Sklar

BUT...

Counts != Multinomial Distribution

Page 17: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate

Page 18: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate

Sum = 366 + 181 + 203 = 750

Page 19: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate

366 / 750181 / 750203 / 750

Page 20: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate

48.8%24.1%27.1%

Page 21: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

1 2 1

Uh Oh

Page 22: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

1 2 1

0 1 0This column will be all Yellow right?

Page 23: Digging into the Dirichlet Distribution by Max Sklar

BUT...366 181 203

1 2 1

0 1 0

0 0 0Panic!!!!

Page 24: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Statistics to the Rescue

Page 25: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Statistics to the RescueStill assume each row was generated by a multinomial distribution

Page 26: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Statistics to the RescueStill assume each row was generated by a multinomial distribution

We just don’t know which one!

Page 27: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Is a probability distribution over all possible multinomial distributions, p.

Page 28: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Represents our uncertainty over the actual distribution that created the row.

? ?

Page 29: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

p: represents a multinomial distributionalpha: the parameters of the dirichletK: the number of categories

Page 30: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

Page 31: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

Also a Dirichlet!

Page 32: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

Also a Dirichlet!(Conjugate Prior)

Page 33: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

Page 34: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

Page 35: Digging into the Dirichlet Distribution by Max Sklar

Bayesian Updates

+1

Page 36: Digging into the Dirichlet Distribution by Max Sklar

Why Does this Work?

Let’s look at it again.

Page 37: Digging into the Dirichlet Distribution by Max Sklar

Entropy

Page 38: Digging into the Dirichlet Distribution by Max Sklar

EntropyInformation Content

Page 39: Digging into the Dirichlet Distribution by Max Sklar

EntropyInformation Content

Energy

Page 40: Digging into the Dirichlet Distribution by Max Sklar

EntropyInformation Content

EnergyLog Likelihood

Page 41: Digging into the Dirichlet Distribution by Max Sklar

EntropyInformation Content

EnergyLog Likelihood

Page 42: Digging into the Dirichlet Distribution by Max Sklar
Page 43: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Page 44: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Normalizing Constant

Page 45: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Normalizing Constant

Page 46: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Page 47: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Page 48: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Page 49: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet Distribution

Linear

Page 50: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

1.2 3.0 0.3Prior

Page 51: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

1.2 3.0 0.3Prior

Page 52: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

2.2 3.0 0.3Update

Page 53: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

2.2 3.0 0.3Prior

Page 54: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

2.2 3.0 1.3Update

Page 55: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

2.2 3.0 1.3Prior

Page 56: Digging into the Dirichlet Distribution by Max Sklar

The Dirichlet MACHINE

2.2 3.0 2.3Update

Page 57: Digging into the Dirichlet Distribution by Max Sklar

Interpreting the Parameters

1 3 2

What does this alpha vector really mean?

Page 58: Digging into the Dirichlet Distribution by Max Sklar

Interpreting the Parameters1 3 2

1/6 3/6 2/6 6

normalized sum

Page 59: Digging into the Dirichlet Distribution by Max Sklar

Interpreting the Parameters1 3 2

6ExpectedValue

Page 60: Digging into the Dirichlet Distribution by Max Sklar

Interpreting the Parameters1 3 2

6ExpectedValue Weight

Page 61: Digging into the Dirichlet Distribution by Max Sklar

ANALOGY: Normal Distribution

Precision =1 / variance

Page 62: Digging into the Dirichlet Distribution by Max Sklar

ANALOGY: Normal Distribution

High precision:data is close to the mean

Low precision:far away from the mean

Page 63: Digging into the Dirichlet Distribution by Max Sklar

Interpreting the Parameters1 3 2

6ExpectedValue Precision

Page 65: Digging into the Dirichlet Distribution by Max Sklar

High Weight Dirichlet0.4 0.4 0.2

Page 66: Digging into the Dirichlet Distribution by Max Sklar

Low Weight Dirichlet0.4 0.4 0.2

Page 67: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

At each step, pick a ball from the urn.. Replace it, and add another ball of that color into the urn

Page 68: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

At each step, pick a ball from the urn.. Replace it, and add another ball of that color into the urn

Page 69: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

At each step, pick a ball from the urn.. Replace it, and add another ball of that color into the urn

Page 70: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

At each step, pick a ball from the urn.. Replace it, and add another ball of that color into the urn

Page 71: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

At each step, pick a ball from the urn.. Replace it, and add another ball of that color into the urn

Page 72: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

Rich get richer...

Page 73: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

Rich get richer...

Page 74: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

Finally yellow catches a break

Page 75: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

Finally yellow catches a break

Page 76: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

But it’s too late...

Page 77: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

As the urn gets more populated, the distribution gets “stuck” in place.

Page 78: Digging into the Dirichlet Distribution by Max Sklar

Urn Model

Once lots of data has been collected, or the dirichlet has high precision, it’s hard to overturn that with new data

Page 79: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 80: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 81: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 82: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 83: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 84: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 85: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 86: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant Process

When you find the white ball, throw a new color into the mix.

Page 87: Digging into the Dirichlet Distribution by Max Sklar

Chinese Restaurant ProcessThe expected infinite distribution (mean) is exponential.

# of white balls controls the exponent

Page 88: Digging into the Dirichlet Distribution by Max Sklar

So coming back to the count data:

What Dirichlet parameters explain the data?

20 0 0

2 1 17

14 6 0

15 5 0

0 20 0

0 14 6

Page 89: Digging into the Dirichlet Distribution by Max Sklar

So coming back to the count data:

Newton’s Method:Requires Gradient+ Hessian

20 0 0

2 1 17

14 6 0

15 5 0

0 20 0

0 14 6

Page 90: Digging into the Dirichlet Distribution by Max Sklar

So coming back to the count data:

Reads all of the data...

20 0 0

2 1 17

14 6 0

15 5 0

0 20 0

0 14 6

Page 91: Digging into the Dirichlet Distribution by Max Sklar

So coming back to the count data:

https://github.com/maxsklar/BayesPy/tree/master/ConjugatePriorTools

20 0 0

2 1 17

14 6 0

15 5 0

0 20 0

0 14 6

Page 92: Digging into the Dirichlet Distribution by Max Sklar

So coming back to the count data:

Compress the data into a Matrix and a Vector:

Works for lots of sparsely populated rows

20 0 0

2 1 17

14 6 0

15 5 0

0 20 0

0 14 6

Page 93: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

Page 94: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

Page 95: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

1 0 0

Page 96: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

1 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

1 0 0 0 0 0

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

Page 97: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

1 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

1 0 0 0 0 0

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

1 3 2

Page 98: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

2 0 0 0 0 0

1 1 1 0 0 0

1 1 0 0 0 0

2 1 1 1 1 1

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

Page 99: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

2 0 0 0 0 0

1 1 1 0 0 0

1 1 0 0 0 0

2 1 1 1 1 1

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

0 6 0

Page 100: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

2 0 0 0 0 0

2 2 2 1 1 1

1 1 0 0 0 0

3 2 2 2 2 2

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

Page 101: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

2 0 0 0 0 0

2 2 2 1 1 1

1 1 0 0 0 0

3 2 2 2 2 2

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

2 1 1

Page 102: Digging into the Dirichlet Distribution by Max Sklar

The Compression MACHINE

3 1 0 0 0 0

3 2 2 1 1 1

2 1 0 0 0 0

4 3 3 3 2 2

K = 3 (the 4th row is a special, total row)

M = 6The maximum # samples per input

Page 103: Digging into the Dirichlet Distribution by Max Sklar

DEMO

Page 104: Digging into the Dirichlet Distribution by Max Sklar

Our Popularity Prior

Page 105: Digging into the Dirichlet Distribution by Max Sklar

Dirichlet Mixture Models

Anything you can go with a Gaussian, you can also do with a Dirichlet

Page 106: Digging into the Dirichlet Distribution by Max Sklar

Dirichlet Mixture Models

Example:Mixture of Gaussians usingExpectation-Maximization

Page 107: Digging into the Dirichlet Distribution by Max Sklar

Dirichlet Mixture ModelsAssume each row is a mixture of multinomials.

And the parameters of that mixture are pulled from a Dirichlet.

?

Page 108: Digging into the Dirichlet Distribution by Max Sklar

Dirichlet Mixture Models

Latent Dirichlet Allocation

Topic Model

Page 109: Digging into the Dirichlet Distribution by Max Sklar

Questions