22: MAP
web.stanford.edu/class/cs109/lectures/22_map.pdf
Lisa Yan, CS109, 2020

Page 1:

22: MAP
Lisa Yan
May 27, 2020

1

Page 2:

Quick slide reference

2

3 Maximum a Posteriori Estimator 22a_map

11 Bernoulli MAP: Choosing a prior 22b_bernoulli_any_prior

15 Bernoulli MAP: Conjugate prior 22c_bernoulli_conjugate

24 Common conjugate distributions LIVE

36 Choosing hyperparameters for conjugate prior LIVE

43 Bayesian Envelope demo LIVE

Page 3:

Maximum a Posteriori Estimator

3

22a_map

Page 4:

Maximum Likelihood Estimator

4

Review

Consider a sample of n i.i.d. random variables X1, X2, ..., Xn (data).

Maximum Likelihood Estimator (MLE): What is the parameter θ that maximizes the likelihood of our observed data X1, X2, ..., Xn?

θ_MLE = argmax_θ f(X1, X2, ..., Xn | θ)

L(θ) = f(X1, X2, ..., Xn | θ) = ∏_{i=1}^{n} f(Xi | θ)   (likelihood of data)

Observations:
• MLE maximizes the probability of observing the data given a parameter θ.
• If we are estimating θ, shouldn't we maximize the probability of θ directly?

Today: Bayesian estimation, using the Bayesian definition of probability!

Page 5:

Maximum a Posteriori (MAP) Estimator

5

Consider a sample of n i.i.d. random variables X1, X2, ..., Xn (data).

Maximum Likelihood Estimator (MLE): What is the parameter θ that maximizes the likelihood of our observed data X1, X2, ..., Xn?

θ_MLE = argmax_θ f(X1, X2, ..., Xn | θ)

L(θ) = f(X1, X2, ..., Xn | θ) = ∏_{i=1}^{n} f(Xi | θ)   (likelihood of data)

Maximum a Posteriori (MAP) Estimator: Given our observed data X1, X2, ..., Xn, what is the most likely parameter θ?

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn)   (posterior distribution of θ)

Page 6:

Maximum a Posteriori (MAP) Estimator

6

Consider a sample of n i.i.d. random variables X1, X2, ..., Xn (data).

def: The Maximum a Posteriori (MAP) Estimator of θ is the value of θ that maximizes the posterior distribution of θ.

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn)

Intuition with Bayes' Theorem:

P(θ | data) = P(data | θ) P(θ) / P(data)
(posterior = likelihood × prior / normalizing constant)

• Likelihood L(θ): probability of the data given parameter θ.
• Prior: belief about θ before seeing the data.
• Posterior: belief about θ after seeing the data.

Page 7:

Solving for θ_MAP
• Observe data: X1, X2, ..., Xn, all i.i.d.
• Let the likelihood be the same as in MLE: f(X1, X2, ..., Xn | θ) = ∏_{i=1}^{n} f(Xi | θ)
• Let the prior distribution on θ be g(θ).

7

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn)
      = argmax_θ [ f(X1, X2, ..., Xn | θ) g(θ) / h(X1, X2, ..., Xn) ]   (Bayes' Theorem)
      = argmax_θ [ g(θ) ∏_{i=1}^{n} f(Xi | θ) / h(X1, X2, ..., Xn) ]   (independence)
      = argmax_θ [ g(θ) ∏_{i=1}^{n} f(Xi | θ) ]   (1/h(X1, ..., Xn) is a positive constant w.r.t. θ)
      = argmax_θ [ log g(θ) + Σ_{i=1}^{n} log f(Xi | θ) ]

Page 8:

θ_MAP: Interpretation 1
• Observe data: X1, X2, ..., Xn, all i.i.d.
• Let the likelihood be the same as in MLE: f(X1, X2, ..., Xn | θ) = ∏_{i=1}^{n} f(Xi | θ)
• Let the prior distribution of θ be g(θ).

8

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn)
      = argmax_θ [ log g(θ) + Σ_{i=1}^{n} log f(Xi | θ) ]   (same derivation as the previous slide)

θ_MAP maximizes log prior + log-likelihood.

Page 9:

θ_MAP: Interpretation 2
• Observe data: X1, X2, ..., Xn, all i.i.d.
• Let the likelihood be the same as in MLE: f(X1, X2, ..., Xn | θ) = ∏_{i=1}^{n} f(Xi | θ)
• Let the prior distribution of θ be g(θ).

9

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn)
      = argmax_θ [ log g(θ) + Σ_{i=1}^{n} log f(Xi | θ) ]   (same derivation as the previous slide)

θ_MAP maximizes log prior + log-likelihood. It is also the mode of the posterior distribution of θ.

Page 10:

Mode: A statistic of a random variable

10

The mode of a random variable X is defined as:
• argmax_x p(x)   (X discrete, with PMF p(x))
• argmax_x f(x)   (X continuous, with PDF f(x))

• Intuitively: the value of X that is "most likely."
• Note that some distributions may not have a unique mode (e.g., the Uniform distribution, or Bernoulli(0.5)).

θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn) is the most likely θ given the data X1, X2, ..., Xn.
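Both definitions of the mode can be checked numerically. A small sketch (not from the slides; function names are my own): find the mode of a discrete PMF by a direct argmax over its support, and approximate the mode of a continuous PDF by a dense grid search.

```python
import math

def pmf_mode(pmf, support):
    """Mode of a discrete RV: the support value with the largest PMF."""
    return max(support, key=pmf)

def grid_mode(pdf, lo, hi, steps=100_000):
    """Approximate the mode of a continuous RV by a grid search over its PDF."""
    xs = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(xs, key=pdf)

# Discrete example: Binomial(n=10, p=0.3), whose mode is 3.
binom = lambda k: math.comb(10, k) * 0.3**k * 0.7**(10 - k)
print(pmf_mode(binom, range(11)))                    # 3

# Continuous example: the standard Normal PDF, whose mode is 0.
normal_pdf = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(round(grid_mode(normal_pdf, -5.0, 5.0), 3))    # 0.0
```

The same `grid_mode` idea works for any posterior density, which is all MAP asks for.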

Page 11:

Bernoulli MAP: Choosing a prior

11

22b_bernoulli_any_prior

Page 12:

How does MAP work? (for Bernoulli)

12

1. Choose model: Bernoulli(p).
2. Observe data: n heads, m tails.
3. Choose a prior on θ: some g(θ).
4. Find θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn), i.e., maximize log prior + log-likelihood:
   log g(θ) + Σ_{i=1}^{n} log f(Xi | θ)
   • Differentiate, set to 0
   • Solve

A lot of our effort in MAP depends on the g(θ) we choose.

Page 13:

MAP for Bernoulli
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• Choose a prior on θ. What is θ_MAP?

Suppose we pick a prior θ ~ N(0.5, 1^2), so g(θ) = (1/√(2π)) e^(-(θ - 0.5)^2 / 2).

13

1. Determine log prior + log-likelihood:
log g(θ) + log f(X1, X2, ..., Xn | θ)
= log[ (1/√(2π)) e^(-(θ - 0.5)^2 / 2) ] + log[ C(n+m, n) θ^n (1 - θ)^m ]
= -log √(2π) - (θ - 0.5)^2 / 2 + log C(n+m, n) + n log θ + m log(1 - θ)

2. Differentiate w.r.t. θ, set to 0:
-(θ - 0.5) + n/θ - m/(1 - θ) = 0

3. Solve the resulting equations: cubic equations... why?

We should choose an "easier" prior. This one is hard!
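Since the Normal-prior objective above only yields a cubic equation, one workaround is numeric: maximize log prior + log-likelihood on a grid. A minimal sketch (not from the slides; constants that do not depend on θ are dropped, since they don't move the argmax):

```python
import math

def log_posterior_unnorm(theta, n=7, m=1):
    """log g(theta) + log-likelihood for a N(0.5, 1) prior on a Bernoulli
    parameter, keeping only the terms that depend on theta."""
    log_prior = -((theta - 0.5) ** 2) / 2
    log_lik = n * math.log(theta) + m * math.log(1 - theta)
    return log_prior + log_lik

# Grid search over (0, 1) instead of solving the cubic by hand.
grid = (i / 100_000 for i in range(1, 100_000))
theta_map = max(grid, key=log_posterior_unnorm)
print(round(theta_map, 3))   # ~0.87, pulled slightly below the MLE of 7/8 = 0.875 by the prior
```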

Page 14:

A better approach: Use conjugate distributions

14

1. Choose model: Bernoulli(p).
2. Observe data: n heads, m tails.
3. Choose a prior on θ: choose a conjugate distribution. ⭐
4. Find θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn), i.e., maximize log prior + log-likelihood:
   log g(θ) + Σ_{i=1}^{n} log f(Xi | θ)
   • Differentiate, set to 0
   • Solve

Up next: Conjugate priors are great for MAP!

Page 15:

Bernoulli MAP: Conjugate prior

15

22c_bernoulli_conjugate

Page 16:

Beta is a conjugate distribution for Bernoulli

16

Review

Beta is a conjugate distribution for Bernoulli, meaning:
• Prior and posterior parametric forms are the same.
• Practically, conjugate means an easy update: add the numbers of "successes" and "failures" seen to the Beta parameters.
• You can set the prior to reflect how fair/biased you think the experiment is a priori.

Prior: Beta(a = n_imag + 1, b = m_imag + 1), where n_imag, m_imag are imaginary successes/failures
Experiment: observe n successes and m failures
Posterior: Beta(a = n_imag + n + 1, b = m_imag + m + 1)

Mode of Beta(a, b): (a - 1)/(a + b - 2)   (we'll prove this in a few minutes)

Beta parameters a, b are called hyperparameters. Interpret Beta(a, b) as: a + b - 2 imaginary trials, of which a - 1 are successes.

Page 17:

How does MAP work? (for Bernoulli)

17

1. Choose model: Bernoulli(p).
2. Observe data: n heads, m tails.
3. Choose a prior on θ: choose a conjugate distribution.
4. Find θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn):
   • Either maximize log prior + log-likelihood, log g(θ) + Σ_{i=1}^{n} log f(Xi | θ): differentiate, set to 0, solve;
   • or take the mode of the posterior distribution of θ (the posterior is also conjugate).

Page 18:

Conjugate strategy: MAP for Bernoulli
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• Choose a prior on θ. What is θ_MAP?

18

1. Choose a prior: suppose we pick a prior θ ~ Beta(a, b).
2. Determine the posterior: because Beta is a conjugate distribution for Bernoulli, the posterior distribution is θ | D ~ Beta(a + n, b + m), where D denotes the data.
3. Compute the MAP: θ_MAP = (a + n - 1)/(a + n + b + m - 2)   (mode of Beta(a + n, b + m))
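The three steps above collapse into a one-line conjugate update. A small sketch (the function name is my own):

```python
def beta_map(a, b, n, m):
    """MAP estimate of a Bernoulli parameter under a Beta(a, b) prior,
    after observing n successes and m failures: the conjugate update gives
    posterior Beta(a + n, b + m), and the MAP is that posterior's mode."""
    a_post, b_post = a + n, b + m
    return (a_post - 1) / (a_post + b_post - 2)

# With a uniform Beta(1, 1) prior, MAP reduces to the MLE: 7/8 heads.
print(beta_map(1, 1, 7, 1))   # 0.875
```

With Beta(1, 1) the pseudocounts vanish, so MAP and MLE coincide; a more opinionated prior shifts the estimate toward the prior mode.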

Page 19:

MAP in practice
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• What is the MAP estimator of the Bernoulli parameter p, if we assume a prior on p of Beta(2, 2)?

19

Page 20:

MAP in practice
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• What is the MAP estimator of the Bernoulli parameter p, if we assume a prior on p of Beta(2, 2)?

20

1. Choose a prior: θ ~ Beta(2, 2).
2. Determine the posterior: the posterior distribution of θ given the observed data is Beta(9, 3).
3. Compute the MAP: θ_MAP = 8/10.

Before flipping the coin, we imagined 2 trials: 1 imaginary head, 1 imaginary tail. After the coin flips, we saw 10 trials: 8 heads (imaginary and real), 2 tails (imaginary and real).

Page 21:

Proving the mode of Beta

21

1. Choose model: Bernoulli(p).
2. Observe data: n heads, m tails.
3. Choose a prior on θ: a conjugate prior, Beta(a, b).
4. Find θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn):
   • Either maximize log prior + log-likelihood, log g(θ) + Σ_{i=1}^{n} log f(Xi | θ): differentiate, set to 0, solve;
   • or take the mode of the posterior distribution of θ (the posterior is also conjugate).

These are equivalent interpretations of θ_MAP. We'll use this equivalence to prove the mode of Beta.

Page 22:

From first principles: MAP for Bernoulli, conjugate prior
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• Choose a prior on θ. What is θ_MAP?

Suppose we pick a prior θ ~ Beta(a, b), so g(θ) = (1/K) θ^(a-1) (1 - θ)^(b-1), where 1/K is the normalizing constant.

22

1. Determine log prior + log-likelihood:
log g(θ) + log f(X1, X2, ..., Xn | θ)
= log[ (1/K) θ^(a-1) (1 - θ)^(b-1) ] + log[ C(n+m, n) θ^n (1 - θ)^m ]
= log(1/K) + (a - 1) log θ + (b - 1) log(1 - θ) + log C(n+m, n) + n log θ + m log(1 - θ)

2. Differentiate w.r.t. θ, set to 0:
(a - 1)/θ + n/θ - (b - 1)/(1 - θ) - m/(1 - θ) = 0

3. Solve (next slide)

Page 23:

From first principles: MAP for Bernoulli, conjugate prior
• Flip a coin 8 times. Observe n = 7 heads and m = 1 tail.
• Choose a prior on θ. What is θ_MAP?

Suppose we pick a prior θ ~ Beta(a, b), so g(θ) = (1/K) θ^(a-1) (1 - θ)^(b-1), where 1/K is the normalizing constant.

23

3. Solve for θ (from the previous slide):
(a - 1)/θ + n/θ - (b - 1)/(1 - θ) - m/(1 - θ) = 0
⟹ (a + n - 1)/θ - (b + m - 1)/(1 - θ) = 0
⟹ θ_MAP = (a + n - 1)/(a + n + b + m - 2)

This is the mode of the posterior, Beta(a + n, b + m)!

If we choose a conjugate prior, we avoid calculus with MAP: just report the mode of the posterior.
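As a sanity check on this derivation, the closed-form mode can be compared against a brute-force argmax of the Beta log-density (a sketch; function names are my own):

```python
import math

def beta_mode_closed_form(a, b):
    """Mode of Beta(a, b), valid for a, b > 1."""
    return (a - 1) / (a + b - 2)

def beta_mode_numeric(a, b, steps=200_000):
    """Grid-search argmax of the unnormalized Beta(a, b) log-density."""
    log_dens = lambda t: (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=log_dens)

# Posterior from the running example: Beta(2 + 7, 2 + 1) = Beta(9, 3).
a, b = 9, 3
print(beta_mode_closed_form(a, b))                                       # 0.8
print(abs(beta_mode_closed_form(a, b) - beta_mode_numeric(a, b)) < 1e-4)  # True
```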

Page 24:

(live) 22: MAP
Lisa Yan
May 27, 2020

24

Page 25:

How does MAP work?

25

Review

1. Choose a model with parameter θ.
2. Observe data.
3. Choose a prior on θ.
4. Find θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn):
   • Either take the mode of the posterior distribution of θ,
   • or maximize log prior + log-likelihood: θ_MAP = argmax_θ [ log g(θ) + Σ_{i=1}^{n} log f(Xi | θ) ].

Two valid interpretations of θ_MAP. If we choose a conjugate prior, we avoid calculus with MAP: just report the mode of the posterior.

Page 26:

Conjugate distributions

26

LIVE

Page 27:

Quick MAP for Bernoulli and Binomial

27

Review

Beta(a, b) is a conjugate prior for the probability of success in the Bernoulli and Binomial distributions:
f(p) = (1/B(a, b)) p^(a-1) (1 - p)^(b-1)

Prior: Beta(a, b). Saw a + b - 2 imaginary trials: a - 1 successes, b - 1 failures.
Experiment: observe n + m new trials: n successes, m failures.
Posterior: Beta(a + n, b + m).

MAP: p = (a + n - 1)/(a + b + n + m - 2)

Page 28:

Conjugate distributions

28

MAP estimator: θ_MAP = argmax_θ f(θ | X1, X2, ..., Xn), the mode of the posterior distribution of θ.

Distribution parameter → conjugate distribution:
• Bernoulli p → Beta
• Binomial p → Beta
• Multinomial p_i → Dirichlet
• Poisson λ → Gamma
• Exponential λ → Gamma
• Normal μ → Normal
• Normal σ^2 → Inverse Gamma

Don't need to know Inverse Gamma... but it will know you :)

CS109: We'll only focus on MAP for Bernoulli/Binomial p, Multinomial p_i, and Poisson λ.

Page 29:

Multinomial is Multiple times the fun

29

Dirichlet(a_1, a_2, ..., a_k) is a conjugate for Multinomial:
f(p_1, p_2, ..., p_k) = (1/B(a_1, a_2, ..., a_k)) ∏_{j=1}^{k} p_j^(a_j - 1)
• It generalizes Beta in the same way Multinomial generalizes Bernoulli/Binomial.

Prior: Dirichlet(a_1, a_2, ..., a_k). Saw Σ_{j=1}^{k} a_j - k imaginary trials, with a_j - 1 of outcome j.
Experiment: observe n_1 + n_2 + ... + n_k new trials, with n_j of outcome j.
Posterior: Dirichlet(a_1 + n_1, a_2 + n_2, ..., a_k + n_k).

MAP: p_i = (a_i + n_i - 1) / (Σ_{j=1}^{k} a_j + Σ_{j=1}^{k} n_j - k)
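The Dirichlet MAP formula above translates directly into code. A short sketch (not from the slides; the example counts are made up):

```python
def dirichlet_map(alphas, counts):
    """MAP estimates for Multinomial outcome probabilities under a
    Dirichlet(a_1, ..., a_k) prior: the posterior is Dirichlet(a_i + n_i),
    with mode p_i = (a_i + n_i - 1) / (sum_j a_j + sum_j n_j - k)."""
    k = len(alphas)
    denom = sum(alphas) + sum(counts) - k
    return [(a + n - 1) / denom for a, n in zip(alphas, counts)]

# Hypothetical example: 3 outcomes, Dirichlet(2, 2, 2) prior
# (1 imaginary trial of each outcome), then observe counts (5, 3, 1).
print(dirichlet_map([2, 2, 2], [5, 3, 1]))   # [6/12, 4/12, 2/12]
```

Note the estimates still sum to 1, and an outcome's estimate only hits 0 if its pseudocount plus real count equals 1.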

Page 30:

Good times with Gamma

30

Gamma(a, b) is a conjugate for Poisson.
• Also conjugate for Exponential, but we won't delve into that.
• Mode of Gamma(a, b): (a - 1)/b.

Prior: θ ~ Gamma(a, b). Saw a - 1 total imaginary events during b prior time periods.
Experiment: observe n events during the next t time periods.
Posterior: θ | n events in t periods ~ Gamma(a + n, b + t).

MAP: θ_MAP = (a + n - 1)/(b + t)

Page 31:

MAP for Poisson

31

Let λ be the average # of successes in a time period. (Gamma(a, b) is conjugate for Poisson; mode: (a - 1)/b.)

1. What does it mean to have a prior of θ ~ Gamma(11, 5)?
   → Observe 10 imaginary events in 5 time periods, i.e., observe at Poisson rate = 2.

Now perform the experiment and see 11 events in the next 2 time periods.

2. Given your prior, what is the posterior distribution?
3. What is θ_MAP?

Page 32:

MAP for Poisson

32

Let λ be the average # of successes in a time period. (Gamma(a, b) is conjugate for Poisson; mode: (a - 1)/b.)

1. What does it mean to have a prior of θ ~ Gamma(11, 5)?
   → Observe 10 imaginary events in 5 time periods, i.e., observe at Poisson rate = 2.

Now perform the experiment and see 11 events in the next 2 time periods.

2. Given your prior, what is the posterior distribution?
   → θ | n events in t periods ~ Gamma(22, 7)
3. What is θ_MAP?
   → θ_MAP = 3, the updated Poisson rate.
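The Gamma-Poisson update on this slide can be sketched the same way (the function name is my own):

```python
def gamma_poisson_map(a, b, events, periods):
    """MAP for a Poisson rate under a Gamma(a, b) prior: the posterior is
    Gamma(a + events, b + periods), whose mode (when a + events > 1) is
    (a + events - 1) / (b + periods)."""
    return (a + events - 1) / (b + periods)

# Slide's example: Gamma(11, 5) prior (10 imaginary events in 5 periods,
# i.e. rate 2), then observe 11 events in the next 2 time periods.
print(gamma_poisson_map(11, 5, 11, 2))   # 3.0
```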

Page 33:

Interlude for jokes/announcements

33

Page 34:

Announcements

34

• Problem Set 6: no late days or on-time bonus.
• Quiz 2 errata: Question 1(d), full credit for everyone.
• Final course grade progress: to be released after class.

Page 35:

Interesting probability news

35

CS109 Current Events Spreadsheet

What Role Should Employers Play in Testing Workers?

https://www.nytimes.com/2020/05/22/business/employers-coronavirus-testing.html

One nascent strategy circulating among public health experts is running “pooled” coronavirus tests, in which a workplace could combine multiple saliva or nasal swabs into one larger sample representing dozens of employees.

Remember Quiz 1??

Page 36:

Choosing hyperparameters for conjugate prior

36

LIVE

Page 37:

Where'd you get them priors?

37

• Let θ be the probability a coin turns up heads.
• Model θ with 2 different priors:
  ◦ Prior 1: Beta(3, 8): 2 imaginary heads, 7 imaginary tails (mode: 2/9)
  ◦ Prior 2: Beta(7, 4): 6 imaginary heads, 3 imaginary tails (mode: 6/9)

Now flip 100 coins and get 58 heads and 42 tails.
1. What are the two posterior distributions?
2. What are the modes of the two posterior distributions?

Page 38:

Where'd you get them priors?

38

• Let θ be the probability a coin turns up heads.
• Model θ with 2 different priors:
  ◦ Prior 1: Beta(3, 8): 2 imaginary heads, 7 imaginary tails (mode: 2/9)
  ◦ Prior 2: Beta(7, 4): 6 imaginary heads, 3 imaginary tails (mode: 6/9)

Now flip 100 coins and get 58 heads and 42 tails.

Posterior 1: Beta(61, 50), mode: 60/109
Posterior 2: Beta(65, 46), mode: 64/109

As long as we collect enough data, the posteriors will converge to the true value.
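The convergence claim is easy to see numerically: after 100 flips, both priors produce posterior modes near the empirical frequency 0.58. A small sketch:

```python
def beta_posterior_mode(a, b, heads, tails):
    """Mode of the posterior Beta(a + heads, b + tails)."""
    return (a + heads - 1) / (a + heads + b + tails - 2)

heads, tails = 58, 42   # the slide's 100 flips
for a, b in [(3, 8), (7, 4)]:
    print((a, b), round(beta_posterior_mode(a, b, heads, tails), 3))
# Beta(3, 8) -> posterior Beta(61, 50), mode 60/109 ~ 0.550
# Beta(7, 4) -> posterior Beta(65, 46), mode 64/109 ~ 0.587
```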

Page 39:

Laplace smoothing

39

MAP with Laplace smoothing: a prior which represents k imagined observations of each outcome.
• For categorical data (i.e., Multinomial, Bernoulli/Binomial).
• Also known as additive smoothing.

Laplace estimate: imagine k = 1 of each outcome (follows from Laplace's "law of succession").

Example: Laplace estimates for coin probabilities from the aforementioned experiment (100 coins: 58 heads, 42 tails):
heads: 59/102, tails: 43/102

Laplace smoothing:
• Easy to implement/remember.

Page 40:

Back to our happy Laplace

40

Consider our previous 6-sided die.
• Roll the die n = 12 times.
• Observe: 3 ones, 2 twos, 0 threes, 3 fours, 1 five, 3 sixes.

Recall θ_MLE: p_1 = 3/12, p_2 = 2/12, p_3 = 0/12, p_4 = 3/12, p_5 = 1/12, p_6 = 3/12.

What are your Laplace estimates for each roll outcome?

Page 41:

Back to our happy Laplace

41

Consider our previous 6-sided die.
• Roll the die n = 12 times.
• Observe: 3 ones, 2 twos, 0 threes, 3 fours, 1 five, 3 sixes.

Recall θ_MLE: p_1 = 3/12, p_2 = 2/12, p_3 = 0/12, p_4 = 3/12, p_5 = 1/12, p_6 = 3/12.

Laplace estimates: p_i = (n_i + 1)/(n + 6):
p_1 = 4/18, p_2 = 3/18, p_3 = 1/18, p_4 = 4/18, p_5 = 2/18, p_6 = 4/18

Laplace smoothing:
• Easy to implement/remember.
• Avoids estimating a parameter as 0.
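The Laplace estimates above come from one line of arithmetic. A sketch (the function name is my own; k = 1 recovers the slide's numbers):

```python
def laplace_estimates(counts, k=1):
    """Additive (Laplace) smoothing: pretend we saw k extra observations of
    each outcome, then take relative frequencies."""
    total = sum(counts) + k * len(counts)
    return [(n + k) / total for n in counts]

# Die example: 12 rolls with counts (3, 2, 0, 3, 1, 3).
print(laplace_estimates([3, 2, 0, 3, 1, 3]))
# -> [4/18, 3/18, 1/18, 4/18, 2/18, 4/18]; the zero count for "three" becomes 1/18
```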

Page 42:

Bayesian Envelope Demo

42

LIVE

Page 43:

Two envelopes

43

Two envelopes: one contains $n, the other contains $2n.
• Select an envelope and open it.
• Before opening the envelope, you think either is equally good.

Is the following reasoning valid?
• Let X = $ in the envelope you selected.
• Let Y = $ in the other envelope.

E[Y | X] = (1/2)(X/2) + (1/2)(2X) = (5/4)X

Follow-up: What happened by opening the envelope?

Page 44:

Two envelopes

44

Two envelopes: one contains $n, the other contains $2n.
• Select an envelope and open it.
• Before opening the envelope, you think either is equally good.

Is the following reasoning valid?
• Let X = $ in the envelope you selected.
• Let Y = $ in the other envelope.

E[Y | X] = (1/2)(X/2) + (1/2)(2X) = (5/4)X

The reasoning:
• Assumes all values of n (where 0 < n < ∞) are equally likely.
• But there are infinitely many values of n.
• So this is not a true probability distribution over n (it does not integrate to 1).

Page 45:

Are all values equally likely?

45

[Figure: an attempted "uniform" density f(n) over 0 ≤ n < ∞, with axis ticks at 0, 10, 20, ..., 100. "Infinite powers of two..."]

Page 46:

Two envelopes: The subjectivity of probability

46

Your belief about the contents of the envelopes: since the implied distribution over n is not a true probability distribution, what is our distribution over n?

Frequentist: play the game infinitely many times, and see how often different values come up. Problem: you can only play the game once.

Bayesian: have a prior belief about the distribution of n.
• The prior belief is a subjective probability (as are all probabilities).
• Can answer questions with no/limited data.
• As we get more data, the prior belief is "swamped" by the data.

Page 47:

Two envelopes: The subjectivity of probability

47

[Figure: a subjective prior density f(n) over n, with axis ticks at 0, 10, 20, ..., 100]

Page 48:

The envelope, please

48

Bayesian: have a prior distribution over n, f(n).
• Let X = $ in the envelope you selected. Open the envelope to determine X.
• Let Y = $ in the other envelope.
• Opening the envelope provides data to compute f(n | X)...
• ...which allows you to compute E[Y | X].

If X > E[Y | X], keep your envelope; otherwise, switch. No inconsistency!

Of course, you need to think about your prior distribution over n... Imagine if the envelope you opened contained $20.01. Should you switch?

Bayesian probability: it doesn't matter how you determine your prior, but you must have one (whatever it is).
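The resolution can be made concrete with any explicit prior. A toy sketch (my own construction, not from the lecture): assume the smaller amount n is uniform on the integers 1..100, then compute E[Y | X] from the posterior over n.

```python
def expected_other(x, prior):
    """E[Y | X = x] for the two-envelope game under an explicit prior over
    the smaller amount n: the opened amount x is either n (so the other
    envelope holds 2x) or 2n (so it holds x/2), weighted by the prior."""
    w_small = prior(x)       # we opened the $n envelope
    w_large = prior(x / 2)   # we opened the $2n envelope
    if w_small + w_large == 0:
        raise ValueError("x is impossible under this prior")
    p_small = w_small / (w_small + w_large)
    return p_small * (2 * x) + (1 - p_small) * (x / 2)

# Toy prior: n is uniform on the integers 1..100 (an assumption for illustration).
uniform_prior = lambda n: 1.0 if n == int(n) and 1 <= n <= 100 else 0.0

print(expected_other(50, uniform_prior))    # 62.5 -> switch (62.5 > 50)
print(expected_other(150, uniform_prior))   # 75.0 -> keep (only n = 75 is possible)
```

With a proper prior, the naive (5/4)X answer disappears: whether to switch now depends on the observed X, exactly as the slide argues.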

Page 49:

How much is a half cent?

49

Page 50:

Have a wonderful Wednesday!

50

Page 51:

51

Page 52:

52