Point Estimation
CSE 446: Machine Learning
Emily Fox, University of Washington
January 6, 2017
©2017 Emily Fox



Maximum likelihood estimation for a binomial distribution


Your first consulting job

• A bored Seattle billionaire asks you a question:
  - He says: I have a thumbtack; if I flip it, what's the probability it will fall with the nail up?
  - You say: Please flip it a few times:
  - You say: The probability is:
  - He says: Why???
  - You say: Because…


Thumbtack – Binomial distribution

• P(Heads) = θ, P(Tails) = 1 - θ

• Flips are i.i.d.:
  - Independent events
  - Identically distributed according to a binomial distribution

• Sequence D of αH heads (H) and αT tails (T)

• P(D | θ) = θ^αH (1 - θ)^αT
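The slides leave the likelihood to be worked out live. As a sketch (not from the slides; the seed, flip count, and true θ below are made-up illustration values), the likelihood θ^αH (1 - θ)^αT can be evaluated on a grid, and it peaks at the empirical fraction of heads:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3                      # hypothetical "nail up" probability
flips = rng.random(50) < theta_true   # True = heads, 50 simulated flips
a_H = int(flips.sum())
a_T = int((~flips).sum())

def likelihood(theta, a_H, a_T):
    # P(D | theta) = theta^a_H * (1 - theta)^a_T for i.i.d. flips
    return theta ** a_H * (1 - theta) ** a_T

# The likelihood is maximized at the empirical fraction of heads
grid = np.linspace(0.001, 0.999, 999)
theta_best = grid[np.argmax(likelihood(grid, a_H, a_T))]
```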


The learning task

• Want to learn a model of thumbtack flips from experience

• Example 1: Maximum likelihood estimation
  What value of θ maximizes the likelihood of having seen the observed sequence (according to my model)?

• What is a likelihood function?


Maximum likelihood estimation

• Data: Observed set D of αH heads (H) and αT tails (T)

• Hypothesis: Binomial distribution

• Learning θ is an optimization problem
  - What's the objective function?

• MLE: Choose θ that maximizes the likelihood of observed data


Your first learning algorithm

• Set derivative to zero: d/dθ [αH log θ + αT log(1 - θ)] = αH/θ - αT/(1 - θ) = 0 ⇒ θ̂_MLE = αH / (αH + αT)
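The derivative step is left for the board; a quick numerical sanity check (my own sketch, not slide code) that the closed form αH/(αH + αT) indeed maximizes the log-likelihood:

```python
import numpy as np

def mle_theta(a_H, a_T):
    # Closed form from setting d/dθ [a_H log θ + a_T log(1 - θ)] = 0
    return a_H / (a_H + a_T)

def log_lik(theta, a_H, a_T):
    return a_H * np.log(theta) + a_T * np.log(1 - theta)

# grid-maximize the log-likelihood for 3 heads / 2 tails
grid = np.linspace(0.001, 0.999, 9999)
numeric = grid[np.argmax(log_lik(grid, 3, 2))]
```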


How many flips do I need?

• Billionaire says: I flipped 3 heads and 2 tails.
• You say: θ̂ = 3/5, I can prove it!
• He says: What if I flipped 30 heads and 20 tails?
• You say: Same answer, I can prove it!

• He says: Which is better?
• You say: Hmm… The more the merrier???
• He says: Is this why I am paying you the big bucks???


Simple bound (based on Hoeffding’s Inequality)

• For N = αH + αT and θ̂ = αH / N

• Let θ* be the true parameter. For any ε > 0:
  P(|θ̂ - θ*| ≥ ε) ≤ 2e^(-2Nε²)


PAC learning

• PAC: Probably Approximately Correct

• Billionaire says: I want to know the thumbtack parameter θ within ε = 0.1, with probability at least 1 - δ = 0.95. How many flips do I need?
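Inverting the Hoeffding bound 2e^(-2Nε²) ≤ δ gives N ≥ ln(2/δ) / (2ε²). A small helper (my sketch, not from the slides) answers the billionaire's question:

```python
import math

def flips_needed(eps, delta):
    # Smallest N with 2*exp(-2*N*eps**2) <= delta (Hoeffding bound)
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

n = flips_needed(0.1, 0.05)  # ε = 0.1, 1 - δ = 0.95
```

Under the bound, about 185 flips suffice for this ε and δ.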


What about continuous-valued data?


What about continuous variables?

• Billionaire says: If I am measuring a continuous variable, what can you do for me?

• You say: Let me tell you about Gaussians…


1D Gaussians

©2016 Emily Fox & Carlos Guestrin

[Figure: bell curve over x, centered at µ]
• Fully specified by mean µ and variance σ² (or standard deviation σ)
• x: the random variable the distribution is over, e.g., height


1D Gaussian probability density function

[Figure: N(µ, σ²) density over x, centered at µ with width ≈ 2σ]
• p(x) = (1/(σ√(2π))) e^(-(x - µ)²/(2σ²)), with parameters µ and σ²
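For concreteness (a sketch, not slide code), the density above can be written out directly:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # N(mu, sigma^2) density: (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

peak = gaussian_pdf(0.0, 0.0, 1.0)  # standard normal evaluated at its mean
```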


Some properties of Gaussians

• Affine transformation (multiplying by a scalar and adding a constant)
  - X ~ N(µ, σ²)
  - Y = aX + b  ⇒  Y ~ N(aµ + b, a²σ²)

• Sum of (independent) Gaussians
  - X ~ N(µX, σ²X)
  - Y ~ N(µY, σ²Y)
  - Z = X + Y  ⇒  Z ~ N(µX + µY, σ²X + σ²Y)
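These closure properties are easy to sanity-check by simulation (my own sketch; the sample size and parameters are arbitrary, and the sum property requires X and Y independent):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(2.0, 3.0, size=n)              # X ~ N(2, 9)
y = 5.0 * x + 1.0                             # affine: Y ~ N(5*2+1, 25*9) = N(11, 225)
z = x + rng.normal(-1.0, 4.0, size=n)         # independent sum: Z ~ N(2-1, 9+16) = N(1, 25)
```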


Learning a Gaussian

• Collect a bunch of data

- Hopefully, i.i.d. samples

- e.g., heights of students in class

• Learn parameters
  - Mean
  - Variance


MLE for Gaussian

• Prob. of i.i.d. samples D = {x1, …, xN}:
  P(D | µ, σ) = ∏ᵢ (1/(σ√(2π))) e^(-(xᵢ - µ)²/(2σ²)) = (1/(σ√(2π)))^N e^(-Σᵢ (xᵢ - µ)²/(2σ²))

• Log-likelihood of data:
  log P(D | µ, σ) = -N log(σ√(2π)) - Σᵢ (xᵢ - µ)²/(2σ²)


Your second learning algorithm:
MLE for mean of a Gaussian

• What's the MLE for the mean?
  Setting d/dµ of the log-likelihood to zero gives µ̂_MLE = (1/N) Σᵢ xᵢ


MLE for variance

• Again, set derivative to zero: d/dσ² of the log-likelihood gives σ̂²_MLE = (1/N) Σᵢ (xᵢ - µ̂)²
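Both MLEs reduce to simple averages. A sketch (not slide code; the "heights" are simulated with made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(10.0, 2.0, size=100_000)    # simulated heights, true µ = 10, σ² = 4

mu_mle = data.mean()                          # (1/N) Σ x_i
var_mle = ((data - mu_mle) ** 2).mean()       # (1/N) Σ (x_i - µ̂)²
```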


Learning Gaussian parameters

• MLE:
  µ̂_MLE = (1/N) Σᵢ xᵢ
  σ̂²_MLE = (1/N) Σᵢ (xᵢ - µ̂)²

• FYI, the MLE for the variance of a Gaussian is biased
  - The expected value of the estimator is not the true parameter!

  - Unbiased variance estimator: σ̂²_unbiased = (1/(N-1)) Σᵢ (xᵢ - µ̂)²
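The bias is visible in simulation: averaged over many small samples of size N, the MLE variance comes out near (N-1)/N · σ², while the 1/(N-1) estimator comes out near σ². A sketch (sample size and repetition count are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, reps = 5, 200_000
samples = rng.normal(0.0, 1.0, size=(reps, N))             # true σ² = 1
mu_hat = samples.mean(axis=1, keepdims=True)
var_mle = ((samples - mu_hat) ** 2).mean(axis=1)           # biased: E[·] = (N-1)/N = 0.8
var_unb = ((samples - mu_hat) ** 2).sum(axis=1) / (N - 1)  # unbiased: E[·] = 1.0
```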


Recap of concepts


What you need to know…

• Learning is…
  - Collect some data
    • E.g., thumbtack flips
  - Choose a hypothesis class or model
    • E.g., binomial
  - Choose a loss function
    • E.g., data likelihood
  - Choose an optimization procedure
    • E.g., set derivative to zero to obtain MLE
  - Collect the big bucks

• Like everything in life, there is a lot more to learn…
  - Many more facets… Many more nuances…
  - The fun will continue…
