

Lecture 6

Bootstraps

Maximum Likelihood Methods

Bootstrapping

A way to generate empirical probability distributions

Very handy for making estimates of uncertainty

100 realizations of a normal distribution p(y) with ȳ = 50 and σ_y = 100

What is the distribution of y_est = N^(−1) Σᵢ yᵢ ?

We know this should be a normal distribution with expectation ȳ = 50 and variance σ_y²/N, i.e. a standard deviation of σ_y/√N = 10

[Figure: p(y) versus y, and p(y_est) versus y_est]

Here’s an empirical way of determining the distribution, called bootstrapping

N original data: y1, y2, y3, y4, y5, y6, y7, …, yN

Draw N random integers in the range 1 to N, e.g. 4, 3, 7, 11, 4, 1, 9, …, 6

N resampled data: y′1 = y4, y′2 = y3, y′3 = y7, y′4 = y11, y′5 = y4, y′6 = y1, y′7 = y9, …, y′N = y6

Compute the estimate y′_est = N^(−1) Σᵢ y′ᵢ

Now repeat a gazillion times and examine the resulting distribution of estimates

Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y′

Note: the same datum, yi, may appear several times in the new dataset, y’

Analogy: a pot holds an infinite number of y’s with distribution p(y); a cup holds N y’s drawn from the pot.

Does a cup drawn from the pot capture the statistical behavior of what’s in the pot?

Take 1 cup, duplicate the cup an infinite number of times, and pour it into a new pot, which then also has distribution p(y).

Is what’s in the 2 pots more or less the same thing?

Random sampling is easy to code in MATLAB:

yprime = y(unidrnd(N,N,1));

Here unidrnd(N,N,1) returns a vector of N random integers between 1 and N; y is the original data and yprime is the resampled data.
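Putting the pieces together, here is a minimal sketch of the full bootstrap loop for the mean of the N = 100 normal data above (my own illustration, not from the slides; unidrnd needs the Statistics Toolbox, and base MATLAB's randi(N,N,1) is equivalent):

% synthetic data, as in the example: N = 100 values from a
% normal distribution with mean 50 and standard deviation 100
N = 100;
y = 50 + 100*randn(N,1);

Nboot = 1e5;                      % number of bootstrap realizations
yest  = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1)); % resample with replacement
    yest(k) = mean(yprime);      % estimate from the resampled data
end
histogram(yest)                  % empirical distribution of yest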

The theoretical and bootstrap results match pretty well!

[Figure: theoretical distribution overlaid on a bootstrap histogram with 10⁵ realizations]

Obviously bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example), but it can be very useful when we don’t.

For example, what’s the distribution of σ_est, where

(σ_est)² = 1/(N−1) Σᵢ (yᵢ − y_est)²  and  y_est = (1/N) Σᵢ yᵢ ?

(Yes, I know a statistician would know it follows Student’s t-distribution …)

To do the bootstrap we calculate

y′_est = (1/N) Σᵢ y′ᵢ

(σ′_est)² = 1/(N−1) Σᵢ (y′ᵢ − y′_est)²

and σ′_est = √( (σ′_est)² )

many times, say 10⁵ times
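A minimal MATLAB sketch of this loop (variable names are mine; y and N are the original data and its length, as before):

Nboot = 1e5;
sest  = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1));                  % resample with replacement
    ybar    = mean(yprime);                       % y'est
    sest(k) = sqrt(sum((yprime-ybar).^2)/(N-1));  % sigma'est
end
% examine the resulting distribution of estimates
[mean(sest), std(sest)]           % expected value and spread
mean(sest) + 2*std(sest)*[-1 1]   % roughly a 95% confidence interval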

Here’s the bootstrap result …

[Figure: bootstrap histogram of p(σ_est) versus σ_est with 10⁵ realizations; the true value σ_true is marked]

I numerically calculate an expected value of 92.8 and a standard deviation of 6.2.

Note that the distribution is not quite centered about the true value of 100. This is random variation: the original N = 100 data are not quite representative of an infinite ensemble of normally-distributed values.

So we would be justified in saying

σ_y ≈ 92.8 ± 12.4

that is, ±2σ (2 × 6.2), the 95% confidence interval

The Maximum Likelihood Method

A way to fit parameterized probability distributions to data

Very handy when you have good reason to believe the data follow a particular distribution

Likelihood Function, L

The logarithm of the probable-ness of a given dataset

N data y are all drawn from the same distribution p(y)

The probable-ness of a single measurement yᵢ is p(yᵢ)

So the probable-ness of the whole dataset is

p(y1) p(y2) … p(yN) = Πᵢ p(yᵢ)

L = ln Πᵢ p(yᵢ) = Σᵢ ln p(yᵢ)

Now imagine that the distribution p(y) is known up to a vector m of unknown parameters

Write p(y; m), with the semicolon as a reminder that it’s not a joint probability

Then L is a function of m

L(m) = Σᵢ ln p(yᵢ; m)
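As a concrete illustration (my own sketch, not part of the slides), here is L(m) coded directly from this formula for a normal distribution with parameter vector m = [ybar, sigma]:

% log of p(yi; m) for a normal distribution, m = [ybar, sigma]
logp = @(y,m) -0.5*log(2*pi) - log(m(2)) - 0.5*((y-m(1))/m(2)).^2;
L    = @(y,m) sum(logp(y,m));   % L(m) = sum over i of ln p(yi; m)

L(y,[50 100])   % probable-ness of the dataset y at ybar = 50, sigma = 100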

The Principle of Maximum Likelihood

Choose m so that it maximizes L(m):

∂L/∂mᵢ = 0

The dataset that was in fact observed is the most probable one that could have been observed

Example: normal distribution of unknown mean ȳ and variance σ²

p(yᵢ) = (2π)^(−½) σ^(−1) exp{ −½ σ^(−2) (yᵢ − ȳ)² }

L = Σᵢ ln p(yᵢ) = −½ N ln(2π) − N ln(σ) − ½ σ^(−2) Σᵢ (yᵢ − ȳ)²

∂L/∂ȳ = 0 = σ^(−2) Σᵢ (yᵢ − ȳ)

∂L/∂σ = 0 = −N σ^(−1) + σ^(−3) Σᵢ (yᵢ − ȳ)²

The N’s arise because the sum is from 1 to N

Solving for ȳ and σ:

0 = σ^(−2) Σᵢ (yᵢ − ȳ)  ⟹  ȳ_est = N^(−1) Σᵢ yᵢ

0 = −N σ^(−1) + σ^(−3) Σᵢ (yᵢ − ȳ)²  ⟹  σ²_est = N^(−1) Σᵢ (yᵢ − ȳ_est)²

Sample mean is the maximum likelihood estimate of the expected value of the normal distribution

Sample variance (more-or-less*) is the maximum likelihood estimate of the variance of the normal distribution

*issue of N vs. N-1 in the formula
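In MATLAB both estimates are one line each (a minimal sketch; note the division by N here, not N−1):

ybar_est   = mean(y);                  % sample mean
sigma2_est = sum((y-ybar_est).^2)/N;   % ML variance: divide by N, not N-1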

Interpreting the results

Example: 100 data drawn from a normal distribution with true ȳ = 50 and σ = 100

[Figure: surface of L(ȳ, σ) versus ȳ and σ; the maximum is at ȳ_est = 62, σ_est = 107]
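A figure like this can be produced by evaluating L on a grid and locating its maximum; a brute-force sketch, reusing the L handle defined above (the grid limits are my arbitrary choice):

ybars  = linspace(0,120,241);         % trial values of ybar
sigmas = linspace(50,200,301);        % trial values of sigma
Lgrid  = zeros(length(ybars),length(sigmas));
for i = 1:length(ybars)
    for j = 1:length(sigmas)
        Lgrid(i,j) = L(y,[ybars(i) sigmas(j)]);
    end
end
[Lmax,imax] = max(Lgrid(:));          % locate the maximum of L
[i,j] = ind2sub(size(Lgrid),imax);
[ybars(i) sigmas(j)]                  % the maximum likelihood point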

Another example: the exponential distribution

p(yᵢ) = ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| }

Check normalization … use z = yᵢ − ȳ:

∫ p(yᵢ) dyᵢ = ½ σ^(−1) ∫_(−∞)^(+∞) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

= ½ σ^(−1) · 2 ∫_0^(+∞) exp{ −σ^(−1) z } dz

= σ^(−1) (−σ) exp{ −σ^(−1) z } |_0^(+∞) = 1

Is the parameter ȳ really the expectation? Is the parameter σ² really the variance?

Is ȳ the expectation?

E(yᵢ) = ∫_(−∞)^(+∞) yᵢ · ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

Use z = yᵢ − ȳ:

E(yᵢ) = ½ σ^(−1) ∫_(−∞)^(+∞) (z + ȳ) exp{ −σ^(−1) |z| } dz

The z term drops out: z exp(−σ^(−1)|z|) is an odd function times an even function, so its integral is zero.

= ½ σ^(−1) · 2 ȳ ∫_0^(+∞) exp{ −σ^(−1) z } dz

= −ȳ exp{ −σ^(−1) z } |_0^(+∞)

= ȳ

YES!

Is σ² the variance?

var(yᵢ) = ∫_(−∞)^(+∞) (yᵢ − ȳ)² · ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

Use z = σ^(−1)(yᵢ − ȳ), so dyᵢ = σ dz:

var(yᵢ) = ½ σ^(−1) ∫_(−∞)^(+∞) σ³ z² exp{ −|z| } dz

= σ² ∫_0^(+∞) z² exp{ −z } dz

= 2 σ²

(the CRC Math Handbook gives this integral as equal to 2)

Not quite … the variance of this distribution is 2σ², not σ²
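This is easy to confirm numerically (a sketch; the inverse-CDF sampler for the two-sided exponential is standard, but not from the slides):

% draw a large sample from the two-sided exponential, check its variance
sigma = 3;
n = 1e6;
u = rand(n,1) - 0.5;                   % uniform on (-1/2, +1/2)
z = -sigma*sign(u).*log(1-2*abs(u));   % inverse-CDF sampling
[var(z), 2*sigma^2]                    % sample variance vs. 2*sigma^2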

Maximum likelihood estimate:

L = N ln(½) − N ln(σ) − σ^(−1) Σᵢ |yᵢ − ȳ|

∂L/∂ȳ = 0 = σ^(−1) Σᵢ sgn(yᵢ − ȳ)

∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σᵢ |yᵢ − ȳ|

ȳ_est is such that Σᵢ sgn(yᵢ − ȳ_est) = 0

[Figure: |x| versus x, and its derivative d|x|/dx = sgn(x), which is +1 for x > 0 and −1 for x < 0]

This is zero when half the yᵢ’s are bigger than ȳ and half of them smaller: ȳ_est is the median of the yᵢ’s

Once ȳ is known, then …

∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σᵢ |yᵢ − ȳ|

σ_est = N^(−1) Σᵢ |yᵢ − ȳ|  with  ȳ = median(y)
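In MATLAB (a sketch mirroring the formulas above):

ybar_est  = median(y);                % ML estimate of the expectation
sigma_est = mean(abs(y - ybar_est));  % sigma_est = N^-1 * sum |yi - ybar|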

Note that when N is even, ȳ_est is not unique, but can be anything between the two middle values in a sorted list of the yᵢ’s

Comparison

Normal distribution: best estimate of expected value is the sample mean

Exponential distribution: best estimate of expected value is the sample median

Comparison

Normal distribution: short-tailed; outliers extremely uncommon; the expected value should be chosen to make outliers have as small a deviation as possible

Exponential distribution: relatively long-tailed; outliers relatively common; the expected value should ignore the actual value of outliers

[Figure: data yᵢ with an outlier; the outlier pulls the mean away from the bulk of the data, while the median stays put]
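A tiny numerical illustration of the point (the values are mine, chosen to exaggerate the effect):

y0 = [1 2 3 4 5];         % well-behaved data
y1 = [1 2 3 4 500];       % same data with one outlier
[mean(y0) mean(y1)]       % the mean jumps from 3 to 102
[median(y0) median(y1)]   % the median stays at 3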

Another important distribution: the Gutenberg-Richter distribution

(e.g. earthquake magnitudes)

For earthquakes greater than some threshold magnitude m0, the probability that the earthquake will have a magnitude greater than m is

P(m) = 10^( −b (m − m0) )

or P(m) = exp{ −ln(10) b (m − m0) } = exp{ −b′ (m − m0) }  with  b′ = ln(10) b

This is a cumulative distribution, thus the probability that the magnitude is greater than m0 is unity:

P(m0) = exp{ −b′ (m0 − m0) } = exp{0} = 1

The probability density distribution is its (negative) derivative:

p(m) = −dP/dm = b′ exp{ −b′ (m − m0) }

The maximum likelihood estimate of b′ is

L(b′) = N ln(b′) − b′ Σᵢ (mᵢ − m0)

∂L/∂b′ = 0 = N/b′ − Σᵢ (mᵢ − m0)

b′_est = N / Σᵢ (mᵢ − m0)
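In MATLAB this is a one-liner (a sketch; m is an assumed vector of catalog magnitudes, all at or above the threshold m0):

% m: assumed magnitude catalog, m0: threshold magnitude
bprime_est = numel(m)/sum(m - m0);   % b' = N / sum of (mi - m0)
b_est      = bprime_est/log(10);     % recover b, since b' = ln(10)*b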

Originally Gutenberg & Richter made a mistake … by estimating the slope b using least-squares, and not the maximum likelihood formula.

[Figure: log10 P(m) versus magnitude m; the data fall on a line of slope −b, with a least-squares fit drawn through them]

Yet another important distribution: the Fisher distribution on a sphere

(e.g. paleomagnetic directions)

Given unit vectors xᵢ that scatter around some mean direction x̄, the probability distribution for the angle θ between xᵢ and x̄ (that is, cos(θ) = xᵢ · x̄) is

p(θ) = κ sin(θ) exp{ κ cos(θ) } / (2 sinh(κ))

κ is called the “precision parameter”
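As a quick sanity check (my own sketch, not from the slides), this density integrates to one for any κ, which is easy to verify numerically:

theta = linspace(0,pi,1000);
for kappa = [1 5 20]
    p = kappa*sin(theta).*exp(kappa*cos(theta))/(2*sinh(kappa));
    trapz(theta,p)   % should be 1 for every kappa
end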

Rationale for the functional form:

p(θ) ∝ exp{ κ cos(θ) }

For θ close to zero, cos(θ) ≈ 1 − ½ θ², so

p(θ) ∝ exp{ κ cos(θ) } ≈ exp{κ} exp{ −½ κ θ² }

which is a Gaussian.

I’ll let you figure out the maximum likelihood estimates of the central direction, x̄, and the precision parameter, κ.
