

Lecture 6

Bootstraps

Maximum Likelihood Methods

Bootstrapping

A way to generate empirical probability distributions

Very handy for making estimates of uncertainty

100 realizations of a normal distribution p(y) with ȳ = 50 and σ_y = 100

What is the distribution of y_est = N^(−1) Σᵢ yᵢ ?

We know this should be a normal distribution with expectation ȳ = 50 and variance σ_y²/N, i.e. a standard deviation of σ_y/√N = 10

[Figure: p(y) versus y, and p(y_est) versus y_est]

Here’s an empirical way of determining the distribution, called bootstrapping

N original data: y1, y2, y3, y4, y5, y6, y7, …, yN

Draw N random integers in the range 1 to N, e.g. 4, 3, 7, 11, 4, 1, 9, …, 6

N resampled data: y′1 = y4, y′2 = y3, y′3 = y7, y′4 = y11, y′5 = y4, y′6 = y1, y′7 = y9, …, y′N = y6

Compute the estimate y′_est = N^(−1) Σᵢ y′ᵢ

Now repeat a gazillion times and examine the resulting distribution of estimates

Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y′

Note: the same datum, yi, may appear several times in the new dataset, y’

Analogy: a pot holds an infinite number of y’s with distribution p(y); a cup holds N y’s drawn from the pot.

Does a cup drawn from the pot capture the statistical behavior of what’s in the pot?

Take 1 cup, duplicate the cup an infinite number of times, and pour it into a new pot, which then also has distribution p(y).

Is what’s in the 2 pots more or less the same thing?

Random sampling is easy to code in MATLAB:

yprime = y(unidrnd(N,N,1));

Here unidrnd(N,N,1) returns a vector of N random integers between 1 and N; y is the original data and yprime is the resampled data.
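Putting the pieces together, here is a minimal sketch of the full bootstrap loop for the mean of the N = 100 normal data above (my own illustration, not from the slides; unidrnd needs the Statistics Toolbox, and base MATLAB's randi(N,N,1) is equivalent):

% synthetic data, as in the example: N = 100 values from a
% normal distribution with mean 50 and standard deviation 100
N = 100;
y = 50 + 100*randn(N,1);

Nboot = 1e5;                      % number of bootstrap realizations
yest  = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1)); % resample with replacement
    yest(k) = mean(yprime);      % estimate from the resampled data
end
histogram(yest)                  % empirical distribution of yest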

The theoretical and bootstrap results match pretty well!

[Figure: theoretical distribution overlaid on a bootstrap histogram with 10⁵ realizations]

Obviously bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example), but it can be very useful when we don’t.

For example, what’s the distribution of σ_est, where

(σ_est)² = 1/(N−1) Σᵢ (yᵢ − y_est)²  and  y_est = (1/N) Σᵢ yᵢ ?

(Yes, I know a statistician would know it follows Student’s t-distribution …)

To do the bootstrap we calculate

y′_est = (1/N) Σᵢ y′ᵢ

(σ′_est)² = 1/(N−1) Σᵢ (y′ᵢ − y′_est)²

and σ′_est = √( (σ′_est)² )

many times, say 10⁵ times
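A minimal MATLAB sketch of this loop (variable names are mine; y and N are the original data and its length, as before):

Nboot = 1e5;
sest  = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1));                  % resample with replacement
    ybar    = mean(yprime);                       % y'est
    sest(k) = sqrt(sum((yprime-ybar).^2)/(N-1));  % sigma'est
end
% examine the resulting distribution of estimates
[mean(sest), std(sest)]           % expected value and spread
mean(sest) + 2*std(sest)*[-1 1]   % roughly a 95% confidence interval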

Here’s the bootstrap result …

[Figure: bootstrap histogram of p(σ_est) versus σ_est with 10⁵ realizations; the true value σ_true is marked]

I numerically calculate an expected value of 92.8 and a standard deviation of 6.2.

Note that the distribution is not quite centered about the true value of 100. This is random variation: the original N = 100 data are not quite representative of an infinite ensemble of normally-distributed values.

So we would be justified in saying

σ_y ≈ 92.8 ± 12.4

that is, ±2σ (2 × 6.2), the 95% confidence interval

The Maximum Likelihood Method

A way to fit parameterized probability distributions to data

Very handy when you have good reason to believe the data follow a particular distribution

Likelihood Function, L

The logarithm of the probable-ness of a given dataset

N data y are all drawn from the same distribution p(y)

The probable-ness of a single measurement yᵢ is p(yᵢ)

So the probable-ness of the whole dataset is

p(y1) p(y2) … p(yN) = Πᵢ p(yᵢ)

L = ln Πᵢ p(yᵢ) = Σᵢ ln p(yᵢ)

Now imagine that the distribution p(y) is known up to a vector m of unknown parameters

Write p(y; m), with the semicolon as a reminder that it’s not a joint probability

Then L is a function of m

L(m) = Σᵢ ln p(yᵢ; m)
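As a concrete illustration (my own sketch, not part of the slides), here is L(m) coded directly from this formula for a normal distribution with parameter vector m = [ybar, sigma]:

% log of p(yi; m) for a normal distribution, m = [ybar, sigma]
logp = @(y,m) -0.5*log(2*pi) - log(m(2)) - 0.5*((y-m(1))/m(2)).^2;
L    = @(y,m) sum(logp(y,m));   % L(m) = sum over i of ln p(yi; m)

L(y,[50 100])   % probable-ness of the dataset y at ybar = 50, sigma = 100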

The Principle of Maximum Likelihood

Choose m so that it maximizes L(m):

∂L/∂mᵢ = 0

The dataset that was in fact observed is the most probable one that could have been observed

Example: normal distribution of unknown mean ȳ and variance σ²

p(yᵢ) = (2π)^(−½) σ^(−1) exp{ −½ σ^(−2) (yᵢ − ȳ)² }

L = Σᵢ ln p(yᵢ) = −½ N ln(2π) − N ln(σ) − ½ σ^(−2) Σᵢ (yᵢ − ȳ)²

∂L/∂ȳ = 0 = σ^(−2) Σᵢ (yᵢ − ȳ)

∂L/∂σ = 0 = −N σ^(−1) + σ^(−3) Σᵢ (yᵢ − ȳ)²

The N’s arise because the sum is from 1 to N

Solving for ȳ and σ:

0 = σ^(−2) Σᵢ (yᵢ − ȳ)  ⟹  ȳ_est = N^(−1) Σᵢ yᵢ

0 = −N σ^(−1) + σ^(−3) Σᵢ (yᵢ − ȳ)²  ⟹  σ²_est = N^(−1) Σᵢ (yᵢ − ȳ_est)²

Sample mean is the maximum likelihood estimate of the expected value of the normal distribution

Sample variance (more-or-less*) is the maximum likelihood estimate of the variance of the normal distribution

*issue of N vs. N-1 in the formula
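In MATLAB both estimates are one line each (a minimal sketch; note the division by N here, not N−1):

ybar_est   = mean(y);                  % sample mean
sigma2_est = sum((y-ybar_est).^2)/N;   % ML variance: divide by N, not N-1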

Interpreting the results

Example: 100 data drawn from a normal distribution with true ȳ = 50 and σ = 100

[Figure: surface of L(ȳ, σ) versus ȳ and σ; the maximum is at ȳ_est = 62, σ_est = 107]
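A figure like this can be produced by evaluating L on a grid and locating its maximum; a brute-force sketch, reusing the L handle defined above (the grid limits are my arbitrary choice):

ybars  = linspace(0,120,241);         % trial values of ybar
sigmas = linspace(50,200,301);        % trial values of sigma
Lgrid  = zeros(length(ybars),length(sigmas));
for i = 1:length(ybars)
    for j = 1:length(sigmas)
        Lgrid(i,j) = L(y,[ybars(i) sigmas(j)]);
    end
end
[Lmax,imax] = max(Lgrid(:));          % locate the maximum of L
[i,j] = ind2sub(size(Lgrid),imax);
[ybars(i) sigmas(j)]                  % the maximum likelihood point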

Another example: the exponential distribution

p(yᵢ) = ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| }

Check normalization … use z = yᵢ − ȳ:

∫ p(yᵢ) dyᵢ = ½ σ^(−1) ∫_(−∞)^(+∞) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

= ½ σ^(−1) · 2 ∫_0^(+∞) exp{ −σ^(−1) z } dz

= σ^(−1) (−σ) exp{ −σ^(−1) z } |_0^(+∞) = 1

Is the parameter ȳ really the expectation? Is the parameter σ² really the variance?

Is ȳ the expectation?

E(yᵢ) = ∫_(−∞)^(+∞) yᵢ · ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

Use z = yᵢ − ȳ:

E(yᵢ) = ½ σ^(−1) ∫_(−∞)^(+∞) (z + ȳ) exp{ −σ^(−1) |z| } dz

The z term drops out: z exp(−σ^(−1)|z|) is an odd function times an even function, so its integral is zero.

= ½ σ^(−1) · 2 ȳ ∫_0^(+∞) exp{ −σ^(−1) z } dz

= −ȳ exp{ −σ^(−1) z } |_0^(+∞)

= ȳ

YES!

Is σ² the variance?

var(yᵢ) = ∫_(−∞)^(+∞) (yᵢ − ȳ)² · ½ σ^(−1) exp{ −σ^(−1) |yᵢ − ȳ| } dyᵢ

Use z = σ^(−1)(yᵢ − ȳ), so dyᵢ = σ dz:

var(yᵢ) = ½ σ^(−1) ∫_(−∞)^(+∞) σ³ z² exp{ −|z| } dz

= σ² ∫_0^(+∞) z² exp{ −z } dz

= 2 σ²

(the CRC Math Handbook gives this integral as equal to 2)

Not quite … the variance of this distribution is 2σ², not σ²
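This is easy to confirm numerically (a sketch; the inverse-CDF sampler for the two-sided exponential is standard, but not from the slides):

% draw a large sample from the two-sided exponential, check its variance
sigma = 3;
n = 1e6;
u = rand(n,1) - 0.5;                   % uniform on (-1/2, +1/2)
z = -sigma*sign(u).*log(1-2*abs(u));   % inverse-CDF sampling
[var(z), 2*sigma^2]                    % sample variance vs. 2*sigma^2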

Maximum likelihood estimate:

L = N ln(½) − N ln(σ) − σ^(−1) Σᵢ |yᵢ − ȳ|

∂L/∂ȳ = 0 = σ^(−1) Σᵢ sgn(yᵢ − ȳ)

∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σᵢ |yᵢ − ȳ|

ȳ_est is such that Σᵢ sgn(yᵢ − ȳ_est) = 0

[Figure: |x| versus x, and its derivative d|x|/dx = sgn(x), which is +1 for x > 0 and −1 for x < 0]

This is zero when half the yᵢ’s are bigger than ȳ and half of them smaller: ȳ_est is the median of the yᵢ’s

Once ȳ is known, then …

∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σᵢ |yᵢ − ȳ|

σ_est = N^(−1) Σᵢ |yᵢ − ȳ|  with  ȳ = median(y)
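In MATLAB (a sketch mirroring the formulas above):

ybar_est  = median(y);                % ML estimate of the expectation
sigma_est = mean(abs(y - ybar_est));  % sigma_est = N^-1 * sum |yi - ybar|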

Note that when N is even, ȳ_est is not unique, but can be anything between the two middle values in a sorted list of the yᵢ’s

Comparison

Normal distribution: best estimate of expected value is the sample mean

Exponential distribution: best estimate of expected value is the sample median

Comparison

Normal distribution: short-tailed; outliers extremely uncommon; the expected value should be chosen to make outliers have as small a deviation as possible

Exponential distribution: relatively long-tailed; outliers relatively common; the expected value should ignore the actual value of outliers

[Figure: data yᵢ with an outlier; the outlier pulls the mean away from the bulk of the data, while the median stays put]
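A tiny numerical illustration of the point (the values are mine, chosen to exaggerate the effect):

y0 = [1 2 3 4 5];         % well-behaved data
y1 = [1 2 3 4 500];       % same data with one outlier
[mean(y0) mean(y1)]       % the mean jumps from 3 to 102
[median(y0) median(y1)]   % the median stays at 3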

Another important distribution: the Gutenberg-Richter distribution

(e.g. earthquake magnitudes)

For earthquakes greater than some threshold magnitude m0, the probability that the earthquake will have a magnitude greater than m is

P(m) = 10^( −b (m − m0) )

or P(m) = exp{ −ln(10) b (m − m0) } = exp{ −b′ (m − m0) }  with  b′ = ln(10) b

This is a cumulative distribution, thus the probability that the magnitude is greater than m0 is unity:

P(m0) = exp{ −b′ (m0 − m0) } = exp{0} = 1

The probability density distribution is its (negative) derivative:

p(m) = −dP/dm = b′ exp{ −b′ (m − m0) }

The maximum likelihood estimate of b′ is

L(b′) = N ln(b′) − b′ Σᵢ (mᵢ − m0)

∂L/∂b′ = 0 = N/b′ − Σᵢ (mᵢ − m0)

b′_est = N / Σᵢ (mᵢ − m0)
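In MATLAB this is a one-liner (a sketch; m is an assumed vector of catalog magnitudes, all at or above the threshold m0):

% m: assumed magnitude catalog, m0: threshold magnitude
bprime_est = numel(m)/sum(m - m0);   % b' = N / sum of (mi - m0)
b_est      = bprime_est/log(10);     % recover b, since b' = ln(10)*b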

Originally Gutenberg & Richter made a mistake … by estimating the slope b using least-squares, and not the maximum likelihood formula.

[Figure: log10 P(m) versus magnitude m; the data fall on a line of slope −b, with a least-squares fit drawn through them]

Yet another important distribution: the Fisher distribution on a sphere

(e.g. paleomagnetic directions)

Given unit vectors xᵢ that scatter around some mean direction x̄, the probability distribution for the angle θ between xᵢ and x̄ (that is, cos(θ) = xᵢ · x̄) is

p(θ) = κ sin(θ) exp{ κ cos(θ) } / (2 sinh(κ))

κ is called the “precision parameter”
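As a quick sanity check (my own sketch, not from the slides), this density integrates to one for any κ, which is easy to verify numerically:

theta = linspace(0,pi,1000);
for kappa = [1 5 20]
    p = kappa*sin(theta).*exp(kappa*cos(theta))/(2*sinh(kappa));
    trapz(theta,p)   % should be 1 for every kappa
end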

Rationale for the functional form:

p(θ) ∝ exp{ κ cos(θ) }

For θ close to zero, cos(θ) ≈ 1 − ½ θ², so

p(θ) ∝ exp{ κ cos(θ) } ≈ exp{κ} exp{ −½ κ θ² }

which is a Gaussian.

I’ll let you figure out the maximum likelihood estimates of the central direction, x̄, and the precision parameter, κ.
