Lecture 6
Bootstraps
Maximum Likelihood Methods
Bootstrapping
A way to generate empirical probability distributions
Very handy for making estimates of uncertainty
100 realizations of a normal distribution $p(y)$ with $\bar{y} = 50$, $\sigma = 100$
What is the distribution of
$y^{est} = N^{-1} \sum_i y_i$ ?
We know this should be a Normal distribution with
expectation $\bar{y} = 50$ and standard deviation $\sigma/\sqrt{N} = 10$
(Figure: the distribution $p(y)$ of the data and the much narrower distribution $p(y^{est})$ of the estimate.)
Here’s an empirical way of determining the distribution, called bootstrapping
Start with the N original data: $y_1, y_2, y_3, y_4, y_5, y_6, y_7, \ldots, y_N$
Draw N random integers in the range 1–N, e.g. 4, 3, 7, 11, 4, 1, 9, …, 6
Use them as indices into the original data to form N resampled data: $y'_1 = y_4$, $y'_2 = y_3$, $y'_3 = y_7$, $y'_4 = y_{11}$, …, $y'_N = y_6$
Compute the estimate $y'^{est} = N^{-1} \sum_i y'_i$
Now repeat a gazillion times and examine the resulting distribution of estimates
Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y'.
Note: the same datum, $y_i$, may appear several times in the new dataset, y'.
Think of a pot of an infinite number of y's with distribution p(y), and a cup of N y's drawn from the pot.
Does a cup drawn from the pot capture the statistical behavior of what's in the pot?
Take 1 cup, duplicate the cup an infinite number of times, and pour the duplicates into a new pot.
Is there more or less the same thing in the 2 pots?
Random sampling is easy to code in MatLab:
yprime = y(unidrnd(N,N,1));
unidrnd(N,N,1) returns a vector of N random integers between 1 and N, which index the original data y to give the resampled data yprime.
The theoretical and bootstrap results match pretty well!
(Figure: the theoretical distribution overlaid on a bootstrap with $10^5$ realizations.)
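Here is a minimal MatLab sketch of the whole procedure for the mean. The data are manufactured with randn purely for illustration; unidrnd is in the Statistics Toolbox (base MatLab's randi(N,N,1) works equally well):

N = 100;                        % number of data
y = 50 + 100*randn(N,1);        % illustrative data: normal, ybar = 50, sigma = 100
Nboot = 1e5;                    % number of bootstrap realizations
yest = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1)); % resample the data with replacement
    yest(k) = mean(yprime);     % estimate of the mean for this realization
end
histogram(yest);                % empirical distribution of the estimate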
Obviously, bootstrapping is of limited utility when we know the theoretical distribution
(as in the previous example)
but it can be very useful when we don't
for example, what's the distribution of $\sigma^{est}$
where $(\sigma^{est})^2 = \frac{1}{N-1} \sum_i (y_i - y^{est})^2$
and $y^{est} = \frac{1}{N} \sum_i y_i$ ?
(Yes, I know a statistician would know that $(N-1)(\sigma^{est})^2/\sigma^2$ follows a chi-squared distribution …)
To do the bootstrap we calculate
$y'^{est} = \frac{1}{N} \sum_i y'_i$
$(\sigma'^{est})^2 = \frac{1}{N-1} \sum_i (y'_i - y'^{est})^2$
and $\sigma'^{est} = \sqrt{(\sigma'^{est})^2}$
many times – say $10^5$ times
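A minimal MatLab sketch, continuing with the y and N from the previous code:

Nboot = 1e5;                    % number of bootstrap realizations
sigest = zeros(Nboot,1);
for k = 1:Nboot
    yprime = y(unidrnd(N,N,1));                              % resample with replacement
    sigest(k) = sqrt(sum((yprime - mean(yprime)).^2)/(N-1)); % sigma estimate
end
histogram(sigest);              % empirical distribution of sigma-est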
Here’s the bootstrap result …
(Figure: bootstrap distribution of $\sigma^{est}$ with $10^5$ realizations; the true value $\sigma^{true}$ is marked.)
I numerically calculate an expected value of 92.8 and a standard deviation of 6.2
Note that the distribution is not quite centered about the true value of 100
This is random variation. The original N=100 data are not quite representative of an infinite ensemble of normally-distributed values
So we would be justified in saying
$\sigma \approx 92.6 \pm 12.4$
that is, $\pm 2 \times 6.2$, the 95% confidence interval
The Maximum Likelihood Method
A way to fit parameterized probability distributions to data
very handy when you have good reason to believe the data follow a particular distribution
Likelihood Function, L
The logarithm of the probable-ness of a given dataset
N data y are all drawn from the same distribution p(y)
the probable-ness of a single measurement yi is p(yi)
So the probable-ness of the whole dataset is
$p(y_1)\, p(y_2) \cdots p(y_N) = \prod_i p(y_i)$
$L = \ln \prod_i p(y_i) = \sum_i \ln p(y_i)$
Now imagine that the distribution p(y) is known up to a vector m of unknown parameters
write $p(y; \mathbf{m})$, with the semicolon as a reminder that it's not a joint probability
Then L is a function of m:
$L(\mathbf{m}) = \sum_i \ln p(y_i; \mathbf{m})$
The Principle of Maximum Likelihood
Choose m so that it maximizes L(m):
$\partial L / \partial m_i = 0$
the dataset that was in fact observed is then the most probable one that could have been observed
Example – normal distribution of unknown mean $\bar{y}$ and variance $\sigma^2$
$p(y_i) = (2\pi)^{-1/2}\, \sigma^{-1} \exp\{ -\tfrac{1}{2} \sigma^{-2} (y_i - \bar{y})^2 \}$
$L = \sum_i \ln p(y_i) = -\tfrac{1}{2} N \ln(2\pi) - N \ln(\sigma) - \tfrac{1}{2} \sigma^{-2} \sum_i (y_i - \bar{y})^2$
$\partial L / \partial \bar{y} = 0 = \sigma^{-2} \sum_i (y_i - \bar{y})$
$\partial L / \partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-3} \sum_i (y_i - \bar{y})^2$
(the N's arise because the sums run from 1 to N)
Solving for $\bar{y}$ and $\sigma$:
$0 = \sigma^{-2} \sum_i (y_i - \bar{y})$ gives $\bar{y} = N^{-1} \sum_i y_i$
$0 = -N \sigma^{-1} + \sigma^{-3} \sum_i (y_i - \bar{y})^2$ gives $\sigma^2 = N^{-1} \sum_i (y_i - \bar{y})^2$
Sample mean is the maximum likelihood estimate of the expected value of the normal distribution
Sample variance (more-or-less*) is the maximum likelihood estimate of the variance of the normal distribution
*issue of N vs. N-1 in the formula
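In MatLab the two maximum likelihood estimates are one-liners (a sketch; note the 1/N, not 1/(N-1), in the variance):

ybar_ml = mean(y);                  % ML estimate of the expected value
sig2_ml = sum((y - ybar_ml).^2)/N;  % ML estimate of the variance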
Interpreting the results
Example – 100 data drawn from a normal distribution with true $\bar{y} = 50$, $\sigma = 100$
(Figure: the likelihood surface $L(\bar{y}, \sigma)$, with its maximum at $\bar{y} = 62$, $\sigma = 107$.)
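A sketch of how such a likelihood surface can be computed by brute force, using the y and N from before (the grid ranges are illustrative choices):

ybars  = linspace(0, 100, 201);     % trial values of ybar
sigmas = linspace(50, 200, 201);    % trial values of sigma
L = zeros(length(sigmas), length(ybars));
for i = 1:length(sigmas)
    for j = 1:length(ybars)
        L(i,j) = -0.5*N*log(2*pi) - N*log(sigmas(i)) ...
                 - 0.5*sum((y - ybars(j)).^2)/sigmas(i)^2;
    end
end
[~, k] = max(L(:));                 % locate the peak of L
[i, j] = ind2sub(size(L), k);
fprintf('max at ybar = %g, sigma = %g\n', ybars(j), sigmas(i));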
Another Example – exponential distribution
$p(y_i) = \tfrac{1}{2} \sigma^{-1} \exp\{ -\sigma^{-1} |y_i - \bar{y}| \}$
Check normalization … use $z = y_i - \bar{y}$
$\int p(y_i)\, dy_i = \tfrac{1}{2} \sigma^{-1} \int_{-\infty}^{+\infty} \exp\{ -\sigma^{-1} |y_i - \bar{y}| \}\, dy_i$
$= \tfrac{1}{2} \sigma^{-1} \cdot 2 \int_0^{+\infty} \exp\{ -\sigma^{-1} z \}\, dz$
$= \sigma^{-1} (-\sigma) \exp\{ -\sigma^{-1} z \} \big|_0^{+\infty} = 1$
Is the parameter $\bar{y}$ really the expectation? Is the parameter $\sigma^2$ really the variance?
First: is $\bar{y}$ the expectation?
$E(y_i) = \int_{-\infty}^{+\infty} y_i\, \tfrac{1}{2} \sigma^{-1} \exp\{ -\sigma^{-1} |y_i - \bar{y}| \}\, dy_i$
use $z = y_i - \bar{y}$
$E(y_i) = \tfrac{1}{2} \sigma^{-1} \int_{-\infty}^{+\infty} (z + \bar{y}) \exp\{ -\sigma^{-1} |z| \}\, dz$
$= \tfrac{1}{2} \sigma^{-1} \cdot 2 \bar{y} \int_0^{+\infty} \exp\{ -\sigma^{-1} z \}\, dz$
$= -\bar{y} \exp\{ -\sigma^{-1} z \} \big|_0^{+\infty} = \bar{y}$
($z \exp\{-\sigma^{-1}|z|\}$ is an odd function times an even function, so its integral is zero)
YES!
Is $\sigma^2$ the variance?
$\mathrm{var}(y_i) = \int_{-\infty}^{+\infty} (y_i - \bar{y})^2\, \tfrac{1}{2} \sigma^{-1} \exp\{ -\sigma^{-1} |y_i - \bar{y}| \}\, dy_i$
use $z = \sigma^{-1} (y_i - \bar{y})$
$\mathrm{var}(y_i) = \tfrac{1}{2} \sigma^2 \int_{-\infty}^{+\infty} z^2 \exp\{ -|z| \}\, dz = \sigma^2 \int_0^{+\infty} z^2 \exp\{ -z \}\, dz = 2\sigma^2$
(the CRC Math Handbook gives this integral as equal to 2)
Not quite … the variance is $2\sigma^2$, not $\sigma^2$
Maximum likelihood estimate
$L = N \ln(\tfrac{1}{2}) - N \ln(\sigma) - \sigma^{-1} \sum_i |y_i - \bar{y}|$
$\partial L / \partial \bar{y} = 0 = \sigma^{-1} \sum_i \mathrm{sgn}(y_i - \bar{y})$
$\partial L / \partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-2} \sum_i |y_i - \bar{y}|$
so $\bar{y}$ must be such that $\sum_i \mathrm{sgn}(y_i - \bar{y}) = 0$
(Figure: $|x|$ and its derivative $d|x|/dx = \mathrm{sgn}(x)$, which is $+1$ for $x > 0$ and $-1$ for $x < 0$.)
This is zero when half the $y_i$'s are bigger than $\bar{y}$ and half of them smaller: $\bar{y}$ is the median of the $y_i$'s
Once $\bar{y}$ is known, then …
$\partial L / \partial \sigma = 0 = -N \sigma^{-1} + \sigma^{-2} \sum_i |y_i - \bar{y}|$
gives $\sigma = N^{-1} \sum_i |y_i - \bar{y}|$ with $\bar{y} = \mathrm{median}(y)$
Note that when N is even, $\bar{y}$ is not unique,
but can be anything between the two middle values in a sorted list of the $y_i$'s
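In MatLab, both maximum likelihood estimates are again one-liners (a sketch, using the y and N from before):

ybar_ml  = median(y);               % ML estimate of the expected value
sigma_ml = sum(abs(y - ybar_ml))/N; % ML estimate of the scale parameter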
Comparison
Normal distribution:
best estimate of the expected value is the sample mean
Exponential distribution:
best estimate of the expected value is the sample median
Comparison
Normal distribution: short-tailed, so outliers are extremely uncommon; the expected value should be chosen to make outliers have as small a deviation as possible
Exponential distribution: relatively long-tailed, so outliers are relatively common; the expected value should ignore the actual value of outliers
(Figure: two datasets $y_i$ plotted on a line, each marked with its median and mean; an outlier pulls the mean toward it but leaves the median nearly unchanged.)
Another important distribution: the Gutenberg-Richter distribution
(e.g. earthquake magnitudes)
for earthquakes greater than some threshold magnitude $m_0$, the probability that the earthquake will have a magnitude greater than $m$ is
$P(m) = 10^{-b(m - m_0)}$
or $P(m) = \exp\{ -\ln(10)\, b\, (m - m_0) \} = \exp\{ -b' (m - m_0) \}$ with $b' = \ln(10)\, b$
This is a cumulative distribution (the probability of exceeding $m$), so the probability that the magnitude is greater than $m_0$ is unity:
$P(m_0) = \exp\{ -b' (m_0 - m_0) \} = \exp\{0\} = 1$
The probability density function is (minus) its derivative:
$p(m) = b' \exp\{ -b' (m - m_0) \}$
Maximum likelihood estimate of $b'$:
$L(b') = N \ln(b') - b' \sum_i (m_i - m_0)$
$\partial L / \partial b' = 0 = N/b' - \sum_i (m_i - m_0)$
$b' = N \big/ \sum_i (m_i - m_0)$
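A MatLab sketch; the catalog here is synthetic and purely illustrative (exprnd is in the Statistics Toolbox):

m0 = 4.0;                       % hypothetical threshold magnitude
m  = m0 + exprnd(0.5, 1000, 1); % synthetic catalog of magnitudes above m0
bprime = length(m)/sum(m - m0); % b' = N / sum(mi - m0)
b = bprime/log(10);             % b' = ln(10) b, so b = b'/ln(10)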
Originally, Gutenberg & Richter made a mistake …
(Figure: $\log_{10} P(m)$ versus magnitude $m$, with a least-squares fit of slope $-b$.)
… by estimating the slope $b$ using least-squares, and not the maximum likelihood formula.
Yet another important distribution: the Fisher distribution on a sphere
(e.g. paleomagnetic directions)
given unit vectors $\mathbf{x}_i$ that scatter around some mean direction $\bar{\mathbf{x}}$, the probability distribution for the angle $\theta$ between $\mathbf{x}_i$ and $\bar{\mathbf{x}}$ (that is, $\cos(\theta) = \mathbf{x}_i \cdot \bar{\mathbf{x}}$) is
$p(\theta) = \kappa \sin(\theta) \exp\{ \kappa \cos(\theta) \} \,/\, (2 \sinh(\kappa))$
$\kappa$ is called the “precision parameter”
Rationale for the functional form:
$p(\theta) \propto \exp\{ \kappa \cos(\theta) \}$
For $\theta$ close to zero, $\cos(\theta) \approx 1 - \tfrac{1}{2}\theta^2$, so
$p(\theta) \propto \exp\{ \kappa \cos(\theta) \} \approx \exp\{\kappa\} \exp\{ -\tfrac{1}{2} \kappa \theta^2 \}$
which is a Gaussian in $\theta$
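A quick numerical sanity check in MatLab that this $p(\theta)$ integrates to one (the value of $\kappa$ is arbitrary):

kappa = 10;                     % example precision parameter
p = @(theta) kappa*sin(theta).*exp(kappa*cos(theta))/(2*sinh(kappa));
integral(p, 0, pi)              % returns 1 to numerical precision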
I’ll let you figure out the maximum likelihood estimates of the central direction, $\bar{\mathbf{x}}$, and the precision parameter, $\kappa$.