ceentral limit theorem2

Central limit theorem Proving the central limit theorem

18.440: Lecture 31

Central limit theorem

Scott Sheeld

MIT

18.440 Lecture 31


Outline


Proving the central limit theorem

18.440 Lecture 31


Recall: DeMoivre-Laplace limit theorem

Let Xi be an i.i.d. sequence of random variables. Write Sn = i

n =1 Xn.

Suppose each Xi is 1 with probability p and 0 with probability q = 1 p.

DeMoivre-Laplace limit theorem:

Sn nplim P{a

npq b} (b) (a).

n

Here (b) (a) = P{a Z b} when Z is a standard normal random variable. Snnp npq describes number of standard deviations that Sn is

above or below its mean. Question: Does a similar statement hold if the Xi are i.i.d. but have some other probability distribution?

Central limit theorem: Yes, if they have nite variance. 18.440 Lecture 31


Example

Say we roll 106 ordinary dice independently of each other. 106 Let Xi be the number on the ith die. Let X = i=1 Xi be the total of the numbers rolled.

What is E [X ]?

What is Var[X ]?

How about SD[X ]?

What is the probability that X is less than a standard deviations above its mean?

1 a Central limit theorem: should be about 2

e

x2/2dx .

18.440 Lecture 31


Example

Suppose earthquakes in some region are a Poisson point process with rate equal to 1 per year.

Let X be the number of earthquakes that occur over a ten-thousand year period. Should be a Poisson random variable with rate 10000.

What is E [X ]?

What is Var[X ]?

How about SD[X ]?

What is the probability that X is less than a standard deviations above its mean?

1 a Central limit theorem: should be about ex2/2dx .2

18.440 Lecture 31


General statement

Let Xi be an i.i.d. sequence of random variables with nite mean and variance 2 .

Write Sn = ni=1 Xi . So E [Sn] = n and Var[Sn] = n

2 and SD[Sn] =

n.

X1+X2+...+Xnn Write Bn = n . Then Bn is the dierence between Sn and its expectation, measured in standard

deviation units.

Central limit theorem:

lim P{a Bn b} (b) (a). n

18.440 Lecture 31


Outline


Proving the central limit theorem

18.440 Lecture 31


Recall: characteristic functions

Let X be a random variable.

The characteristic function of X is dened by (t) = X (t) := E [e

itX ]. Like M(t) except with i thrown in.

Recall that by denition e it = cos(t) + i sin(t).

Characteristic functions are similar to moment generating functions in some ways.

For example, X +Y = X Y , just as MX +Y = MX MY , if X and Y are independent.

And aX (t) = X (at) just as MaX (t) = MX (at).

And if X has an mth moment then E [X m] = imX (m)

(0).

Characteristic functions are well dened at all t for all random variables X .

18.440 Lecture 31


Rephrasing the theorem

Let X be a random variable and Xn a sequence of random variables.

Say Xn converge in distribution or converge in law to X if limn FXn (x) = FX (x) at all x R at which FX is continuous.

Recall: the weak law of large numbers can be rephrased as the statement that An =

X1+X2+n ...+Xn converges in law to (i.e.,

to the random variable that is equal to with probability one) as n .

The central limit theorem can be rephrased as the statement that Bn =

X1+X2+...+Xnn converges in law to a standard n

normal random variable as n .

18.440 Lecture 31


Continuity theorems

Levys continuity theorem (see Wikipedia): if

lim Xn (t) = X (t)n

for all t, then Xn converge in law to X .

By this theorem, we can prove the central limit theorem by showing limn Bn (t) = et

2/2 for all t.

Moment generating function continuity theorem: if moment generating functions MXn (t) are dened for all t and n and limn MXn (t) = MX (t) for all t, then Xn converge in law to X .

By this theorem, we can prove the central limit theorem by showing limn MBn (t) = et

2/2 for all t.

18.440 Lecture 31


Proof of central limit theorem with moment generating functions

Write Y = X . Then Y has mean zero and variance 1. Write MY (t) = E [etY ] and g(t) = log MY (t). So MY (t) = e

g(t). We know g(0) = 0. Also MY

(0) = E [Y ] = 0 and MY (0) = E [Y 2] = Var[Y ] = 1.

(0) gChain rule: M e =(0) = (0) (0) = 0 and g gY

t

MY (0) = g (0)eg (0) + g (0)2eg(0) = g (0) = 1.

So g is a nice function with g(0) = g (0) = 0 and g (0) = 1. Taylor expansion: g(t) = t2/2 + o(t2) for t near zero.

Now Bn is 1n times the sum of n independent copies of Y .

MY (t/n)

t

ng ( )n So MBn (t) = t

= e n .

)2/2ng( ) n( t2/2, in sense that LHS tends to But e e

as n tends to innity.

n n = e

et

2/2

18.440 Lecture 31


Proof of central limit theorem with characteristic functions

Moment generating function proof only applies if the moment generating function of X exists.

But the proof can be repeated almost verbatim using characteristic functions instead of moment generating functions.

Then it applies for any X with nite variance.

18.440 Lecture 31


Almost verbatim: replace MY (t) with Y (t)

Write Y (t) = E [e itY ] and g(t) = log Y (t). So Y (t) = e

g(t).

We know g(0) = 0. Also Y (0) = iE [Y ] = 0 and Y (0) = i

2E [Y 2] = Var[Y ] = 1. Chain rule: Y (0) = g

(0)eg (0) = g (0) = 0 and (0) = g (0)eg(0) + g (0)2eg(0) = g (0) = 1.Y

So is a nice function with (0) = (0) = 0 and g g g

g (0) = 1. Taylor expansion: g(t) = t2/2 + o(t2) for t near zero.

Now Bn is 1n times the sum of n independent copies of Y . t

Y (t/n)

ng( )n So Bn (t) = n= e .

t t) en( as n tends to innity.

)2/2ng( = et2/2, in sense that LHS tends But e n n

to et2/2

18.440 Lecture 31


Perspective

The central limit theorem is actually fairly robust. Variants of the theorem still apply if you allow the Xi not to be identically distributed, or not to be completely independent.

We wont formulate these variants precisely in this course. But, roughly speaking, if you have a lot of little random terms that are mostly independent and no single term contributes more than a small fraction of the total sum then the total sum should be approximately normal.

Example: if height is determined by lots of little mostly independent factors, then peoples heights should be normally distributed.

Not quite true... certain factors by themselves can cause a person to be a whole lot shorter or taller. Also, individual factors not really independent of each other.

Kind of true for homogenous population, ignoring outliers. 18.440 Lecture 31

MIT OpenCourseWare

MIT OpenCourseWare http://ocw.mit.edu

18.440 Probability and Random Variables Spring 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.


ceentral limit theorem2

Documents