19lecture - central limit theorem

Central Limit Theorem, MA700 Statistical Methods for Researchers, Dr. Sriram Devanathan, Amrita Vishwa Vidyapeetham


Central Limit Theorem

As the sample size gets large enough, the sampling distribution of $\bar{X}$ becomes almost Normal, regardless of the shape of the population.

Central Limit Theorem: When sampling from almost any distribution, $\bar{X}$ is approximately Normally distributed in large samples.

Central Limit Theorem

As the sample size increases, the sampling distribution of the sample mean approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Formal Statement

Normal Population Distribution

Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean value $\mu$ and standard deviation $\sigma$. Then for any $n$, $\bar{X}$ is normally distributed.

The Central Limit Theorem

Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large, $\bar{X}$ has approximately a normal distribution with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}.$$

The larger the value of $n$, the better the approximation.
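The mean and variance of the sample mean quoted in the statement follow from linearity of expectation and, for the variance, from the independence of the observations in a random sample. A short derivation using the same symbols $\mu$, $\sigma^2$ and $n$ is sketched here; it shows only the two moments, not the Normal shape, which is what the theorem itself adds.

```latex
\begin{align*}
E[\bar{X}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```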

The Central Limit Theorem

[Figure: the population distribution compared with the sampling distribution of $\bar{X}$ for small to moderate n and for large n.]

Rule of Thumb

If n > 30, the Central Limit Theorem can be used.
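One way the rule of thumb is typically used: once n is large enough, probabilities about the sample mean can be approximated with a Normal distribution whose standard deviation is $\sigma/\sqrt{n}$. The sketch below assumes Python's standard `statistics.NormalDist`; the function name `prob_sample_mean_exceeds` and the numbers (mean 50, standard deviation 10, n = 36, threshold 53) are hypothetical illustration values, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """Approximate P(sample mean > threshold) with the CLT Normal approximation."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)


# Hypothetical illustration values (not from the lecture).
p = prob_sample_mean_exceeds(threshold=53.0, mu=50.0, sigma=10.0, n=36)
print(f"P(sample mean > 53) is approximately {p:.4f}")
```

The approximation is only as good as the Normal shape of the sampling distribution, so for strongly skewed populations a larger n than 30 may be needed.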


A warning!

Not all distributions have a finite mean and variance.

For example, neither the Cauchy distribution (the ratio of two independent standard normal random variables) nor the distribution of the ratio of two iid exponentially distributed random variables has a finite mean or variance.

For such distributions, the CLT does not hold.

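Cauchy density (plotted for $x$ from -10 to 10):

$$f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}$$

Density of the ratio of two iid exponential random variables, for $x > 0$:

$$f(x) = \frac{1}{(1 + x)^2}$$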
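The failure of the CLT for the Cauchy distribution can be seen in a small simulation. The sketch below draws Cauchy values as the ratio of two independent standard normals, as described in the warning, and measures the spread of the sample means by their interquartile range (IQR), since the variance does not exist. The function names, the number of replications and the seed are assumptions chosen for illustration.

```python
import random
import statistics


def cauchy_draw(rng):
    # Ratio of two independent standard normals, as described in the warning.
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)


def iqr_of_sample_means(n, reps=2000, seed=1):
    """Interquartile range of `reps` sample means, each from n Cauchy draws."""
    rng = random.Random(seed)
    means = [sum(cauchy_draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _median, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for n in (1, 10, 100):
    print(f"n = {n:3d}: IQR of sample means = {iqr_of_sample_means(n):.2f}")
```

For a population with finite variance the spread of the sample mean shrinks like $1/\sqrt{n}$. Here the IQR should stay near 2 for every n, because the mean of n independent standard Cauchy variables is again standard Cauchy.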

The Normal distribution

Chest measurements of 5738 Scottish soldiers by Belgian scholar Lambert Quetelet (1796-1874)

First application of the Normal distribution to human data

Figure 6.2.1 Two standardized histograms with approximating Normal density curves.

(a) Chest measurements of Quetelet’s Scottish soldiers (in.). Normal density curve has $\mu$ = 39.8 in., $\sigma$ = 2.05 in.

(b) Heights of the 4294 men in the workforce database (cm). Normal density curve has $\mu$ = 174 cm, $\sigma$ = 6.57 cm.

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

The sample mean has a sampling distribution

Sampling batches of Scottish soldiers and taking chest measurements. Pop mean = 39.8 in, Pop s.d. = 2.05 in

[Figure: (a) 12 samples of size n = 6. Vertical axis: sample number (1 to 12). Horizontal axis: chest measurement (in.), 34 to 46.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.

Twelve samples of size 24

[Figure: (b) 12 samples of size n = 24. Vertical axis: sample number (1 to 12). Horizontal axis: chest measurement (in.), 34 to 46.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Histograms from 100,000 samples

Figure 7.2.2 Standardised histograms of the sample means from 100,000 samples of soldiers (n soldiers per sample).

[Figure: panels (a) n = 6, (b) n = 24, (c) n = 100. Horizontal axis: sample mean of chest measurements (in.), 37 to 42.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
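The narrowing of these histograms as n grows can be reproduced approximately with a small simulation. The sketch below assumes the chest measurements are Normal with the population mean 39.8 in. and standard deviation 2.05 in. quoted in the slides, and it uses far fewer than 100,000 samples to keep the run short. The function name, replication count and seed are illustrative choices.

```python
import math
import random
import statistics

POP_MEAN = 39.8  # population mean of chest measurements (in.), from the slides
POP_SD = 2.05    # population standard deviation (in.), from the slides


def sd_of_sample_means(n, reps=3000, seed=0):
    """Standard deviation of `reps` simulated sample means, n soldiers per sample."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(POP_MEAN, POP_SD) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return statistics.stdev(means)


for n in (6, 24, 100):
    simulated = sd_of_sample_means(n)
    theory = POP_SD / math.sqrt(n)  # sigma / sqrt(n)
    print(f"n = {n:3d}: simulated s.d. = {simulated:.3f}, sigma/sqrt(n) = {theory:.3f}")
```

The simulated standard deviations should track $\sigma/\sqrt{n}$, which is why the histogram for n = 100 is much narrower than the one for n = 6.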

Central Limit Effect -- Histograms of sample means

[Figure: (b) Uniform. Histograms of sample means from a Uniform population on 0 to 1, panels n = 1, n = 2, n = 4, n = 10.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Central Limit Effect -- Histograms of sample means

[Figure: (a) Exponential. Histograms of sample means from an Exponential population, panels n = 1, n = 2, n = 4, n = 10.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Central Limit Effect -- Histograms of sample means

[Figure: (b) Quadratic U. Histograms of sample means from a Quadratic U population on 0 to 1, panels n = 1, n = 2, n = 4, n = 10.]

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
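The central limit effect in these three sets of histograms can be checked numerically. The sketch below compares the fraction of sample means lying within one standard deviation of their centre against the Normal value of about 0.6827. Several details are assumptions: the Exponential population has rate 1, the Quadratic U population has density $12(x - 0.5)^2$ on 0 to 1, and the function names, replication count and seed are illustrative.

```python
import math
import random
import statistics

NORMAL_WITHIN_1SD = 0.6827  # P(|Z| < 1) for a standard Normal variable


def draw_uniform(rng):
    return rng.random()


def draw_exponential(rng):
    return rng.expovariate(1.0)  # assumed rate 1


def draw_quadratic_u(rng):
    # Inverse-CDF sampling for the assumed density f(x) = 12 (x - 0.5)^2 on [0, 1]:
    # F(x) = 4 (x - 0.5)^3 + 0.5, so x = 0.5 + cbrt((u - 0.5) / 4).
    t = (rng.random() - 0.5) / 4.0
    return 0.5 + math.copysign(abs(t) ** (1.0 / 3.0), t)


def fraction_within_one_sd(draw, n, reps=10000, seed=0):
    """Fraction of simulated sample means within one s.d. of their own mean."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    centre = statistics.fmean(means)
    spread = statistics.pstdev(means)
    return sum(abs(m - centre) < spread for m in means) / reps


populations = {
    "Uniform": draw_uniform,
    "Exponential": draw_exponential,
    "Quadratic U": draw_quadratic_u,
}
for name, draw in populations.items():
    for n in (1, 2, 4, 10):
        frac = fraction_within_one_sd(draw, n)
        print(f"{name:12s} n = {n:2d}: fraction within 1 s.d. = {frac:.3f}")
```

For each population the fraction should move toward 0.6827 as n increases, even though the three populations start from very different shapes at n = 1.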

Consequences of the CLT

When asking questions about the mean(s) of distributions, we can use theory based on the Normal distribution

Is the mean different from zero?

Are the means different from each other?

Traits that are made up of the sum of many parts are likely to follow a Normal distribution

True even for mixture distributions

Distributions related to the Normal distribution are widely relevant to statistical analyses

χ² distribution

t-distribution

F-distribution
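The first question in the list, whether a mean differs from zero, can be sketched as a large-sample z test that relies on the CLT. The code is a minimal illustration rather than a recommended procedure: the function name `one_sample_z_test` is hypothetical, the data are simulated from a skewed population whose true mean is zero, and for small samples a t-distribution would normally replace the Normal reference.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def one_sample_z_test(data, mu0=0.0):
    """Return (z, two-sided p-value) for H0: population mean equals mu0."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # estimated s.d. of the sample mean
    z = (fmean(data) - mu0) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: a skewed (shifted exponential) population with true mean 0.
rng = random.Random(42)
sample = [rng.expovariate(1.0) - 1.0 for _ in range(50)]
z, p = one_sample_z_test(sample)
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```

Comparing two means works the same way, with the difference of sample means in the numerator and the combined standard error in the denominator.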