19lecture - central limit theorem
TRANSCRIPT
Central Limit Theorem
MA700 Statistical Methods for Researchers
Dr. Sriram Devanathan
Amrita Vishwa Vidyapeetham
Central Limit Theorem
As the sample size gets large enough, the sampling distribution of $\bar{X}$ becomes almost Normal, regardless of the shape of the population.
Central Limit Theorem: When sampling from almost any distribution, $\bar{X}$ is approximately Normally distributed in large samples.
Central Limit Theorem
As the sample size increases, the sampling distribution of the sample mean approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$:
$\bar{X} \sim N(\mu, \sigma^2/n)$
Normal Population Distribution
Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean value $\mu$ and standard deviation $\sigma$.
Then for any n, $\bar{X}$ is normally distributed.
The Central Limit Theorem
Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and variance $\sigma^2$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with
$\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$.
The larger the value of n, the better the approximation.
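The statement above can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes a Uniform(0, 1) population (so $\mu$ = 0.5 and $\sigma^2$ = 1/12), and the helper name `simulate_sample_means`, the sample size, and the number of repetitions are all illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


# Assumed population (not from the slides): Uniform(0, 1), mu = 0.5, sigma^2 = 1/12.
mu, sigma2 = 0.5, 1.0 / 12.0
n, reps = 30, 20_000

means = simulate_sample_means(lambda rng: rng.random(), n, reps)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)
se = math.sqrt(sigma2 / n)  # theoretical standard deviation of the sample mean

# If the sample mean is roughly Normal, about 95% of the simulated means
# should fall within 1.96 standard errors of mu.
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"mean of sample means:     {mean_of_means:.4f} (theory {mu})")
print(f"variance of sample means: {var_of_means:.6f} (theory {sigma2 / n:.6f})")
print(f"fraction within 1.96 SE:  {coverage:.3f} (Normal theory 0.95)")
```

The simulated mean and variance should land close to $\mu$ and $\sigma^2/n$, and the coverage close to 0.95, although the exact figures depend on the seed.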
The Central Limit Theorem
[Figure: the population distribution, with the sampling distribution of $\bar{X}$ for small to moderate n and for large n]
Rule of Thumb
If n > 30, the Central Limit Theorem can be used.
A warning!
Not all distributions have a finite mean and variance.
For example, neither the Cauchy distribution (the ratio of two independent standard normal random variables) nor the distribution of the ratio of two iid exponentially distributed random variables has a finite mean or variance.
For such distributions, the CLT does not hold.
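[Figure: density curves of the two distributions]
Cauchy: $f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}$
Ratio of two iid exponentials: $f(x) = \frac{1}{(1 + x)^2}$, $x > 0$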
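A short simulation can make the warning concrete. This sketch is an illustration rather than part of the lecture: it relies on the known fact that the mean of n iid standard Cauchy variables is itself standard Cauchy, so the spread of the sample mean should not shrink with n. The interquartile range (IQR) is used as the spread measure because the variance does not exist; the function names and the choices n = 50 and 5000 repetitions are assumptions.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """One standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(n, reps, seed=0):
    """Interquartile range of `reps` sample means, each from `n` Cauchy draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(cauchy_draw(rng) for _ in range(n))
        for _ in range(reps)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)  # the three quartile cut points
    return q3 - q1


# The standard Cauchy has quartiles at -1 and +1, so its IQR is 2.
iqr_n1 = iqr_of_sample_means(n=1, reps=5000)
iqr_n50 = iqr_of_sample_means(n=50, reps=5000)

print(f"IQR of single draws:          {iqr_n1:.2f}")
print(f"IQR of sample means (n = 50): {iqr_n50:.2f}")
```

Both values should sit near 2. For a population with finite variance, the second would instead be smaller by a factor of about $\sqrt{50}$.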
The Normal distribution
Chest measurements of 5738 Scottish
soldiers by Belgian scholar Lambert Quetelet
(1796-1874)
First application of the Normal distribution to
human data
(a) Chest measurements of Quetelet's Scottish soldiers (in.). Normal density curve has $\mu$ = 39.8 in., $\sigma$ = 2.05 in.
(b) Heights of the 4294 men in the workforce database (cm). Normal density curve has $\mu$ = 174 cm, $\sigma$ = 6.57 cm.
Figure 6.2.1 Two standardized histograms with approximating Normal density curves.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
The sample mean has a sampling distribution
Sampling batches of Scottish soldiers and taking chest measurements. Pop mean = 39.8 in, Pop s.d. = 2.05 in
[Figure: (a) 12 samples of size n = 6. Vertical axis: sample number; horizontal axis: chest measurement (in.)]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Twelve samples of size 24
[Figure: (b) 12 samples of size n = 24. Vertical axis: sample number; horizontal axis: chest measurement (in.)]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Histograms from 100,000 samples
[Figure: three panels, (a) n = 6, (b) n = 24, (c) n = 100. Horizontal axis: sample mean of chest measurements (in.)]
Figure 7.2.2 Standardised histograms of the sample means from 100,000 samples of soldiers (n soldiers per sample).
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
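The narrowing of these histograms as n grows can be reproduced with a rough simulation. This is a sketch under stated assumptions: the population of chest measurements is treated as exactly Normal with the quoted mean 39.8 in. and s.d. 2.05 in., and only 5000 samples per sample size are drawn (fewer than the 100,000 behind the figure) to keep the run short. The function name `sd_of_sample_means` is hypothetical.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 39.8, 2.05  # population values quoted in the slides (inches)


def sd_of_sample_means(n, reps, seed=0):
    """Standard deviation of `reps` simulated sample means of size `n`."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)


results = {}
for n in (6, 24, 100):
    simulated = sd_of_sample_means(n, reps=5000)
    theory = POP_SD / math.sqrt(n)  # sigma / sqrt(n)
    results[n] = (simulated, theory)
    print(f"n = {n:3d}: simulated s.d. {simulated:.3f}, theory {theory:.3f}")
```

The theoretical values $\sigma/\sqrt{n}$ are about 0.84, 0.42, and 0.205 in. for n = 6, 24, and 100, which is why each histogram is roughly half as wide as the one before it.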
Central Limit Effect -- Histograms of sample means
[Figure: (b) Uniform. Histograms of sample means for n = 1, n = 2, n = 4, n = 10]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Central Limit Effect -- Histograms of sample means
[Figure: (a) Exponential. Histograms of sample means for n = 1, n = 2, n = 4, n = 10]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
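One way to quantify how quickly the sample mean of a skewed population becomes symmetric is to track its skewness. The sketch below assumes an Exponential population with rate 1 (the slides do not state the rate) and uses the known result that the mean of n iid exponentials has skewness $2/\sqrt{n}$. The helper names and the repetition count are illustrative.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def skewness_of_sample_means(n, reps, seed=0):
    """Skewness of `reps` sample means, each from `n` Exponential(1) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)


skews = {}
for n in (1, 2, 4, 10):
    skews[n] = skewness_of_sample_means(n, reps=20_000)
    print(f"n = {n:2d}: simulated skewness {skews[n]:.2f}, "
          f"theory {2 / math.sqrt(n):.2f}")
```

A Normal distribution has skewness 0, so the decline from about 2 at n = 1 toward about 0.63 at n = 10 matches the histograms becoming more symmetric while still being visibly skewed.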
Central Limit Effect -- Histograms of sample means
[Figure: (b) Quadratic U. Histograms of sample means for n = 1, n = 2, n = 4, n = 10]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Consequences of the CLT
When asking questions about the mean(s) of distributions, we can use theory based on the Normal distribution
Is the mean different from zero?
Are the means different from each other?
Traits that are made up of the sum of many parts are likely to follow a Normal distribution
True even for mixture distributions
Distributions related to the Normal distribution are widely relevant to statistical analyses
$\chi^2$ distribution
t-distribution
F-distribution
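As an illustration of the first consequence, the question "is the mean different from zero?" can be answered with a large-sample z-test, even when the data are not Normal. This is a minimal sketch: the function `z_test_mean` is a hypothetical helper, the data are simulated (shifted exponential draws with true mean 0.5), and the Normal approximation is assumed to be adequate at n = 200. For small samples the t-distribution would normally replace the Normal here.

```python
import math
import random
import statistics


def z_test_mean(sample, mu0=0.0):
    """Large-sample two-sided test of H0: mean == mu0 (Normal approximation)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    z = (xbar - mu0) / se
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


rng = random.Random(0)
# Simulated, skewed data for illustration only: Exponential(1) shifted so the
# true mean is 0.5 rather than 0.
sample = [rng.expovariate(1.0) - 0.5 for _ in range(200)]

z, p = z_test_mean(sample)
print(f"z = {z:.2f}, two-sided p = {p:.4g}")
```

A large |z| with a small p-value is evidence that the mean differs from zero. Comparing two means works the same way, with the standard error of the difference in place of `se`.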