17 sampling dist

Hadley Wickham
Stat310
Sampling distributions
Monday, 22 March 2010

Quiz
• Pick up quiz on your way in
• Start at 1pm
• Finish at 1:10pm
• Closed book

1. Quiz
2. CLT & approximations
3. Sampling distributions
4. Example
5. More theory

CLT
Central limit theorem.
The distribution of a mean is approximately normal when n gets big.

Approximation
This implies that if n is big then ...
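
A minimal sketch of what such an approximation can look like, assuming the usual statement that the mean of n iid draws is approximately N(μ, σ²/n). The Exponential(1) population, n = 50, the threshold 1.1, and the function names are illustrative assumptions, not taken from the slides.

```python
import math
import random
from statistics import NormalDist


def simulated_prob(n, threshold, reps=20000, seed=1):
    """Estimate P(mean of n Exp(1) draws <= threshold) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if mean <= threshold:
            hits += 1
    return hits / reps


def normal_approx_prob(n, threshold, mu=1.0, sigma=1.0):
    """CLT approximation: treat the mean as N(mu, sigma^2 / n)."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(threshold)


n = 50
sim = simulated_prob(n, 1.1)
approx = normal_approx_prob(n, 1.1)
print(f"simulated: {sim:.3f}, normal approximation: {approx:.3f}")
```

The two numbers should be close but not identical: the population is skewed, so the normal approximation is only approximate at moderate n.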

Sampling distributions

Random experiment
“A random experiment is an experiment, trial, or observation that can be repeated numerous times under the same conditions... It must in no way be affected by any previous outcome and cannot be predicted with certainty.” (http://cnx.org/content/m13470/latest/)
i.e. it is uncertain (we don’t know ahead of time what the answer will be) and repeatable (ideally).

Where we are
Univariate random variables: an experiment with one output
Bivariate random variables: an experiment with two outputs
Sequences of random variables: an experiment performed repeatedly. Repeatable = i.i.d.

A sampling distribution: summary statistics from a repeated experiment

Definitions
Sample = results of n random experiments.
Random sample = result of a random experiment repeated n times. Therefore the observations are iid.
Both are sequences of random variables.
Statistic = A function of random variables with no unknown parameters.
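
As an illustration of these definitions, here is a small sketch in which a random sample is a list of n iid draws and each statistic is an ordinary function of that list. The fair six-sided die and the helper names `random_sample` and `sample_range` are assumptions chosen for the example.

```python
import random
from statistics import mean, stdev


def random_sample(n, rng):
    """n iid repetitions of one experiment (assumed here: rolling a fair die)."""
    return [rng.randint(1, 6) for _ in range(n)]


def sample_range(xs):
    """A statistic: depends only on the observed values, not on unknown parameters."""
    return max(xs) - min(xs)


rng = random.Random(310)  # arbitrary seed, for reproducibility only
xs = random_sample(10, rng)

# Each entry is computable from the data alone, so each is a statistic.
stats = {"mean": mean(xs), "sd": stdev(xs), "range": sample_range(xs)}
print(xs)
print(stats)
```

By contrast, a quantity such as (mean − μ) with μ unknown could not be computed from `xs` alone, so it would not count as a statistic under the definition above.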

Example
Spin a bottle and record the angle, in degrees, at which it points. Repeat.
How would you write this mathematically?
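
One possible answer is to treat the angles as X1, X2, ..., Xn iid Uniform(0, 360). The uniform model is an assumption (a fair, frictionless spin), and `spin_bottle` is a hypothetical helper name. Under that assumption, the experiment can be simulated as:

```python
import random


def spin_bottle(n, rng):
    """Simulate n independent spins; each angle is assumed Uniform(0, 360) degrees."""
    return [rng.uniform(0, 360) for _ in range(n)]


rng = random.Random(22)  # arbitrary seed, for reproducibility only
first_time = spin_bottle(20, rng)
second_time = spin_bottle(20, rng)

print([round(x) for x in first_time])
print([round(x) for x in second_time])
```

Each call gives a different set of 20 angles, which is the point: the sample itself is random, so anything computed from it is random too.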

First time
x1 = 205, x2 = 256, x3 = 86, x4 = 119, x5 = 16, x6 = 278, x7 = 55, x8 = 16, x9 = 295, x10 = 341, x11 = 299, x12 = 270, x13 = 118, x14 = 360, x15 = 97, x16 = 282, x17 = 42, x18 = 283, x19 = 259, x20 = 326

Second time
x1 = 184, x2 = 344, x3 = 118, x4 = 226, x5 = 208, x6 = 106, x7 = 332, x8 = 310, x9 = 339, x10 = 95, x11 = 7, x12 = 274, x13 = 120, x14 = 346, x15 = 211, x16 = 166, x17 = 84, x18 = 102, x19 = 32, x20 = 128

[Figure: dot plot of the recorded values for 20 repetitions of the experiment. x-axis: Value (ticks from 50 to 350); y-axis: Experiment (ticks from 5 to 20).]

[Figure: dot plot of Value against Experiment for the 20 repetitions, with one additional point marked per experiment. x-axis: Value (ticks from 50 to 350); y-axis: Experiment (ticks from 5 to 20).]

[Figure: histogram of samp. x-axis: samp (140 to 200); y-axis: count (0 to 4).]

[Figure: histogram of V1. x-axis: V1 (100 to 250); y-axis: count (0 to 8000).]

What will happen as I vary the number of samples I average over? (What theorem applies here?)
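
A rough simulation of this question, assuming each observation is Uniform(0, 360) as in the bottle example (an assumption, as are the sample sizes and the helper name `sample_means`). If the central limit theorem applies, the means should stay centred near 180 while their spread shrinks like σ/sqrt(n), with σ = 360/sqrt(12).

```python
import math
import random
from statistics import mean, stdev


def sample_means(n, reps, rng):
    """Return `reps` sample means, each averaging n Uniform(0, 360) draws."""
    return [sum(rng.uniform(0, 360) for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(2010)       # arbitrary seed, for reproducibility only
sigma = 360 / math.sqrt(12)     # sd of one Uniform(0, 360) observation

results = {}
for n in (1, 10, 100):
    means = sample_means(n, 2000, rng)
    results[n] = (mean(means), stdev(means))
    print(f"n={n:>3}: centre={results[n][0]:6.1f}, "
          f"spread={results[n][1]:6.2f}, sigma/sqrt(n)={sigma / math.sqrt(n):6.2f}")
```

The centre barely moves as n grows, while the spread drops by roughly a factor of sqrt(10) at each step.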

[Figure: histograms of mean, one panel per label (1, 2, 3, 4, 5). x-axis: mean (0 to 350); y-axis: count (0 to 400).]

[Figure: histograms of mean, one panel per label (1, 10, 100, 1000). x-axis: mean (0 to 350); y-axis: count (0 to 4000).]

How can I transform this random variable to make it comparable? (What theorem applies here?)
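
One candidate transformation is to centre the mean and scale it by sqrt(n). The sketch below assumes Uniform(0, 360) observations, so the centre is 180 and σ = 360/sqrt(12); the helper name `scaled_means` and the chosen sample sizes are illustrative. If the scaling is right, the spread of (mean − 180) * sqrt(n) should be about the same for every n.

```python
import math
import random
from statistics import stdev


def scaled_means(n, reps, rng, mu=180.0):
    """Return `reps` values of (sample mean - mu) * sqrt(n) for Uniform(0, 360) data."""
    out = []
    for _ in range(reps):
        xbar = sum(rng.uniform(0, 360) for _ in range(n)) / n
        out.append((xbar - mu) * math.sqrt(n))
    return out


rng = random.Random(310)        # arbitrary seed, for reproducibility only
sigma = 360 / math.sqrt(12)     # about 103.9

scaled_sd = {n: stdev(scaled_means(n, 2000, rng)) for n in (1, 5, 20, 100)}
for n, sd in scaled_sd.items():
    print(f"n={n:>3}: sd of scaled mean = {sd:6.1f} (sigma = {sigma:.1f})")
```

Dividing by σ as well would put every panel on the standard normal scale, which is the form the central limit theorem is usually stated in.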

[Figure: histograms of (mean − 180) * sqrt(n), one panel per label (1, 2, 3, 4, 5, 10, 20, 100, 1000, 10000). x-axis: −400 to 400; y-axis: count (0 to 800).]

[Figure: histograms of sqrt(var), one panel per label (2, 3, 4, 5). x-axis: sqrt(var) (0 to 250); y-axis: count (0 to 1000).]
We can do the same thing for other statistics...
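
For instance, the same simulation idea can be applied to the sample standard deviation. This sketch again assumes Uniform(0, 360) observations; the helper name `sample_sds` and the sample sizes are illustrative choices.

```python
import random
from statistics import mean, stdev


def sample_sds(n, reps, rng):
    """Return `reps` sample standard deviations, each from n Uniform(0, 360) draws."""
    return [stdev([rng.uniform(0, 360) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(4)  # arbitrary seed, for reproducibility only

summary = {}
for n in (2, 5, 100):
    sds = sample_sds(n, 1000, rng)
    summary[n] = (mean(sds), stdev(sds))
    print(f"n={n:>3}: typical sample sd = {summary[n][0]:6.1f}, "
          f"spread across samples = {summary[n][1]:5.1f}")
```

As n grows, the sample standard deviations concentrate near 360/sqrt(12), about 103.9, and vary less from sample to sample.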

[Figure: histograms of sqrt(var), one panel per label (2, 3, 4, 5, 10, 20, 100, 1000, 10000), each with its own axis scales. The x-range narrows from 0 to 250 in panel 2 to about 102.5 to 105.5 in panel 10000.]

Theory
We’ll start with the mean of normally distributed random variables, then try to extend in various ways.

Your turn
X1, X2, ... are iid N(μ, σ²)
Find their mgfs. What do you notice?
Hint:
$S_n = \sum_{i=1}^{n} X_i \qquad \bar{X}_n = \frac{S_n}{n}$
$M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$
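
One way the exercise can be worked, sketched under two standard facts: the mgf of a sum of independent random variables is the product of their mgfs, and $M_{aX}(t) = M_X(at)$. It starts from the normal mgf $M_X(t) = \exp(\mu t + \sigma^2 t^2 / 2)$.

```latex
\begin{align*}
M_{S_n}(t) &= \prod_{i=1}^{n} M_{X_i}(t)
            = \left[ M_X(t) \right]^n
            = \exp\left( n\mu t + \frac{n\sigma^2 t^2}{2} \right) \\
M_{\bar{X}_n}(t) &= M_{S_n}\!\left( \frac{t}{n} \right)
            = \exp\left( \mu t + \frac{\sigma^2 t^2}{2n} \right)
\end{align*}
```

Both results have the form of a normal mgf, which suggests $S_n \sim N(n\mu, n\sigma^2)$ and $\bar{X}_n \sim N(\mu, \sigma^2/n)$ exactly, for every n, rather than only approximately for large n.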

Reading
4.2, 4.2.1
4.2.2, 4.4