![Page 1: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/1.jpg)
Week 7
Sample Means & Proportions
![Page 2: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/2.jpg)
Variability of Summary Statistics
Variability in shape of distn of sample
Variability in summary statistics Mean, median, st devn, upper quartile, …
Summary statistics have distributions
![Page 3: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/3.jpg)
Parameters and statistics
Parameter: describes underlying population; constant; Greek letter (e.g. $\mu$, $\sigma$, $\pi$, …); unknown value in practice
Summary statistic: random; Roman letter (e.g. m, s, p, …)
We hope statistic will tell us about corresponding parameter
![Page 4: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/4.jpg)
Distn of sample vs. sampling distn of statistic
Values in a single random sample have a distribution
Single sample --> single value for statistic
Sample-to-sample variability of statistic is its sampling distribution.
![Page 5: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/5.jpg)
Means
Unknown population mean, $\mu$
Sample mean, $\bar{X}$, has a distribution: its sampling distribution.
Usually $\bar{x} \ne \mu$
A single sample mean, $\bar{x}$, gives us information about $\mu$
![Page 6: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/6.jpg)
Sampling distribution of mean
If sample size, n, increases:
Spread of distn of sample is (approx) same.
Spread of sampling distn of mean gets smaller: $\bar{x}$ is likely to be closer to $\mu$; $\bar{x}$ becomes a better estimate of $\mu$
![Page 7: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/7.jpg)
Sampling distribution of mean
Population with mean $\mu$, st devn $\sigma$
Random sample (n independent values)
Sample mean, $\bar{X}$, has sampling distn with:
Mean, $\mu_{\bar{X}} = \mu$
St devn, $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
(We will deal later with the problem that $\mu$ and $\sigma$ are unknown in practice.)
![Page 8: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/8.jpg)
Weight loss
Random sample of n = 25 people; sample mean, $\bar{x}$
Estimate mean weight loss for those attending clinic for 10 weeks
How accurate?
Let’s see, if the population distn of weight loss is:
$X \sim \text{normal}(\mu = 8\text{ lb},\ \sigma = 5\text{ lb})$
![Page 9: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/9.jpg)
Some samples
Four random samples of n = 25 people:
1. Mean = 8.32 pounds, st devn = 4.74 pounds
2. Mean = 8.32 pounds, st devn = 4.74 pounds
3. Mean = 8.48 pounds, st devn = 5.27 pounds
4. Mean = 7.16 pounds, st devn = 5.93 pounds
N.B. In all samples, $\bar{x} \ne \mu$
![Page 10: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/10.jpg)
Sampling distribution
Means from simulation of 400 samples
Theory: mean = $\mu$ = 8 lb, s.d.($\bar{X}$) = $\sigma/\sqrt{n} = 5/\sqrt{25} = 1$ lb
(How does this compare to simulation? To popn distn?)
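The simulation on this slide can be imitated with a short sketch. The original simulation code is not shown, so the seed, function name and use of Python's standard library here are assumptions; only the population (normal, mean 8 lb, st devn 5 lb), n = 25 and the 400 repetitions come from the slides.

```python
import random
import statistics

def simulate_sample_means(mu=8.0, sigma=5.0, n=25, reps=400, seed=1):
    """Draw `reps` random samples of size n from normal(mu, sigma); return their means."""
    rng = random.Random(seed)  # arbitrary seed, only so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means()
theory_sd = 5.0 / 25 ** 0.5  # sigma / sqrt(n) = 1 lb

print("mean of the 400 sample means:", round(statistics.fmean(means), 2))
print("s.d. of the 400 sample means:", round(statistics.stdev(means), 2))
print("theory s.d.:", theory_sd)
```

The simulated s.d. should land near 1 lb, much smaller than the population s.d. of 5 lb.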
![Page 11: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/11.jpg)
Errors in estimation
Population: $X \sim \text{normal}(\mu = 8\text{ lb},\ \sigma = 5\text{ lb})$
Sampling distribution of mean: mean = $\mu$ = 8 lb, s.d.($\bar{X}$) = $\sigma/\sqrt{n} = 5/\sqrt{25} = 1$ lb
From 70-95-100 rule: $\bar{x}$ will almost certainly be within 8 ± 3 lb; $\bar{x}$ is unlikely to be more than 3 lb in error
Even if we didn’t know $\mu$, $\bar{x}$ is unlikely to be more than 3 lb in error
![Page 12: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/12.jpg)
Increasing sample size, n
If we sample n = 100 people instead of 25:
s.d.($\bar{X}$) = $\sigma/\sqrt{n} = 5/\sqrt{100} = 0.5$ lb.
Larger samples give more accurate estimates
![Page 13: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/13.jpg)
Central Limit Theorem
If population is normal ($\mu$, $\sigma$): $\bar{X} \sim \text{normal}(\mu,\ \sigma/\sqrt{n})$
If popn is non-normal with ($\mu$, $\sigma$) but n is large: $\bar{X}$ approx $\sim \text{normal}(\mu,\ \sigma/\sqrt{n})$
Guideline: n > 30 even if very non-normal
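A rough check of the non-normal case, as a sketch only. The exponential population (mean 1, st devn 1), n = 40 and 2000 repetitions are illustrative assumptions, not from the slides. If the sampling distn is close to normal, about 95% of sample means should fall within 2 s.d. of the population mean.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=2):
    """Means of `reps` samples of size n; draw(rng) returns one population value."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

n = 40
# Assumed population: exponential with rate 1, a very skewed distn (mu = 1, sigma = 1).
clt_means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=2000)

sd_theory = 1.0 / n ** 0.5  # sigma / sqrt(n)
within_2sd = sum(abs(m - 1.0) <= 2 * sd_theory for m in clt_means) / len(clt_means)

print("s.d. of sample means:", round(statistics.stdev(clt_means), 3))
print("theory s.d.:", round(sd_theory, 3))
print("share within 2 s.d. of mu:", round(within_2sd, 3))
```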
![Page 14: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/14.jpg)
Other summary statistics
E.g. Lower quartile, proportion, correlation
Usually not normal distns
A formula for the standard devn of the sampling distn is sometimes available
Sampling distn usually close to normal if n is large
![Page 15: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/15.jpg)
Lottery problem
Pennsylvania Cash 5 lottery: 5 numbers selected from 1-39
Pick birthdays of family members (none 32-39)
P(highest selected is 32 or over)?
Statistic:
H = highest of 5 random numbers (without replacement)
![Page 16: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/16.jpg)
Lottery simulation
Simulation: Generated 5 numbers (without replacement) 1560 times
Theory? Fairly hard.
Highest number > 31 in about 72% of repetitions
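The lottery simulation can be sketched as below. The seed and function name are assumptions; the 1560 repetitions and the 1-39 range come from the slides. For comparison, one way to get an exact value is the complement: the highest number is 31 or less only when all 5 numbers come from 1-31.

```python
import random
from math import comb

def simulate_highest(reps=1560, seed=3):
    """Share of repetitions in which the highest of 5 numbers from 1-39 is 32 or over."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    hits = sum(max(rng.sample(range(1, 40), 5)) >= 32 for _ in range(reps))
    return hits / reps

simulated = simulate_highest()
# Complement rule: P(H >= 32) = 1 - C(31, 5) / C(39, 5)
exact = 1 - comb(31, 5) / comb(39, 5)

print("simulated:", round(simulated, 3))
print("exact:", round(exact, 4))
```

Both values should sit close to the 72% reported for the slide's own simulation.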
![Page 17: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/17.jpg)
Normal distributions
Family of distributions (populations)
Shape depends only on parameters $\mu$ (mean) & $\sigma$ (st devn)
All have same symmetric ‘bell shape’
$\mu$ = 65 inches, $\sigma$ = 2.7 inches
![Page 18: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/18.jpg)
Importance of normal distn
A reasonable model for many data sets
Transformed data often approx normal
Sample means (and many other statistics) are approx normal.
![Page 19: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/19.jpg)
Standard normal distribution
Z ~ Normal ($\mu$ = 0, $\sigma$ = 1)
[Figure: standard normal density, axis from -3 to 3, shaded area = Prob (Z < z*)]
![Page 20: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/20.jpg)
Probabilities for normal (0, 1)
Check from tables:
P(Z ≤ -3.00) = 0.0013
P(Z ≤ -2.59) = 0.0048
P(Z ≤ 1.31) = 0.9049
P(Z ≤ 2.00) = 0.9772
P(Z ≤ -4.75) = 0.000001
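The table values can also be checked in software. This sketch uses `statistics.NormalDist` from Python's standard library (3.8+) in place of printed tables; the choice of tool is an assumption, not something the slides prescribe.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# cdf(z) is the area to the left of z, i.e. P(Z <= z)
for z in (-3.00, -2.59, 1.31, 2.00, -4.75):
    print(f"P(Z <= {z:5.2f}) = {Z.cdf(z):.6f}")
```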
![Page 21: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/21.jpg)
Probability Z > 1.31
P(Z > 1.31) = 1 – P(Z ≤ 1.31)
= 1 – .9049 = .0951
![Page 22: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/22.jpg)
Prob ( Z between –2.59 and 1.31)
P(-2.59 ≤ Z ≤ 1.31)
= P(Z ≤ 1.31) – P(Z ≤ -2.59)
= .9049 – .0048 = .9001
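Upper-tail and in-between probabilities both reduce to areas to the left. A minimal sketch, where `prob_between` is a hypothetical helper name:

```python
from statistics import NormalDist

def prob_between(dist, lo, hi):
    """P(lo <= X <= hi) as a difference of two left-hand areas."""
    return dist.cdf(hi) - dist.cdf(lo)

std_normal = NormalDist()                # mean 0, st devn 1
upper_tail = 1 - std_normal.cdf(1.31)    # P(Z > 1.31)
middle = prob_between(std_normal, -2.59, 1.31)

print("P(Z > 1.31) =", round(upper_tail, 4))
print("P(-2.59 <= Z <= 1.31) =", round(middle, 4))
```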
![Page 23: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/23.jpg)
Standard devns from mean
Normal ($\mu$, $\sigma$)
$\mu$ = 65 inches, $\sigma$ = 2.7 inches
Heights of students
![Page 24: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/24.jpg)
Probability and area
X ~ normal ($\mu$ = 65, $\sigma$ = 2.7)
P (X ≤ 67.7) = area
![Page 25: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/25.jpg)
Probability and area (cont.)
Normal ($\mu$, $\sigma$)
Exact value vs. 70-95-100 rule:
P(X within $\sigma$ of $\mu$) = 0.683, approx 70%
P(X within $2\sigma$ of $\mu$) = 0.954, approx 95%
P(X within $3\sigma$ of $\mu$) = 0.997, approx 100%
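The exact values behind the 70-95-100 rule do not depend on which normal distn is used. A quick sketch with the heights distn (mean 65, st devn 2.7) from the earlier slides; `prob_within` is a hypothetical helper name:

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=2.7)

def prob_within(dist, k):
    """P(X within k standard deviations of the mean)."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

rule = {k: prob_within(heights, k) for k in (1, 2, 3)}
for k, p in rule.items():
    print(f"within {k} st devn: {p:.3f}")
```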
![Page 26: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/26.jpg)
Finding approx probabilities
Prob (X ≤ 62 )?
About 1/8
Ht of college woman, X ~ normal ($\mu$ = 65, $\sigma$ = 2.7)
P (X ≤ 62) = area
1. Sketch normal density
2. Estimate area
![Page 27: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/27.jpg)
Translate question from X to Z
X ~ Normal ($\mu$, $\sigma$). Find P(X ≤ x*)
Translate to z-score: $Z = \dfrac{X - \mu}{\sigma}$, so x* translates to $z^* = \dfrac{x^* - \mu}{\sigma}$
Z ~ Normal ($\mu$ = 0, $\sigma$ = 1)
![Page 28: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/28.jpg)
Finding probabilities
Prob (height of randomly selected college woman ≤ 62 )?
$P(X \le 62) = P\left(Z \le \dfrac{62 - 65}{2.7}\right) = P(Z \le -1.11) = .1335$
About 13%.
![Page 29: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/29.jpg)
Prob (X > value)
Ht of college woman, X ~ normal ($\mu$ = 65, $\sigma$ = 2.7)
Prob (X > 68 inches)?
$P(X > 68) = P\left(Z > \dfrac{68 - 65}{2.7}\right) = P(Z > 1.11) = 1 - P(Z \le 1.11) = 1 - .8665 = .1335$
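The two height calculations (X ≤ 62 and X > 68) follow the same recipe: translate to a z-score, then use the standard normal. A sketch, with `z_score` as a hypothetical helper name. The results differ slightly from the table answers because z is not rounded to 2 decimals here.

```python
from statistics import NormalDist

MU, SIGMA = 65.0, 2.7  # heights of college women, from the slides

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()
p_le_62 = std_normal.cdf(z_score(62, MU, SIGMA))      # P(X <= 62)
p_gt_68 = 1 - std_normal.cdf(z_score(68, MU, SIGMA))  # P(X > 68)

print("z for 62:", round(z_score(62, MU, SIGMA), 2))
print("P(X <= 62):", round(p_le_62, 4))
print("P(X > 68):", round(p_gt_68, 4))
```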
![Page 30: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/30.jpg)
Finding upper quartile
Blood Pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile?
Step 1: Solve for z-score. Closest z* with area of 0.7500 (tables): z* = 0.67
Step 2: Calculate $x = z^*\sigma + \mu$: x = (0.67)(10) + 120 = 126.7 or about 127.
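Going from an area back to a value is the inverse of the table lookup. A sketch using `NormalDist.inv_cdf` from Python's standard library; using software instead of tables is an assumption about tooling.

```python
from statistics import NormalDist

# Step 1: z* with area 0.75 to its left
z_star = NormalDist().inv_cdf(0.75)

# Step 2: x = z* * sigma + mu
mu, sigma = 120.0, 10.0
x_75 = z_star * sigma + mu

# The same answer in one step, directly from the blood pressure distn
x_direct = NormalDist(mu, sigma).inv_cdf(0.75)

print("z* =", round(z_star, 2))
print("75th percentile =", round(x_75, 1))
```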
![Page 31: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/31.jpg)
Probabilities about means
Blood pressure ~ normal ($\mu$ = 120, $\sigma$ = 10)
8 people given drug
If drug does not affect blood pressure, find P(average blood pressure > 130)
![Page 32: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/32.jpg)
$P(\bar{X} > 130)$?
X ~ normal ($\mu$ = 120, $\sigma$ = 10), n = 8
$\bar{X} \sim \text{normal}\left(\mu_{\bar{X}} = 120,\ \sigma_{\bar{X}} = 10/\sqrt{8} = 3.54\right)$
$z = \dfrac{130 - 120}{3.54} = 2.83$
prob = 0.0023
Very little chance!
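The same calculation as a sketch in code. The only change from a single-value problem is that the st devn of the mean, σ/√n, replaces σ. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120.0, 10.0, 8

sd_mean = sigma / sqrt(n)            # st devn of the sample mean
z = (130 - mu) / sd_mean             # z-score of an average of 130
p_mean_gt_130 = 1 - NormalDist().cdf(z)

print("s.d. of mean:", round(sd_mean, 2))
print("z:", round(z, 2))
print("P(mean > 130):", round(p_mean_gt_130, 4))
```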
![Page 33: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/33.jpg)
Distribution of sum
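X ~ distn with ($\mu$, $\sigma$)
aX ~ distn with ($a\mu$, $a\sigma$) for a > 0, e.g. miles to kilometers
$\bar{X}$ ~ distn with $\left(\mu,\ \sigma/\sqrt{n}\right)$
$\sum X = n\bar{X}$ ~ distn with $\left(n\mu,\ \sqrt{n}\,\sigma\right)$
Central Limit Theorem implies approx normal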
![Page 34: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/34.jpg)
Probabilities about sum
Profit in 1 day ~ normal ($\mu = \$300$, $\sigma = \$200$)
Prob(total profit in week < \$1,000)?
Total = $\sum X \sim \text{normal}\left(7\mu = \$2{,}100,\ \sqrt{7}\,\sigma = \$529\right)$
$z = \dfrac{1000 - 2100}{529} = -2.08$
Prob = 0.0188
Assumes independence
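A sketch of the weekly-profit calculation, assuming (as the slide does) that the 7 daily profits are independent. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_day, sigma_day, days = 300.0, 200.0, 7

mean_total = days * mu_day            # n * mu = 2100
sd_total = sqrt(days) * sigma_day     # sqrt(n) * sigma, about 529
z = (1000 - mean_total) / sd_total
p_total_lt_1000 = NormalDist().cdf(z)

print("mean of total:", mean_total)
print("s.d. of total:", round(sd_total))
print("z:", round(z, 2))
print("P(total < 1000):", round(p_total_lt_1000, 4))
```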
![Page 35: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/35.jpg)
Categorical data
Most important parameter is $\pi$ = Prob (success)
Corresponding summary statistic is p = Proportion (success)
N.B. Textbook uses p and p̂ where these slides use $\pi$ and p
![Page 36: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/36.jpg)
Number of successes
Easiest to deal with count of successes before proportion.
If…
1. n “trials” (fixed beforehand).
2. Only “success” or “failure” possible for each trial.
3. Outcomes are independent.
4. Prob (success), $\pi$, remains same for all trials. Prob (failure) is 1 – $\pi$.
X = number of successes ~ binomial (n, $\pi$)
![Page 37: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/37.jpg)
Examples
![Page 38: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/38.jpg)
Binomial Probabilities
Prob (win game) = 0.2
Plays of game are independent.
What is Prob (wins 2 out of 3 games)?
What is P(X = 2)?
$P(X = k) = \dfrac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$ for k = 0, 1, 2, …, n
$P(X = 2) = \dfrac{3!}{2!\,(3-2)!}\,(.2)^{2}(1-.2)^{3-2} = 3(.2)^{2}(.8)^{1} = 0.096$
You won’t need to use this!!
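For anyone who does want to see the formula in action, a minimal sketch. `binom_pmf` is a hypothetical helper name, and `math.comb` supplies n!/(k!(n-k)!).

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

# Prob (wins 2 out of 3 games) when Prob (win game) = 0.2
p_two_wins = binom_pmf(2, 3, 0.2)
print("P(X = 2) =", round(p_two_wins, 3))

# The probabilities for k = 0, 1, 2, 3 add up to 1
total = sum(binom_pmf(k, 3, 0.2) for k in range(4))
print("sum over all k =", round(total, 6))
```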
![Page 39: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/39.jpg)
Mean & st devn of Binomial
For a binomial (n, $\pi$)
Mean = $n\pi$
Standard deviation = $\sqrt{n\pi(1-\pi)}$
![Page 40: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/40.jpg)
Extraterrestrial Life?
50% of large population would say “yes” if asked, “Do you believe there is extraterrestrial life?”
Sample of n = 100
X = # “yes” ~ binomial (n = 100, $\pi$ = 0.5)
Mean = $E(X) = 100(.5) = 50$
Standard deviation = $\sqrt{100(.5)(.5)} = 5$
![Page 41: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/41.jpg)
Extraterrestrial Life?
Sample of n = 100
X = # “yes” ~ binomial (n = 100, $\pi$ = 0.5)
$\mu = E(X) = 100(.5) = 50$
$\sigma = \sqrt{100(.5)(.5)} = 5$
70-95-100 rule of thumb for # “yes”:
About 95% chance of between 40 & 60
Almost certainly between 35 & 65
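The rule of thumb can be compared with exact binomial probabilities. A sketch only; the helper names are hypothetical and the exact sums are not part of the slides.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def binom_prob_between(lo, hi, n, pi):
    """P(lo <= X <= hi), adding up the individual binomial probabilities."""
    return sum(binom_pmf(k, n, pi) for k in range(lo, hi + 1))

p_40_60 = binom_prob_between(40, 60, 100, 0.5)  # mean +/- 2 st devns
p_35_65 = binom_prob_between(35, 65, 100, 0.5)  # mean +/- 3 st devns

print("P(40 <= X <= 60) =", round(p_40_60, 4))
print("P(35 <= X <= 65) =", round(p_35_65, 4))
```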
![Page 42: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/42.jpg)
Normal approx to binomial
If X is binomial (n, $\pi$), and n is large, then X is also approximately normal, with
Mean = $E(X) = n\pi$
Standard deviation = $\sqrt{n\pi(1-\pi)}$
Conditions: Both $n\pi$ and $n(1 - \pi)$ are at least 10.
(Justified by Central Limit Theorem)
![Page 43: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/43.jpg)
Number of H in 30 Flips
X = # heads in n = 30 flips of fair coin
X ~ binomial (n = 30, $\pi$ = 0.5)
$\mu = E(X) = 30(.5) = 15$
$\sigma = \sqrt{30(.5)(.5)} = 2.74$
Bell-shaped & approx normal.
![Page 44: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/44.jpg)
Opinion poll
n = 500 adults; 240 agreed with statement
If $\pi$ = 0.5 of all adults agree, what is P(X ≤ 240)?
X is approx normal with
$\mu = E(X) = 500(.5) = 250$
$\sigma = \sqrt{500(.5)(.5)} = 11.2$
$P(X \le 240) \approx P\left(Z \le \dfrac{240 - 250}{11.2}\right) = P(Z \le -.89) = .1867$
Not unlikely to see 48% or less, even if 50% in population agree.
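A sketch comparing the normal approximation for the poll with the exact binomial sum. The exact sum is an addition for comparison, not part of the slides. The approximation printed here differs slightly from .1867 because z is not rounded to 2 decimals.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi, observed = 500, 0.5, 240

mean = n * pi                      # 250
sd = sqrt(n * pi * (1 - pi))       # about 11.2

# Normal approximation, as on the slide (no continuity correction)
p_normal = NormalDist(mean, sd).cdf(observed)

# Exact binomial probability P(X <= 240)
p_exact = sum(comb(n, k) * pi ** k * (1 - pi) ** (n - k) for k in range(observed + 1))

print("mean:", mean, " s.d.:", round(sd, 1))
print("normal approx:", round(p_normal, 4))
print("exact binomial:", round(p_exact, 4))
```

Either way the probability is nowhere near small, which matches the slide's conclusion.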
![Page 45: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/45.jpg)
Sample Proportion
Suppose (unknown to us) 40% of a population carry the gene for a disease ($\pi$ = 0.40).
Random sample of 25 people; X = # with gene.
X ~ binomial (n = 25, $\pi$ = 0.4)
p = proportion with gene: $p = X/n$
![Page 46: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/46.jpg)
Distn of sample proportion
X ~ binomial (n, $\pi$): $\mu_X = n\pi$, $\sigma_X = \sqrt{n\pi(1-\pi)}$
$p = X/n$: $\mu_p = \pi$, $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
Large n: p is approx normal
($n\pi$ ≥ 10 & $n(1 - \pi)$ ≥ 10)
![Page 47: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/47.jpg)
Examples
Election Polls: to estimate proportion who favor a candidate; units = all voters.
Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
Testing ESP: to estimate probability a person can successfully guess which of 5 symbols on a hidden card; repeatable situation = a guess.
![Page 48: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/48.jpg)
Public opinion poll
Suppose 40% of all voters favor Candidate A.
Pollsters sample n = 2400 voters.
Simulation 400 times & theory.
$\mu_p = \pi = 0.4$
$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}} = \sqrt{\dfrac{0.4 \times 0.6}{2400}} = 0.01$
Propn voting for A is approx normal
![Page 49: Week 7](https://reader036.vdocument.in/reader036/viewer/2022062315/56814e4a550346895dbbcd49/html5/thumbnails/49.jpg)
Probability from normal approx
If 40% of voters favor Candidate A, and n = 2400 sampled
$\mu_p = 0.4$
$\sigma_p = 0.01$
Sample proportion, p, is almost certain to be between 0.37 and 0.43
Prob 0.95 of p being between 0.38 and 0.42
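The poll simulation can be sketched as follows. The seed and function name are assumptions; $\pi$ = 0.4, n = 2400 and the 400 repetitions come from the slides. Each simulated poll asks 2400 independent voters, each favoring Candidate A with probability 0.4.

```python
import random
import statistics

def simulate_polls(pi=0.4, n=2400, reps=400, seed=4):
    """Sample proportions from `reps` simulated polls of n voters each."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

props = simulate_polls()
sd_theory = (0.4 * 0.6 / 2400) ** 0.5  # sqrt(pi * (1 - pi) / n) = 0.01

share_038_042 = sum(0.38 <= p <= 0.42 for p in props) / len(props)
share_037_043 = sum(0.37 <= p <= 0.43 for p in props) / len(props)

print("mean of proportions:", round(statistics.fmean(props), 4))
print("s.d. of proportions:", round(statistics.stdev(props), 4))
print("theory s.d.:", round(sd_theory, 4))
print("share between 0.38 and 0.42:", round(share_038_042, 3))
print("share between 0.37 and 0.43:", round(share_037_043, 3))
```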