#05. z-scores, normal distribution - Michigan State …
STT 200
Arnab Bhattacharjee
This note is based on Chapter 6.
Acknowledgement: Author is indebted to Dr. Ashoke Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit
many of their slides.
3
How to compare apples with oranges?
• A college admissions committee is looking at the
files of two candidates, one with a total SAT
score of 1500 and another with an ACT score of
22. Which candidate scored better?
• How do we compare things when they are
measured on different scales?
• We need to standardize the values.
4
How to standardize?
• Subtract mean from the value and then divide this
difference by the standard deviation.
• The standardized value = the z-score = (value − mean) / (std. dev.).
• z-scores are free of units.
5
z-scores: An Example
Data: 4, 3, 10, 12, 8, 9, 3 (n = 7 in this case)
Mean = (4 + 3 + 10 + 12 + 8 + 9 + 3)/7 = 49/7 = 7.
Standard Deviation = 3.65.
Data   z-score
4      (4 − 7)/3.65 = −0.82
3      (3 − 7)/3.65 = −1.10
10     (10 − 7)/3.65 = 0.82
12     (12 − 7)/3.65 = 1.37
8      (8 − 7)/3.65 = 0.27
9      (9 − 7)/3.65 = 0.55
3      (3 − 7)/3.65 = −1.10
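The table above can be reproduced with a short Python sketch. It assumes the sample standard deviation (dividing by n − 1), which matches the 3.65 quoted on the slide; the variable names are illustrative only.

```python
from statistics import mean, stdev

data = [4, 3, 10, 12, 8, 9, 3]

m = mean(data)    # 7
s = stdev(data)   # sample standard deviation (n - 1 in the denominator), about 3.65

# z-score of each value: (value - mean) / standard deviation
z_scores = [(x - m) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"{x:>2}  {z:+.2f}")
```

The code keeps the unrounded standard deviation, whereas the slide divides by the rounded 3.65; for this data set both give the same z-scores to two decimals.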
6
Interpretation of z-scores
• The z-scores measure the distance of the data values
from the mean in the standard deviation scale.
• A z-score of 1 means that the data value is 1 standard
deviation above the mean.
• A z-score of −1.2 means that the data value is 1.2 standard
deviations below the mean.
• Regardless of the direction, the further a data value is
from the mean, the more unusual it is.
• A z-score of -1.3 is more unusual than a z-score of 1.2.
7
How to use z-scores?
• A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better?
• SAT score mean = 1600, std dev = 500.
• ACT score mean = 23, std dev = 6.
• SAT score 1500 has z-score = (1500 − 1600)/500 = −0.2.
• ACT score 22 has z-score = (22 − 23)/6 = −0.17.
• ACT score 22 is better than SAT score 1500.
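The comparison above can be written as a small Python sketch. The helper name `z_score` is hypothetical, and the means and standard deviations are the ones quoted on the slide.

```python
def z_score(value, mean, std_dev):
    """Distance of value from mean, measured in standard deviations."""
    return (value - mean) / std_dev

sat_z = z_score(1500, 1600, 500)   # -0.2
act_z = z_score(22, 23, 6)         # about -0.17

# The higher z-score is the better result relative to its own scale.
better = "ACT" if act_z > sat_z else "SAT"
print(f"SAT z = {sat_z:.2f}, ACT z = {act_z:.2f}, better: {better}")
```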
8
Which is more unusual?
A. A 58 in tall woman
z-score = (58 − 63.6)/2.5 = −2.24.
B. A 64 in tall man
z-score = (64 − 69)/2.8 = −1.79.
C. They are the same.
Heights of adult men have
• mean of 69.0 in.
• std. dev. of 2.8 in.
Heights of adult women have
• mean of 63.6 in.
• std. dev. of 2.5 in.
9
Using z-scores to solve problems
An example using height data and the U.S. Marine Corps and
U.S. Army height requirements
Question: Are the height restrictions set by the
U.S. Army and the U.S. Marine Corps more restrictive for
men, more restrictive for women, or roughly the same?
10
Heights of adult women have
• mean of 63.6 in.
• standard deviation of 2.5 in.
Heights of adult men have
– mean of 69.0 in.
– standard deviation of 2.8 in.
Height Restrictions
                    Men Minimum   Women Minimum
U.S. Army           60 in         58 in
U.S. Marine Corps   64 in         58 in
Data from a National Health Survey
11
              Men Minimum         Women Minimum
U.S. Army     60 in               58 in
              z-score = −3.21     z-score = −2.24
              Less restrictive    More restrictive
U.S. Marine   64 in               58 in
              z-score = −1.79     z-score = −2.24
              More restrictive    Less restrictive
Heights of adult women have
• mean of 63.6 in.
• standard deviation of 2.5 in.
Heights of adult men have
– mean of 69.0 in.
– standard deviation of 2.8 in.
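As a rough check of the z-scores in the table, the following Python sketch computes each minimum's z-score and reports, per branch, which group faces the higher (less negative) cutoff. The dictionary layout and names are assumptions made for illustration.

```python
# Population figures quoted on the slide: (mean, standard deviation) in inches.
heights = {"men": (69.0, 2.8), "women": (63.6, 2.5)}
minimums = {
    "U.S. Army": {"men": 60, "women": 58},
    "U.S. Marine Corps": {"men": 64, "women": 58},
}

results = {}
for branch, limits in minimums.items():
    z = {sex: (limit - heights[sex][0]) / heights[sex][1]
         for sex, limit in limits.items()}
    # The minimum with the higher z-score cuts off more of its population.
    stricter = max(z, key=z.get)
    results[branch] = (z, stricter)
    print(branch, {k: round(v, 2) for k, v in z.items()}, "stricter for", stricter)
```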
12
2004 Olympics
Women’s Heptathlon
Austra Skujyte (Lithuania)
Shot Put = 16.40 m,
Long Jump = 6.30 m.
Carolina Kluft (Sweden)
Shot Put = 14.77 m,
Long Jump = 6.78 m.
                         Shot Put   Long Jump
Mean (all contestants)   13.29 m    6.16 m
Std. Dev.                1.24 m     0.23 m
n                        28         26
13
Which performance was better?
A. Skujyte’s shot put,
z-score of Skujyte’s shot put = 2.51.
B. Kluft’s long jump,
z-score of Kluft’s long jump = 2.70.
C. Both were the same.
                         Shot Put   Long Jump
Mean (all contestants)   13.29 m    6.16 m
Std. Dev.                1.24 m     0.23 m
n                        28         26
14
Based on shot put and long jump, whose
performance was better?
A. Skujyte’s,
z-score: shot put = 2.51, long jump = 0.61.
Total z-score = (2.51+0.61) = 3.12.
B. Kluft’s,
z-score: shot put = 1.19, long jump = 2.70.
Total z-score = (1.19+2.70) = 3.89.
C. Both were the same.
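The total z-scores above can be recomputed with a short Python sketch. It uses only the marks, means and standard deviations quoted on the slides; the data structures are illustrative.

```python
# Event statistics quoted on the slide: (mean, standard deviation) in metres.
events = {"shot put": (13.29, 1.24), "long jump": (6.16, 0.23)}
athletes = {
    "Skujyte": {"shot put": 16.40, "long jump": 6.30},
    "Kluft": {"shot put": 14.77, "long jump": 6.78},
}

totals = {}
for name, marks in athletes.items():
    # z-score of each mark relative to all contestants in that event
    z = {ev: (marks[ev] - events[ev][0]) / events[ev][1] for ev in events}
    totals[name] = sum(z.values())
    print(name, {ev: round(v, 2) for ev, v in z.items()},
          "total", round(totals[name], 2))
```

Adding z-scores works here because both events are scored so that a larger value is better; for an event where smaller is better (a race time, say), the sign would need to be flipped first.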
16
Effect of Standardization
• Standardization into z-scores does not change
the shape of the histogram.
• Standardization into z-scores changes the center
of the distribution by making the mean 0.
• Standardization into z-scores changes the spread
of the distribution by making the standard
deviation 1.
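A quick numerical check of these three statements, sketched in Python with the seven-value data set from the earlier example (the sample standard deviation is assumed):

```python
from statistics import mean, stdev

data = [4, 3, 10, 12, 8, 9, 3]
m, s = mean(data), stdev(data)
z = [(x - m) / s for x in data]

# Center and spread after standardization: mean 0 and standard deviation 1,
# up to floating-point rounding.
z_mean, z_sd = mean(z), stdev(z)
print(round(z_mean, 10), round(z_sd, 10))

# Standardizing only shifts and rescales, so the ordering of the values
# (and hence the shape of the histogram) is unchanged.
same_order = (sorted(range(len(data)), key=data.__getitem__)
              == sorted(range(len(z)), key=z.__getitem__))
print(same_order)
```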
17
The Normal Distribution
• In many data sets, the histogram is symmetric, unimodal and bell-shaped.
• Such distributions are known as normal
distributions, and the data are said to be normally
distributed.
18
The Histogram of z-scores
If data are normally distributed then
• The histogram of z-scores is also symmetric, unimodal and bell-shaped.
• We can approximate the histogram by a bell-shaped curve called the normal curve.
19
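68-95-99.7 (Empirical) Rule
When data are bell shaped, the z-scores of the data
values follow the empirical rule:
• about 68% of the values have z-scores between −1 and 1,
• about 95% have z-scores between −2 and 2,
• about 99.7% have z-scores between −3 and 3.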
20
More on Normal Distribution
The 68-95-99.7 (Empirical) Rule tells us that if data are
normally distributed, then almost all the data points
(about 99.7%) are within 3 standard deviations
of the mean.
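The three percentages in the rule's name can be recovered from the standard normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # distribution of z-scores

# Fraction of values whose z-score lies between -k and k, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The results are roughly 0.68, 0.95 and 0.997, which is where the name of the rule comes from.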
21
Approximately what percent of U.S. women do
you expect to be between 66 in and 67 in tall?
Heights of adult women are normally distributed with
• mean of 63.6 in,
• standard deviation of 2.5 in.
Use TI 83/84 Plus.
• Press [2nd] & [VARS] (i.e. [DISTR])
• Select 2: normalcdf
• Format of command:
normalcdf(lower bound, upper bound, mean, std.dev.)
For this problem: normalcdf(66, 67, 63.6, 2.5) = 0.0816.
i.e. about 8.2% of adult U.S. women have heights between 66
in and 67 in.
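For readers without a TI calculator, the same calculation can be sketched in Python. The `normalcdf` function below is a hypothetical helper written to mirror the calculator command; it is built on `NormalDist.cdf` from the standard `statistics` module.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, std_dev):
    """Fraction of a normal distribution lying between lower and upper."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(upper) - dist.cdf(lower)

p = normalcdf(66, 67, 63.6, 2.5)
print(f"{p:.4f}")   # 0.0816
```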
22
Approximately what percent of U.S. women do
you expect to be less than 64 in tall?
Heights of adult women are normally distributed with
• mean of 63.6 in,
• standard deviation of 2.5 in.
• Note that here the upper bound is 64, but there is no mention of a
lower bound.
• So take a very small value for the lower bound, say −1000.
For this problem
normalcdf(-1000, 64, 63.6, 2.5) = 0.5636.
i.e. about 56.4% of adult U.S. women have heights less than 64 in.
23
Approximately what percent of U.S. women
do you expect to be more than 58 in tall?
Heights of adult women are normally distributed with
• mean of 63.6 in,
• standard deviation of 2.5 in.
• Note that here the lower bound is 58, but there is no
mention of an upper bound.
• So take a very high value for the upper bound, say 1000.
For this problem
normalcdf(58, 1000, 63.6, 2.5) = 0.987.
i.e. 98.7% of adult U.S. women have heights more than 58
in.
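In software that exposes the cumulative distribution function directly, the artificial bounds of −1000 and 1000 are not needed. A minimal Python sketch, assuming the same mean and standard deviation for women's heights:

```python
from statistics import NormalDist

women = NormalDist(mu=63.6, sigma=2.5)

below_64 = women.cdf(64)        # fraction shorter than 64 in (lower tail)
above_58 = 1 - women.cdf(58)    # fraction taller than 58 in (complement of lower tail)

print(f"{below_64:.4f}  {above_58:.3f}")   # 0.5636  0.987
```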
What about men’s height?
24
Heights of adult men are normally distributed with
• mean of 69 in,
• standard deviation of 2.8 in.
• normalcdf(60, 1000, 69, 2.8) = 0.999.
Hence 99.9% of adult men are taller than 60 in.
• normalcdf(64, 1000, 69, 2.8) = 0.963.
So 96.3% of adult men are taller than 64 in.
• Thus the U.S. Army height restriction is more restrictive for women (98.7% qualify) than for men (99.9% qualify).
• But the U.S. Marine Corps height restriction is more restrictive for men (96.3% qualify) than for women (98.7% qualify).
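The same conclusion can be read off from the share of each group that falls below the minimum. A Python sketch under the slide's normality assumption (the dictionary layout is illustrative):

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.8)
women = NormalDist(mu=63.6, sigma=2.5)

# Fraction of each group shorter than the minimum, i.e. excluded by it.
excluded = {
    "U.S. Army": {"men": men.cdf(60), "women": women.cdf(58)},
    "U.S. Marine Corps": {"men": men.cdf(64), "women": women.cdf(58)},
}

for branch, shares in excluded.items():
    print(branch, {sex: f"{p:.1%}" for sex, p in shares.items()})
```

The group with the larger excluded share is the one for which the restriction is more restrictive.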
25
Below what height do 80% of U.S. men
fall?
Heights of adult men are normally distributed with
• mean of 69 in,
• standard deviation of 2.8 in.
The question is to find the height x such that
{Percent of men’s height < x} = 80% = 0.8.
Use TI 83/84 Plus.
• Press [2nd] & [VARS] (i.e. [DISTR])
• Select 3: invNorm
• Format of command:
invNorm(fraction, mean, std.dev.)
For this problem: invNorm(0.8, 69, 2.8) = 71.36.
i.e. 80% of U.S. men have heights less than 71.36 in.
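A Python counterpart of invNorm, sketched with `NormalDist.inv_cdf` from the standard `statistics` module (the variable names are illustrative):

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.8)

x = men.inv_cdf(0.8)    # height below which 80% of men fall
print(f"{x:.2f}")       # 71.36

# Sanity check: feeding x back into the cdf returns the original fraction.
check = men.cdf(x)
```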
26
Remark: invNorm
• invNorm only considers percentage or fraction in the lower
tail of normal distribution.
• For example, suppose the question is
“Above what height do 10% of U.S. men
fall?”
Here the question is to find the height x such that
{Percent of men’s height > x} = 10% = 0.1.
This means
{Percent of men’s height < x} = (100 − 10)% = 90% = 0.9.
For this problem: invNorm(0.9, 69, 2.8) = 72.59.
i.e. 90% of U.S. men have heights less than 72.59 in,
i.e. 10% of U.S. men have heights more than 72.59 in.
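The upper-tail conversion can be wrapped in a small helper. In the Python sketch below, `height_exceeded_by` is a hypothetical name; it simply turns the upper-tail fraction into the lower-tail fraction that `inv_cdf` (like invNorm) expects.

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.8)

def height_exceeded_by(upper_fraction, dist):
    """Height that the given fraction of the population exceeds."""
    # inv_cdf works with the lower tail, so convert the upper-tail fraction first.
    return dist.inv_cdf(1 - upper_fraction)

x = height_exceeded_by(0.10, men)
print(f"{x:.2f}")   # 72.59
```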