chapter 2: describing location in a distribution - quia · chapter 2: describing location in a...
TRANSCRIPT
Chapter 2: Describing Location in a Distribution
1
Chapter 2: Describing Location in a Distribution
Objectives: Students will:
Be able to compute measures of relative standing for individual values in a distribution. This includes standardized
values z-scores and percentile ranks.
Use Chebyshev’s Inequality to describe the percentage of values in a distribution within an interval centered at the
mean.
Demonstrate an understanding of a density curve, including its mean and median.
Demonstrate an understanding of the Normal distribution and the 68-95-99.7 Rule.
Use tables and technology to find (a) the proportion of values on an interval of the Normal distribution and (b) a value
with a given proportion of observations above or below it.
Use a variety of techniques, including construction of a normal probability plot, to assess the Normality of a
distribution.
AP Outline Fit:
I. Exploring Data: Describing patterns and departures from patterns (20%–30%) B. Summarizing distributions of univariate data
3. Measuring position: . . . percentiles, standardized scores (z-scores)
III. Anticipating Patterns: Exploring random phenomena using probability and simulation (20%–30%)
C. The normal distribution
1. Properties of the normal distribution
2. Using tables of the normal distribution
3. The normal distribution as a model for measurements
What You Will Learn:
A. Measures of Relative Standing
1. Find the standardized value (z-score) of an observation. Interpret z-scores in context
2. Use percentiles to locate individual values within distributions of data
3. Apply Chebyshev’s inequality to a given distribution of data
B. Density Curves
1. Know that areas under a density curve represent proportions of all observations and that the total area under a
density curve is 1
2. Approximately locate the median (equal-areas point) and the mean (balance point) on a density curve
3. Know that the mean and median both lie at the center of a symmetric density curve and that the mean moves farther
toward the long tail of a skewed curve
C. Normal Distribution
1. Recognize the shape of Normal curves and be able to estimate both the mean and standard deviation from such a
curve
2. Use the 68-95-99.7 rule (Empirical Rule) and symmetry to state what percent of the observations from a Normal
distribution fall between two points when the points lie at the mean or one, two, or three standard deviations on
either side of the mean
3. Use the standard Normal distribution to calculate the proportion of values in a specified range and to determine a z-
score from a percentile
4. Given that a variable has the Normal distribution with mean and standard deviation , use Table A and your
calculator to
a. determine the proportion of values in a specified range
b. calculate the point having a stated proportion of all values to the left or to the right of it
D. Assessing Normality
1. Plot a histogram, stemplot, and/or boxplot to determine if a distribution is bell-shaped
2. Determine the proportion of observations within one, two, and three standard deviations of the mean and compare
with the 68-95-99.7 rule (Empirical rule) for Normal distributions
3. Construct and interpret Normal probability plots
Chapter 2: Describing Location in a Distribution
2
Section 2.1: Measures of Relative Standing and Density Curves
Knowledge Objectives: Students will:
• Measuring position with percentiles
• Cumulative relative frequency graphs
• Measuring position with z-scores
• Transforming data
• Density curves
• Describing density curves
Vocabulary:
Chebyshev’s Inequality – rule of thumb for percentage of observations in any distribution within in specified standard
deviations from the mean
Density Curve – the curve that represents the proportions of the observations; and describes the overall pattern
Mathematical Model – an idealized representation
Median of a Density Curve – is the “equal-areas point” and denoted by M or Med
Mean of a Density Curve – is the “balance point” and denoted by (Greek letter mu)
Normal Curve – a special symmetric, mound shaped density curve with special characteristics
Pth
Percentile – the observation’s percentage location in the sample’s ordered list
Standard Deviation of a Density Curve – is denoted by (Greek letter sigma)
Standardized Value – a z score
Standardizing – converting data from original values to standard deviation units
Uniform Distribution – a symmetric rectangular shaped density distribution
Z-score – a standardized value indicating the number of standard deviations about or below the mean
Key Concepts:
• Another measure of relative standing is a percentile rank
• pth
percentile: Value with p % of observations below it
– median = 50th percentile {mean=50th %ile if symmetric}
– Q1 = 25th percentile
– Q3 = 75th percentile
Chapter 2: Describing Location in a Distribution
3
Examples:
1: Summary of the scores listed below:
According to Minitab, the mean test score was 80 while the standard deviation was 6.07 points.
79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72
Jenny’s score, 86, was above average. Her standardized z-score is:
a. Kevin scored a 72, what is his z-score?
b. Katie scored an 80, what is her z-score?
2: Standardized values can be used to compare scores from two different distributions
Statistics Test: mean = 80, standard deviation = 6.07
Chemistry Test: mean = 76, standard deviation = 4
Jenny got an 86 in Statistics and 82 in Chemistry.
On which test did she perform better?
3: What is Jenny’s percentile?
4: If the mean of a distribution is 14 and its standard deviation is 6, what happens to these values if each observation has 3
added to it?
Given the same distribution (from above), what happens if each observation has been multiplied by -4?
4: Determine which the mean, median and mode are in the following:
5: A random number generator on calculators randomly generates a number between 0 and 1. The random variable X, the
number generated, follows a uniform distribution
a. Draw a graph of this distribution
b. What is the percentage (0<X<0.2)?
c. What is the percentage (0.25<X<0.6)?
d. What is the percentage > 0.95?
e. Use calculator to generate 200 random numbers
Homework: pg. 105-09, probs: 1, 5, 9, 11, 13, 15, 19, 31
Chapter 2: Describing Location in a Distribution
4
Section 2.2: Normal Distributions
Objectives: Students will:
• DESCRIBE and APPLY the 68-95-99.7 Rule
• DESCRIBE the standard Normal Distribution
• PERFORM Normal distribution calculations
• ASSESS Normality
Vocabulary:
68-95-99.7 Rule (or Empirical Rule) – given a density curve is normal (or population is normal), then the following is
true:
within plus or minus one standard deviation is 68% of data
within plus or minus two standard deviation is 95% of data
within plus or minus three standard deviation is 99.7% of data
Inverse Normal – calculator function that allows you to find a data value given the area under the curve (percentage)
Normal curve – special family of bell-shaped, symmetric density curves that follow a complex formula
Standard Normal Distribution – a normal distribution with a mean of 0 and a standard deviation of 1
Key Concepts:
Properties of the Normal Density Curve
1. It is symmetric about its mean, μ
2. Because mean = median = mode, the highest point occurs at x = μ
3. It has inflection points at μ – σ and μ + σ
4. Area under the curve = 1
5. Area under the curve to the right of μ equals the area under the curve to the left of μ, which equals ½
6. As x increases without bound (gets larger and larger), the graph approaches, but never reaches the
horizontal axis (like approaching an asymptote). As x decreases without bound (gets larger and larger in
the negative direction) the graph approaches, but never reaches, the horizontal axis.
7. The Empirical Rule applies
Normal Probability Density Function
1
y = -------- e
√2π
-(x – μ)2
2σ2
Empirical Rule
μμ - σ
μ - 2σμ - 3σ μ + σ
μ + 2σμ + 3σ
34% 34%13.5%13.5%
2.35% 2.35%
0.15% 0.15%
μ ± σμ ± 2σμ ± 3σ
68%
95%
99.7%
where μ is the mean and σ is the standard deviation of the random variable x
Chapter 2: Describing Location in a Distribution
5
TI-83 Normal Distribution functions:
#1: normalpdf pdf = Probability Density Function {Not used by students in Statistics!!}
This function returns the probability of a single value of the random variable x. Use this to graph a normal curve. Using this
function returns the y-coordinates of the normal curve.
Syntax: normalpdf (x, mean, standard deviation)
#2: normalcdf cdf = Cumulative Distribution Function
This function returns the cumulative probability from zero up to some input value of the random variable x. Technically, it
returns the percentage of area under a continuous distribution curve from negative infinity to the x. You can, however, set the
lower bound.
Syntax: normalcdf (lower bound, upper bound, mean, standard deviation)
#3: invNorm( inv = Inverse Normal Probability Distribution Function
This function returns the x-value given the probability region to the left of the x-value.
(0 < area < 1 must be true.) The inverse normal probability distribution function will find the precise value at a given percent
based upon the mean and standard deviation.
Syntax: invNorm (probability, mean, standard deviation)
(the above taken from http://mathbits.com/MathBits/TISection/Statistics2/normaldistribution.htm)
We can use -E99 for negative infinity and E99 for positive infinity.
Example 1: A random variable x is normally distributed with μ=10 and σ=3.
a. Compute Z for x1 = 8 and x2 = 12
b. If the area under the curve between x1 and x2 is 0.495, what is the area between z1 and z2?
Example 2: Determine the area under the standard normal curve that lies to the left of
a) Z = -3.49
b) Z = -1.99
c) Z = 0.92
d) Z = 2.90
Example 3: Determine the area under the standard normal curve that lies to the right of
a) Z = -3.49
b) Z = -0.55
c) Z = 2.23
d) Z = 3.45
Chapter 2: Describing Location in a Distribution
6
Example 3: Find the indicated probability of the standard normal random variable Z.
a) P(-2.55 < Z < 2.55)
b) P(-0.55 < Z < 0)
c) P(-1.04 < Z < 2.76)
Example 4:
a) Find the Z-score such that the area under the standard normal curve to the left is 0.1.
b) Find the Z-score such that the area under the standard normal curve to the right is 0.35.
Example 1: For a general random variable X with μ = 3 and σ = 2
a. Calculate Z
b. Calculate P(X < 6)
Example 2: For a general random variable X with μ = -2 and σ = 4
a. Calculate Z
b. Calculate P(X > -3)
Example 3: For a general random variable X with μ = 6 and σ = 4 calculate P(4 < X < 11)
Example 4: For a general random variable X with μ = 3 and σ = 2 find the value x such that P(X < x) = 0.3
Example 5: For a general random variable X with μ = –2 and σ = 4 find the value x such that P(X > x) = 0.2
Chapter 2: Describing Location in a Distribution
7
Example 6: For random variable X with μ = 6 and σ = 4. Find the values that contain 90% of the data around μ
Example 1: Normal data or not Example 2: Normal data or not
Example 3: Normal data or not Example 4: Normal data or not
Example 5: Normal data or not Example 6: Normal data or not
Homework: pg. 131-6, probs: 41, 43, 45, 47, 51, 53, 57, 59, 63, 65
Chapter 2: Describing Location in a Distribution
8
Chapter 2: Review
Objectives: Students will be able to:
Summarize the chapter
Define the vocabulary used
Know and be able to discuss all sectional and chapter knowledge objectives
Be able to do all sectional and chapter construction objectives
Successfully answer any of the review exercises
Vocabulary: None new
Homework: pg 138 – 139; problems 2.1 – 2.13
Chapter 2: Describing Location in a Distribution
9
5 Minute Reviews
Chapter 1:
Given the following two UNICEF African school enrollment data sets:
Northern Africa: 34, 98, 88, 83, 77, 48, 95, 97, 54, 91, 94, 86, 96
Central Africa: 69, 65, 38, 98, 85, 79, 58, 43, 63, 61, 53, 61, 63
1. Describe each dataset
2. Compare the two datasets
Lesson 2-1a:
1. What does a z-score represent?
2. According to Chebyshev’s Inequality, how much of the data must lie within two standard deviations of any
distribution?
Given the following z-scores on the test: Charlie, 1.23; Tommy, -1.62; and Amy, 0.87
3. Who had the better test score?
4. Who was farthest away from the mean?
5. What does the IQR represent?
6. What percentile is the person who is ranked 12th
in a class of 98?
Lesson 2-1b:
1. Statistics are from ______ and parameters are from ________
2. In a uniform distribution everything is ______ likely.
3. If a distribution is skewed right, which is greater, the mean or the median and why?
4. The area under a density function is equal to ____
5. Name a common uniform probability example
6. Uniform probability distributions are of what types of quantitative variables?
Chapter 2: Describing Location in a Distribution
10
Lesson 2-2a:
1. What is the mean and standard deviation of Z?
Given the following distributions: A~N(4,1), B~N(10,4) C~N(6,8)
2. Which is the tallest?
3. Which is the widest?
4. The Empirical Rule is also known as the __ , __ , ___ rule.
5. Given P(z < a) = 0.251, find P(z > a)
6. In distribution B, what is the area to the left of 10?
Lesson 2-2b:
Given a normal distribution with = 4 and = 2, Find
1. P(x < 2)
2. P(x > 5)
3. P(1< x < 5)
4. x, if P(x) = 0.95
5. x, if P(x) = 0.05
Lesson 2-2c:
1. What is the shape of a normality plot if the distribution is approximately normal?
2. Do these normality plots indicate a normal distribution?
Chapter 2: Describing Location in a Distribution
11
Homework:
pg. 105-09, problems: 1, 5, 9, 11, 13, 15, 19, 31
1. a. Girl with 22 pairs: 5/20 are below so hers is 25th
percentile; b. Boy with 22 pairs: 17/20 are below so his is 85th
percentile; c. Boy is more unusual than the girl (25 closed to center than 85)
5. The girl weighs more than 48% of the others, but is taller than 78% (probably fairly skinny)
9. a. Q1 is about $19, Q3 is about $50. IQR is 50 – 31 = 19; b. about 26%
11. Eleanor. Her z-score is 1.8 ([680-500]/100) vs Gerald’s z = 1.5 ([27-18]/6)
13. a. Judy’s z-score is about -1.5 (about 1 and ½ standard deviations below the mean for her age). b. -1.45 = (948-956)/ so
= (948-956)/-1.5 = 5.52
15. a. Lidge’s salary was higher than 22 out of 29 on the Phillies, so 75.86 percentile; b. z = 0.79 = (6350000-
3388617)/3767484. His salary was 0.79 standard deviations above the mean
19. a. The measures of center (mean 87.188 and median 87.5) both increased by 18. Distribution shifts up by 18 inches; b.
Neither IQR or standard deviation changes (shape shifted up and did not change).
31. a. Mean is C and median is B; b. Mean and median are B
pg. 131-6, problems: 41, 43, 45, 47, 51, 53, 57, 59, 63, 65
41. Normal curve centered at 69 and goes between 61.5 and 76.5 (+/- 3 std devs 99.7% of data)
43. a. above 2 = apx 2.5% b. +/-2 or 64 to 74 c. apx 13.5% (-2 to -1) d. 84th
percentile (+1)
45. Tall one 6 = 1.2 (-0.6 to 0.6) so = 0.2; short one 6 = 3 (-1.5 to 1.5) so = 0.5
47. a. 0.9978 b. 0.0022 c. 0.9515 d. 0.9493 Use normalcdf on calculator (z =1 and μ=0)
51. a. invnorm(0.10,0,1) = -1.28 b. invnorm(1-0.34,0,1) = 0.41
53. a. normcdf(0,240,266,16) = 0.0521 b. normalcdf(240,270,266,16) = 0.5466 c. invnorm(0.80,266,16) = 279.47
57. a. z = -2.05 = (0.30 – μ)/0.04 μ = 0.382 b. -2.05 = (0.30 -0.37)/ = 0.034 c. Option A; increase mean gives
normalcdf(0.5,+,0.382,0.04) = 0.001589; Option B; decrease std dev give normalcdf(0.5,+,0.37,0.034) = 0.000065.
Option B is much better.
59. a. +/- 1.28 [invnorm(0.1,0,1) and invnorm(0.9,0,1)] b. 61.3 and 67.7 [invnorm(0.1,64.5,2.5) and invnorm(0.9, 64.5,2.5)]
63. a. Center about 15.58 feet (mean), std dev is 2.55 feet, shape is roughly symmetric with one possible high and low outlier
b. 68.2, 95.5 and 100% so very close to 68/95/99.7 rule c. Normality plot looks very normal (except for the two outliers)
65. Plot is nearly linear; so normality is a reasonable assumption. Around 125 small cluster is off and one low and two high
values are off
Chapter 2: Describing Location in a Distribution
12
pg 138 – 139; problems 2.1 – 2.13
2.1 e
2.2 d
2.3 b
2.4 b
2.5 a
2.6 d
2.7 c
2.8 e
2.9 e
2.10 c
2.11
a. Jane’s performance was better. She did more curl-ups than 85% of the girls (presidential level). Matt did more curl ups
than only 50% of the boys (national level). b. Jane’s z-score will be higher
2.12
a. The soldier’s head size is about 1 standard deviation above the mean (z = 1.0) and would be about 84 percentile
b. normalcdf(20,26,22.8,1.1) = 0.9927 so .007 or 0.7% would require a custom fit
c. Quartiles from a z-score are +/- 0.67 (from invnorm(0.25,0,1) and invnorm(0.75,0,1). Use z-score formula [z=(x-μ)/] to
solve for the x (head circumference) values 22.063 and 23.537. IQR = 23.537 – 22.063 = 1.474
or use invnorm(0.25,22.8,1.1) = 22.058 and invnorm(0.75,22.8,1.1) = 23.542 . IQR = 23.542 – 22.058 = 1.484
2.13
Since the mean is much higher than the median and there is at least one outlier to the high side, the distribution looks skewed
right and not symmetric (normal)
Chapter 2: Describing Location in a Distribution
13
Practice Test:
1. Scores on a test have mean 75 and standard deviation 6. The teacher is considering curving these grades.
(a) If the teacher adds 5 points to all individual grades, the class mean will be ___________ and the standard deviation
will be _____________.
(b) If the teacher increases all grades by 10%, the class mean will be ________________ and the standard deviation will
be _________________.
2. Suppose that the juice dispensers in a school dining hall are filled each morning with 20 gallons of juice. Records show
that a probability density function describing the daily juice consumption j is:
f j j j( ) . . 01 0005 0 20 if
A sketch of this density curve is provided in the space to the right.
(a) Verify that f is a valid density function.
(b) What proportion of days do students dispense more than 10 gallons of
juice?
(c) What proportion of days do students dispense between 5 and 10 gallons of juice?
Chapter 2: Describing Location in a Distribution
14
3. Scores on an intelligence test are normally distributed with mean 100 and standard deviation 15.
(a) What proportion of the population scores between 90 and 125?____________
(b) According to this test, how high must a person score to be in the top 25% of the population in intelligence?
_____________
4. The lifetime of a brand of television picture tubes is normally distributed with 8.4 years and 1.5 years. The
manufacturer guarantees to replace tubes that burn out prior to the time they specify in the guarantee.
(a) How long will the middle 95% of all tubes last?
(b) If the manufacturer guarantees the tubes for 5 years, what proportion of tubes will have to be replaced? _______
(c) If the firm is willing to replace the picture tubes in a maximum of 5% of the television sets sold, what is the maximum
guarantee period they should they offer? ________________________________
5. Packaging machines used to fill sugar bags can deliver any target weight with standard deviation of 2 ounces. If you want
to fill 5 pound bags so that only 2% are underweight, what target setting would you use for the machine?______________
Include an illustrative drawing and clearly organized work in the space below. Note: 5 pounds = 80 ounces..
Chapter 2: Describing Location in a Distribution
15
6. A study team was commissioned to compare the performance of two local hospitals. One of the factors they considered
was the survival record for surgical patients at each hospital. When the team first looked at the data, they clumped all
patients together as shown below:
All Patients
Survived Died % Died
Hospital A 2037 63 3.0%
Hospital B 784 16 2.0%
It appears from the table above that Hospital B has the lower fatality rate. The administrators of Hospital A were concerned.
They had been keeping similar records, but they had divided the patients into groups according to their condition when they
entered surgery. These data, shown in the tables below, indicate that Hospital A has lower fatality rates.
Patients in Good Condition Patients in Poor Condition
Survived Died % Died Survived Died % Died
Hospital A 594 6 1.0% Hospital A 1443 57 3.8%
Hospital B 592 8 1.33% Hospital B 192 8 4.0%
The members of the study team are confused. Write a few sentences to help them understand what is going on.
7. After the Challenger disaster, investigators examined data to determine what caused the problems that occurred on that
fateful day. Following are the temperatures (at time of launch) associated with shuttle launches in which there were O-ring
failures (prior to the Challenger launch): 54 54 54 57 58 63 70 75 75
Following are the temperatures (at time of launch) associated with shuttle launches in which there were not O-ring failures
(prior to the Challenger launch: 66 67 67 67 68 69 70 70 72 73
76 78 79 80 81
(a) Complete the table below:
Temperature
54 - 62 63 - 71 72 - 81
Failure
No Failure
(b) How many launches had occurred at the time these data were recorded? ________
(c) What percent of the launches had O-ring failures? ___________
(d) What percent of the O-ring failures occurred when launch temperatures were between 54 and 62 degrees? ________
(e) What percent of the launches that were made when temperatures were between 54 and 62 degrees resulted in O-ring
failures? _________
8. A company needs to buy several cars for its motor pool. Managers have specified that the new cars should get at least 30
miles per gallon (MPG) on the highway. Two models are being considered, and the gas mileage for both is normally
distributed. For one model the mean gasoline consumption is 35 mpg with a standard deviation of 3 mpg. The other model
averages 34 mpg with a standard deviation of 1.5 mpg. Which model should the managers choose? Support your answer
with an explanation based on what you have learned about normal distributions.
Chapter 2: Describing Location in a Distribution
16
Additional practice on standard Normal distribution:
Using Table A in your book find the percentage (area under the curve):
Z < 1.50 Z > 1.42
Z < -0.50 Z > -0.70
Z < -2.05 Z > 0.29
Using your calculator find the percentage (area under the curve):
Z < 1.50 Z > 1.42
Z < -0.50 Z > -0.70
Z < -2.05 Z > 0.29
Find the percentage (area under the curve):
Calc Table
0.50 < Z < 1.50 0.25 < Z < 1.42
-1.25 < Z < -0.30 -2.25 < Z < -0.90
0.8 < Z < 2.05 -0.38 < Z < 0.79