M146 - Chapter 9 Handouts
Page 1 of 33
Chapter 9 Overview
Inferential Statistics – using sample data to make inferences
about a population.
Estimation – sample data are used to estimate the value of unknown
parameters such as µ or p.
Hypothesis testing – specific statements regarding a characteristic of a
population are tested using sample data.
If you could survey every single member of a population (in other words, do a
census), then you could find out:
The true population mean, µ, of some value: for example, the mean
amount of U.S. consumer spending per day.
The true population proportion, p, of some value: for example, the
proportion of American adults who favor vaccines.
Goal: take a sample from a population, and use that
sample to estimate the value of the parameter for the entire population.
However, the result obtained from any one sample will likely differ
from the true value for the whole population, because samples
vary.
Therefore, we use sample statistics to estimate the population
parameters. The single value that we get from the sample is referred to as a
point estimate, and then we add a margin of error E
around that point estimate to create an interval:
$\text{point estimate} - E < \text{population parameter} < \text{point estimate} + E$
And we believe, with a certain confidence level, that the TRUE value of the
population parameter will be contained within that interval.
Section 9.1 Estimating a Population Proportion
Example: Proportion of CBC Students who are > 25 years old
Population is ALL CBC Students
Our sample data: n = (sample size, F19/S18 combined)
x = (number of “successes”, students > 25)
p̂ = x/n =
(Assume that the sample was simple random, and is representative of the
population.)
Start by using the sample proportion p̂ to approximate the population proportion p:
p̂ is a point estimate, a single value used to
approximate a population parameter.
How good is the point estimate?
Is the actual population proportion p equal to the point estimate?
Take the point estimate and add and subtract ______ to account for error:
Take the point estimate and add and subtract ______ to account for error:
Summarize Example:
Very informal method of coming up with a:
Confidence interval – a range (interval) of values used to estimate
the true value of a population parameter.
Basically, it’s the range of plausible values for the parameter, based on the available sample
data.
Based on what we just did, we could say: “We are pretty confident that the true
proportion of all CBC students who are more than 25 years old is between
_____ and _____.”
Notice that the wider the interval got, the
more confident we became that the true value of the population
proportion was contained.
In general, estimating a population parameter means: find a confidence
interval, an interval of values, which is likely to contain
that true population value.
It does NOT mean: come up with a single value that
equals the true population value.
Sampling Distribution of Proportions – CBC Students > 25 years old
From the Fall 2018 CBC student population data (note – normally you would
NOT have the population data available!), we know that:
N = (total number of students in population)
x = (number of students > 25 years old)
Therefore, we can calculate p, the TRUE population proportion of students who are
older than 25 years:
p = x/N =
1 – p =
Normal because:
Mean of the sample proportions: $\mu_{\hat{p}} = p$ =
Standard deviation of the sample proportions: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ =
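The two facts above can be checked by simulation. The sketch below is a minimal illustration, assuming Python's standard `random` and `statistics` modules; the true proportion p = 0.30 and sample size n = 200 are hypothetical values, not the CBC data, and each draw is treated as independent (as if sampling with replacement).

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n, where each draw is a success
    with probability p, and return the list of sample proportions x / n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)  # number of successes
        proportions.append(x / n)
    return proportions


# Hypothetical values for illustration only (not the CBC data).
p, n = 0.30, 200
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

print("mean of p-hats:", round(statistics.mean(p_hats), 4))
print("predicted mean (p):", p)
print("SD of p-hats:", round(statistics.stdev(p_hats), 4))
print("predicted SD sqrt(p(1-p)/n):", round((p * (1 - p) / n) ** 0.5, 4))
```

The mean of the simulated p̂ values should land very close to p, and their standard deviation close to √(p(1 − p)/n) ≈ 0.0324.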
[Figure: Sampling distribution of proportions, CBC student ages > 25, n = _____, centered at p = _____]
Method for Finding the Confidence Interval
p =
p is the true population proportion (unknown), and the goal is to
capture it in our confidence interval.
Method:
Choose a specific confidence level – a 95% confidence level is common.
That means: select the middle 95% of the area around the true proportion.
The distance from the true proportion p (center) to either edge of the selected
area is called the margin of error, E.
By definition, 95% of the p̂ values will lie in the interval: (p - E, p + E)
Create a confidence interval around the sample proportion: (p̂ - E, p̂ + E)
Therefore, when the observed sample proportion p̂ lies in the interval
(p - E, p + E), then the confidence interval (p̂ - E, p̂ + E) contains the true proportion p.
[Figure: Sampling distribution of proportions, CBC student ages > 25, n = _____ (normal distribution)]
Calculate the Margin of Error, E
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
E is the distance from the center of the sampling distribution (the true proportion)
out to the edge of the area that we want to include.
The confidence level tells us how much area we want to include.
So, 95% confidence means: include the middle 95% of the area.
α (alpha) is the complement of the confidence level, so:
α = 1 – confidence level
Specifically, for 95% confidence: α = 1 – 0.95 = 0.05
The z value in the error formula is referred to as zα/2. Why?
zα/2 is called the critical value because it separates likely
values (center) from unlikely values (tails).
What IS zα/2 for this confidence level?
Area in the middle = 0.95
TOTAL area in the two tails = α = 0.05
Area for EACH tail = α/2 = 0.025
So zα/2 = 1.96.
Notice that there is a different zα/2 for each different confidence level!
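Instead of reading a table, zα/2 can be computed from the confidence level. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` (the handout itself uses a table or STATDISK):

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_(alpha/2): the z value with area alpha/2 to its right under
    the standard normal curve, where alpha = 1 - confidence level."""
    alpha = 1 - confidence_level
    # inv_cdf expects a cumulative (left-tail) area, so ask for 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z_alpha/2 = {z_critical(level):.3f}")
```

Rounded to three decimals, these are 1.282, 1.645, 1.960 and 2.576: the higher the confidence level, the larger the critical value.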
Connection to the Empirical Rule:
95% Confidence Interval (CI) for Proportions, CBC Student Ages > 25
95% confidence means: zα/2 = 1.96
From our sample:
Sample size n =
Sample proportion p̂ =
1 - p̂ =
Calculate: E =
Find the confidence interval bounds, the extreme values at either end:
Lower bound = p̂ - E =
Upper bound = p̂ + E =
In general, round off the confidence interval bounds for proportions to three
significant digits.
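The steps above (compute p̂, compute E, then p̂ ± E, rounded to three significant digits) can be sketched in Python. This is an illustration under stated assumptions: the counts x = 60 and n = 200 are hypothetical, and the helper names `proportion_ci` and `round_sig` are made up for this example.

```python
import math
from statistics import NormalDist


def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    return round(value, digits - 1 - math.floor(math.log10(abs(value))))


def proportion_ci(x, n, confidence_level=0.95):
    """Confidence interval for a population proportion:
    p-hat +/- z_(alpha/2) * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample (not the CBC data): 60 successes out of 200.
lower, upper = proportion_ci(60, 200)
print(round_sig(lower), round_sig(upper))
```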
Interpreting the Confidence Interval
It means: We are 95% confident that the true population proportion of CBC
students who are more than 25 years old is between _____ and _____.
If we were to repeat this sampling procedure many times (with n = _______), then in the long run about 95% of
the resulting confidence intervals would contain the true population proportion.
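The long-run interpretation can be demonstrated by repeating the sampling procedure many times and counting how often the interval captures p. A minimal simulation sketch; the true p = 0.30, n = 200, and the number of repetitions are hypothetical choices:

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p, n, confidence_level, trials, seed=1):
    """Repeat the sampling procedure `trials` times and return the fraction
    of confidence intervals (p-hat +/- E) that contain the true proportion p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if rng.random() < p)  # successes in this sample
        p_hat = x / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical true proportion and sample size, for illustration only.
print(coverage_rate(p=0.30, n=200, confidence_level=0.95, trials=2000))
```

The printed rate should be close to 0.95, though not exactly, because of randomness and because the p̂ ± E formula is an approximation. Setting `confidence_level` to 0.80 gives a rate near 0.80.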
Confidence Intervals for a Proportion Using Technology
In STATDISK:
Analysis/Confidence Interval/Proportion One Sample
Precision of the Estimate
Notice that our estimate is very precise! In other words, the
interval is very narrow. The precision of the estimate relies on a
couple of factors:
1. The confidence level.
2. The sample size.
Effect of Higher Confidence Level
What’s going to happen to the Confidence Interval if we change from 95%
confidence to 99% confidence?
99% confidence means: zα/2 = 2.576
Calculate: E =
Lower bound = p̂ - E =
Upper bound = p̂ + E =
It means: We are 99% confident that the true population proportion of CBC
students who are more than 25 years old is between _____ and _____.
KEY: The higher level of confidence results in a wider
confidence interval.
Because of this, we have reduced the precision of the
estimate.
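A quick numerical comparison, using a hypothetical sample (p̂ = 0.30, n = 200) at both confidence levels. Only zα/2 changes, so E grows by the factor 2.576/1.960 ≈ 1.31:

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence_level):
    """E = z_(alpha/2) * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Same hypothetical sample (p-hat = 0.30, n = 200) at two confidence levels.
e95 = margin_of_error(0.30, 200, 0.95)
e99 = margin_of_error(0.30, 200, 0.99)
print(f"E at 95%: {e95:.4f}")
print(f"E at 99%: {e99:.4f}")
print(f"ratio E99/E95: {e99 / e95:.3f}")  # equals z(99%) / z(95%)
```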
Effect of Bigger Sample Size
Bigger sample: n = (random sample from population data)
x = (number of “successes”, students > 25)
p̂ =
1 - p̂ =
zα/2 = 1.96 (assume 95% confidence)
Calculate margin of error: E =
KEY: This margin of error is much smaller than the 95%
margin of error we had for sample size n = _____ (E = _____).
Because:
The larger sample provides more information.
Therefore, there is less sampling variability (error) associated with it.
Notice that the point estimate for n = _____ was closer to the true
population proportion p than the point estimate for n = _____,
which is expected.
Find the confidence interval:
It means: We are 95% confident that the true population proportion of CBC
students who are more than 25 years old is between _____ and _____.
Sampling Distribution of Proportions, CBC Student Ages > 25, n = _____
The sampling distribution itself looks different: it is
narrower (less spread out).
The larger sample results in a much narrower confidence interval.
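Because n sits under a square root in the denominator of E, quadrupling the sample size cuts the margin of error in half. A small sketch, holding a hypothetical p̂ = 0.30 fixed (in practice p̂ changes from sample to sample) and using zα/2 = 1.96:

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """E = z * sqrt(p-hat * (1 - p-hat) / n); z = 1.96 gives 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical p-hat = 0.30 with increasing sample sizes.
for n in (100, 400, 1600):
    print(f"n = {n:5d}: E = {margin_of_error(0.30, n):.4f}")
```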
Confidence Interval for the Population Proportion
Instructions:
Everyone has a cup with 100 beans in it, a mixture of black and white beans. You are
going to take a sample of beans, and use the sample to estimate the true population
proportion of white beans.
1. Select 40 beans, one at a time. You are sampling WITH replacement, so put the bean
back in the cup each time. Record the color of each bean that you select. Each team just
does one sampling experiment together.
White Black
Total white = Total black =
2. A “success” is being defined as getting a white bean.
Calculate your sample proportion of white beans: p̂ = _______
3. Use your data to construct an 80% confidence interval for the population proportion.
4. Use your data to construct a 95% confidence interval for the population proportion.
Once you have completed 3. and 4. above, come up front and sketch your confidence
intervals on the viewgraphs for the 80% and 95% confidence intervals!! Then look
at the questions on the next page.
Questions:
1. For the 80% confidence interval:
(a) How many of the total confidence intervals include the true value of the population
proportion of white beans, p = _____? What percent is this? (can’t answer this one until
we have ALL the data compiled on viewgraph)
(b) What % of the confidence intervals do you expect to include the true value of the
population proportion, p?
(c) Explain how to interpret a level of confidence of 80%.
2. For the 95% confidence interval:
(a) How many of the total confidence intervals include the true value of the population
proportion of white beans, p = _____? What percent is this? (can’t answer this one until
we have ALL the data compiled on viewgraph)
(b) What % of the confidence intervals do you expect to include the true value of the
population proportion, p?
(c) Explain how to interpret a level of confidence of 95%.
3. Explain how increasing the confidence level from 80% to 95% changed the
confidence intervals.
4. Why did we have to take a sample of n = 40 beans to construct our confidence
intervals, instead of n = 15 or n = 20 beans? (hint: consider the requirements to have a
normal sampling distribution).
Determining Sample Size for Proportions
Let’s say we are going to do a survey or conduct some procedure specifically to
estimate a population proportion. How large a sample is needed to achieve a
desired margin of error at a certain confidence level (e.g., 95%)?
Two scenarios:
1. Already have an estimate of p̂ (e.g., from previous samples or another source).
2. Do not have any available estimate of p̂.
Case 1: already have estimate of p̂
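Take the formula for margin of error E, and solve it for n:
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,\hat{p}(1-\hat{p})}{E^2}$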
Case 2: have no estimate of p̂
Use the same exact formula for n that we just derived. Since we have no idea
what p̂ is, we put in
p̂ = 0.5
1 – p̂ = 0.5
Simplify:
$n = \frac{(z_{\alpha/2})^2\,(0.25)}{E^2}$
(Why do we use these two values for p̂ and 1 – p̂?)
When you use EITHER of these formulas: round up to the next integer.
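Both cases can be wrapped in one small function. This is a sketch, assuming Python's `statistics.NormalDist` for zα/2 (the exact z rather than the rounded table value, which can occasionally change the answer by one); `sample_size_proportion` is a hypothetical helper name and p̂ = 0.20 is a made-up prior estimate:

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin, confidence_level, p_hat=None):
    """Minimum n so that the margin of error is at most `margin`.
    Case 1: a prior estimate p_hat is available -> use p_hat * (1 - p_hat).
    Case 2: no estimate (p_hat is None)       -> use 0.5 * 0.5 = 0.25.
    The result is always rounded UP to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    spread = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z ** 2 * spread / margin ** 2)


# 95% confidence, within 5 percentage points (E = 0.05):
print(sample_size_proportion(0.05, 0.95))              # Case 2: no prior estimate
print(sample_size_proportion(0.05, 0.95, p_hat=0.20))  # Case 1: hypothetical estimate
```

With no prior estimate, 95% confidence and E = 0.05 give n = 385; a prior estimate away from 0.5 gives a smaller n, because p̂(1 − p̂) is largest at p̂ = 0.5.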
Example: What proportion of CBC students have tattoos?
How many students do we need to sample if we want to be 95% confident that
the sample percentage is within 5 percentage points of the true population
percentage?
Case 1: Use our sample data as an estimate of p̂.
From our survey the first night of class:
n =
x = “success” = yes, have a tattoo =
p̂ =
1 – p̂ =
Case 2: Assume we have no estimate of p̂ (how will the sample size compare?)
Either way it’s a pretty large sample size. What could I change if I don’t want to
take so many samples?
Sample Size for a Proportion Using Technology
In STATDISK:
Analysis/Sample Size Determination/Estimate Proportion
Summary of Methods for Estimating a Population Proportion
From Tri-City Herald, Nov. 2007: A few days after the article:
…rest of article omitted, until the very end:
A few days after the letter
from Larry Bafus:
c. What size sample should be obtained to ensure a margin of
error of at most 1% for a 95% confidence interval, using the
given sample data as a known estimate?
Section 9.2 Estimating a Population Mean
As with the proportions, we want to estimate the value of a population mean, µ:
Using sample data, in this case the sample mean x̄, as the best
point estimate of the population mean.
Determine a Confidence Interval that will give us a range of values for the
mean, as opposed to just relying on the single point estimate.
Do this by calculating a margin of error E and applying it on both sides of
the point estimate:
$\bar{x} - E < \mu < \bar{x} + E$
To determine the margin of error, E, we need to describe the sampling
distribution of the mean, just like we did with proportions:
The distribution of x̄ will be approximately normal if the
population is normal or if the sample size is large.
The mean of the sampling distribution will equal the mean of the
population, or $\mu_{\bar{x}} = \mu$.
The standard deviation of the sampling distribution will equal: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
We want to calculate the margin of error, just like we did for proportions:
$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Here is the problem: we do not know σ!
We have to estimate it using the sample standard deviation s.
Because of this, we do not use the standard normal (z)
distribution to find the critical value zα/2. Instead, we use a different distribution
called the Student t distribution.
Margin of Error: $E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
What the t-curve looks like:
It looks like a normal curve, but it is not!
The t-curve has a completely different mathematical model:
Characteristics of t distribution
It has the same general bell shape as the standard normal z
distribution.
It is symmetric about 0.
The total area under the t curve = 1.
It has the same asymptotic behavior: the curve
approaches, but never touches, the horizontal axis.
Differences from the Standard Normal Curve (z distribution)
The t distribution is more spread out, with heavier or fatter tails
on either side, because it has more probability in the tails than the z
distribution does.
The t distribution varies with the degrees of freedom (df = n – 1): as
the sample size n gets larger, the t distribution gets closer
to the z distribution.
Standard deviation of t distribution also varies with sample size, but is
always greater than 1 for df > 2 (vs. z dist., where the standard deviation is always 1).
Because of this, the t distribution has more spread (variability) in it; its
standard deviation is larger.
Reasons for using the t distribution instead of the z distribution:
In general when using this method:
We have a higher level of uncertainty because we don’t know σ.
We have to rely on the standard deviation s from the sample.
The t distribution compensates for this higher unreliability by making the
confidence interval wider.
To make the confidence interval wider, we need a larger
critical value.
Therefore, critical values of tα/2 are larger than critical values of zα/2.
How to Look Up Critical ‘t’ Values in Table VII
Depends on two values:
1. Degrees of freedom, df = n – 1
2. Area in the right tail, α/2
Based on the confidence level, you have to determine how much area is in the
right tail!
Example: for n = 10, and a 99% confidence level, tα/2 = 3.250
1. df = 10 – 1 = 9
2. Area in right tail = α/2 = 0.01/2 = 0.005
Example: Demonstrates: 1. t > z
2. t → z as sample size gets bigger
Confidence level | zα/2 | tα/2 (n = 6) | tα/2 (n = 21) | tα/2 (n = 101)
95% | 1.96 | 2.571 | 2.086 | 1.984
Summary of Methods for Estimating a Population Mean
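Solve the E equation for n to determine the sample size necessary:
Because we cannot determine tα/2 without knowing the sample size (df),
substitute the critical value zα/2 into the equation (using a sample standard deviation s as an estimate of σ):
$n = \left(\frac{z_{\alpha/2}\, s}{E}\right)^2$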
Example: Sample of CBC student ages, Linda’s F19 M146 class
n =
x̄ =
s =
Use this sample to construct a 95% CI estimate of the mean age in years for all
CBC students. (Assume that the sample was simple random, and is
representative of the population.)
df =          tα/2 =
E =
lower bound = x̄ – E =
upper bound = x̄ + E =
Confidence interval is:
Interpreting the Results:
“We are 95% confident that the true value of the population mean lies within the
interval _____ to _____.”
If we repeatedly took RANDOM samples of students and constructed
the 95% CI for each, then in the long run, about 95% of them would
contain the true mean.
Sampling Distribution of the Means, CBC Student Ages, n = _____
Confidence Intervals for a Mean Using Technology
In STATDISK:
Analysis/Confidence Interval/Mean-One Sample
Determining Sample Size Needed to Estimate Mean
Example: Sample size for CBC student ages
Say we want to be 95% confident that the sample mean is within 2 years of the
true population mean. How many students must be randomly selected?
zα/2 = 1.96
s = _____ years (from our class sample data)
E = 2 years
Calculate: n =
Remember – always round UP to the next integer.
KEY: the equation for sample size (for either a proportion or a mean) does NOT
depend on the size of the population.
One data value that I collect from my M146 students on the survey each quarter
is how many hours they are willing to devote to school work per week. The data
that I have collected over several quarters suggest that the population is
approximately normally distributed.
The following data are a random sample of 14 values representing how many
hours my students are willing to spend on school work each week. Construct
and interpret a 90% confidence interval for the mean number of hours that
CBC students are willing to spend on school work each week. (Assume that
the sample was random and representative). The sample standard deviation is
s = 9.364417 hours.
4 10 15 25 30
9 14 20 25 35
10 15 24 30
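The calculation for this exercise can be sketched as follows. The data values are the 14 listed above; the critical value t0.05 = 1.771 for df = 13 is read from a t table (an assumption of this sketch; `scipy.stats.t.ppf(0.95, 13)` gives the same value to three decimals if SciPy is available):

```python
import math
import statistics

# The 14 sample values (hours per week) listed in the problem.
hours = [4, 10, 15, 25, 30,
         9, 14, 20, 25, 35,
         10, 15, 24, 30]

n = len(hours)
x_bar = statistics.mean(hours)  # sample mean
s = statistics.stdev(hours)     # sample standard deviation (divides by n - 1)

# 90% confidence -> area alpha/2 = 0.05 in the right tail, df = n - 1 = 13.
t_crit = 1.771  # from a t table

margin = t_crit * s / math.sqrt(n)
print(f"n = {n}, x-bar = {x_bar}, s = {s:.6f}")
print(f"E = {margin:.3f}")
print(f"90% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```

The computed s agrees with the value given in the problem (9.364417 hours).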
How many students should I survey to estimate the mean number of hours they are
willing to devote to school work each week within 2 hours at 90% confidence?
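For this question, the sample-size formula uses zα/2 in place of tα/2. A sketch, assuming the given s = 9.364417 hours as the estimate of σ and Python's `statistics.NormalDist` for the 90% critical value; `sample_size_mean` is a hypothetical helper name:

```python
import math
from statistics import NormalDist


def sample_size_mean(s, margin, confidence_level):
    """n = (z_(alpha/2) * s / E)^2, rounded UP to the next integer.
    z is used instead of t because df (and hence t) depends on the unknown n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * s / margin) ** 2)


# s given in the problem, E = 2 hours, 90% confidence.
print(sample_size_mean(s=9.364417, margin=2, confidence_level=0.90))
```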