chapter 9 overview inferential...

M146 - Chapter 9 Handouts


Chapter 9 Overview

Inferential Statistics – using _____ to make inferences about a population.

Estimation – sample data are used to estimate the value of unknown

parameters such as µ or p.

Hypothesis testing – specific statements regarding a characteristic of a

population are tested using sample data.

If you could survey every single member of a population (in other words, do a

census), then you could find out:

The true population mean, µ, of some value: for example the mean

amount of U.S. consumer spending per day.

The true population proportion, p, of some value: for example the

proportion of American adults who favor vaccines.

Goal: take a sample from a population, and use that

sample to predict the value of the parameter for the entire population.

However, the result obtained from any one sample will be _____ from the result obtained from the corresponding population, because samples _____.

Therefore, we use _____ to estimate the population parameters. The single value that we get from the sample is referred to as a _____, and then we add a margin of error E around that point estimate to create an interval:

And we believe, with a certain confidence level, that the TRUE value of the

population parameter will be contained within that interval.



Section 9.1 Estimating a Population Proportion

Example: Proportion of CBC Students who are > 25 years old

Population is ALL CBC Students

Our sample data: n = (sample size, F19/S18 combined)

x = (number of “successes”, students > 25)

p̂ =

(Assume that the sample was simple random, and is representative of the

population.)

Start by using the sample proportion to approximate the population proportion p:

p̂ is a _____, a single value used to approximate a population parameter.

How good is the point estimate?

Is the actual population proportion p equal to the point estimate?

Take the point estimate and add _____ to account for error:

Take the point estimate and add _____ to account for error:



Summarize Example:

Very informal method of coming up with a:

Confidence interval – a _____ used to estimate the true value of a population parameter.

Basically it’s the _____ based on the available sample data.

Based on what we just did, we could say: “We are pretty confident that the true proportion of all CBC students who are more than 25 years old is between _____.”

Notice that the _____ the interval got, the _____ we became that the true value of the population proportion was contained.

In general, estimating a population parameter means: find a confidence interval, an interval of values, which _____ that true population value.

It does NOT mean: come up with a _____ that equals the true population value.



Sampling Distribution of Proportions – CBC Students > 25 years old

From the Fall 2018 CBC student population data (note – normally you would

NOT have the population data available!), we know that:

N = (total number of students in population)

x = (number of students > 25 years old)

Therefore, can calculate p, the TRUE population proportion of students who are

older than 25 years:

p =

1 – p =



Normal because:

Mean of the sample proportions =

Standard deviation of the sample proportions =

Sampling Distribution of Proportions, CBC Student Ages > 25, n = _____

p =
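The mean and standard deviation of the sample proportions can be sketched in a few lines of Python. This is only an illustration: the values p = 0.30 and n = 100 are hypothetical (not the CBC data), and the normality check uses the rule of thumb np(1 – p) ≥ 10, which is an assumption here since textbooks differ on the exact cutoff.

```python
import math

def sampling_distribution_of_proportion(p, n):
    """Return (mean, standard deviation) of the sample proportion p-hat."""
    mean = p                                # sample proportions center on the true p
    std_dev = math.sqrt(p * (1 - p) / n)    # spread shrinks as n grows
    return mean, std_dev

def is_approximately_normal(p, n, threshold=10):
    """Rule-of-thumb check (assumed cutoff): n * p * (1 - p) >= threshold."""
    return n * p * (1 - p) >= threshold

# Hypothetical values, chosen only for illustration
mean, sd = sampling_distribution_of_proportion(p=0.30, n=100)
print(f"mean = {mean:.3f}, standard deviation = {sd:.4f}")
print("approximately normal?", is_approximately_normal(p=0.30, n=100))
```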



Method for Finding the Confidence Interval

p =

p is the true population proportion (unknown), and the goal is to _____ in our confidence interval.

Method:

Choose a specific confidence level – 95% confidence level is common.

That means: select _____ around the true proportion.

The distance from the true proportion p (center) to either edge of the selected area is called the _____.

By definition, 95% of the p̂ values will lie in the interval:

Create a confidence interval around the sample proportion:

Therefore, when the observed sample proportion p̂ lies in the interval (p - E, p + E), then the confidence interval (p̂ - E, p̂ + E) _____.

Sampling Distribution of Proportions, CBC Student Ages > 25, n = _____

Normal Distribution

=



Calculate the Margin of Error, E

E =

E is the distance from the center of the sampling distribution (the true proportion)

out to the edge of the area that we want to include.

The confidence level tells us how much area we want to include.

So, 95% confidence means: include _____ of the area.

α (alpha) is the complement of the confidence, so:

α = _____          Confidence = _____

Specifically, for 95% confidence: α = _____

The z value in the error formula is referred to as zα/2. Why?

zα/2 is called the _____ because it separates likely values (center) from unlikely values (tails).

What IS zα/2 for this confidence level?

Area in the middle =

TOTAL area in the two tails =

Area for EACH tail =
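The area bookkeeping above (middle area, total tail area, area in each tail) can be turned into a critical value with Python's standard library. This is a minimal sketch; the function name `z_critical` is made up for illustration, and `NormalDist().inv_cdf` returns the z value with the given area to its left.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Critical value z_(alpha/2) for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence          # total area in the two tails
    tail = alpha / 2                # area in EACH tail
    # The critical value has area (1 - tail) to its left
    return NormalDist().inv_cdf(1 - tail)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Each confidence level gives a different critical value, and higher confidence pushes the critical value farther from the center.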


Notice that there is a different zα/2 for each different confidence level!

Connection to the Empirical Rule:



95% Confidence Interval (CI) for Proportions, CBC Student Ages > 25

95% confidence means: zα/2 =

From our sample:

Sample size n =

Sample proportion p̂ =

1 - p̂ =

Calculate: E =

Find the confidence interval bounds, the extreme values at either end:

Lower bound = p̂ - E =

Upper bound = p̂ + E =

In general, round off the confidence interval bounds for proportions to three

significant digits.
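The steps above (find p̂, compute E, then subtract and add E) can be sketched as one small function. The counts x = 30 and n = 100 below are hypothetical placeholders, not the class sample, and the function name is invented for this illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (lower, upper) bounds of the CI for a population proportion."""
    p_hat = x / n                                        # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value z_(alpha/2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error E
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 30 "successes" out of 100
lower, upper = proportion_confidence_interval(x=30, n=100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```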

Interpreting the Confidence Interval

It means: We are 95% confident that the true population proportion of CBC

students who are more than 25 years old is between _____.

If we were to do this sampling procedure many times (with n = _______), 95% of

the time the CI will contain the true population proportion.



Confidence Intervals for a Proportion Using Technology

In STATDISK:

Analysis/Confidence Interval/Proportion One Sample

Precision of the Estimate

Notice that our estimate is very precise! In other words, the

interval is very _____. The precision of the estimate relies on a couple of factors:

1. The _____.

2. The _____.



Effect of Higher Confidence Level

What’s going to happen to the Confidence Interval if we change from 95%

confidence to 99% confidence?

99% confidence means: zα/2 =

Calculate: E =

Lower bound = p̂ - E =

Upper bound = p̂ + E =



KEY: The higher level of confidence results in a _____ confidence interval.

Because of this, we have reduced the _____ of the estimate.
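To see the effect of the confidence level numerically, the sketch below holds a sample fixed and changes only the confidence level. The sample values (p̂ = 0.30, n = 100) are hypothetical, and `margin_of_error` is a helper name invented for this illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error E for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same hypothetical sample, two confidence levels
p_hat, n = 0.30, 100
for level in (0.95, 0.99):
    e = margin_of_error(p_hat, n, level)
    print(f"{level:.0%}: E = {e:.4f}, CI = ({p_hat - e:.3f}, {p_hat + e:.3f})")
```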



Effect of Bigger Sample Size

Bigger sample: n = (random sample from population data)

x = (number of “successes”, students > 25)

p̂ =

1 - p̂ =

zα/2 = 1.96 (assume 95% confidence)

Calculate margin of error: E =

KEY: This margin of error is much _____ than the 95% margin of error we had for sample size n = _____. (E = _____)

Because:

The larger sample provides _____ information.

Therefore, there is _____ associated with it.

Notice that the point estimate for n = _____ was closer to the true population proportion p than the point estimate for n = _____, which is expected.

Find the confidence interval:





Sampling Distribution of Proportions, CBC Student Ages > 25, n =

The sampling distribution itself looks different: it is _____.

The larger sample results in a much _____ confidence interval.
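The effect of sample size can also be seen numerically. This sketch keeps a hypothetical sample proportion p̂ = 0.30 and 95% confidence fixed, and varies only n. The sample sizes are made up for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error E for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical p-hat; only the sample size changes
for n in (100, 400, 1600):
    print(f"n = {n:5d}: E = {margin_of_error(0.30, n):.4f}")
```

Because n sits under a square root, quadrupling the sample size cuts the margin of error in half.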



Confidence Interval for the Population Proportion

Instructions:

Everyone has a cup with 100 beans in it, a mixture of black and white beans. You are

going to take a sample of beans, and use the sample to estimate the true population

proportion of white beans.

1. Select 40 beans, one at a time. You are sampling WITH replacement, so put the bean

back in the cup each time. Record the color of each bean that you select. Each team just

does one sampling experiment together.

White Black

Total white = Total black =

2. A “success” is being defined as getting a white bean.

Calculate your sample proportion of white beans: p̂ = _______

3. Use your data to construct an 80% confidence interval for the population proportion.

4. Use your data to construct a 95% confidence interval for the population proportion.

Once you have completed 3. and 4. above, come up front and sketch your confidence

intervals on the viewgraphs for the 80% and 95% confidence intervals!! Then look

at the questions on the next page.



Questions:

1. For the 80% confidence interval:

(a) How many of the total confidence intervals include the true value of the population

proportion of white beans, p = _____? What percent is this? (can’t answer this one until

we have ALL the data compiled on viewgraph)

(b) What % of the confidence intervals do you expect to include the true value of the

population proportion, p?

(c) Explain how to interpret a level of confidence of 80%.

2. For the 95% confidence interval:

(a) How many of the total confidence intervals include the true value of the population

proportion of white beans, p = _____? What percent is this? (can’t answer this one until

we have ALL the data compiled on viewgraph)

(b) What % of the confidence intervals do you expect to include the true value of the

population proportion, p?

(c) Explain how to interpret a level of confidence of 95%.

3. Explain how increasing the confidence level from 80% to 95% changed the

confidence intervals.

4. Why did we have to take a sample of n = 40 beans to construct our confidence

intervals, instead of n = 15 or n = 20 beans? (hint: consider the requirements to have a

normal sampling distribution).



Determining Sample Size for Proportions

Let’s say we are going to do a survey or conduct some procedure specifically to estimate a population proportion. How large a sample is needed to achieve a given margin of error at a certain confidence level (e.g., 95%)?

Two scenarios:

1. Already have an estimate of p̂ (e.g., from previous samples or another source).

2. Do not have any available estimate of p̂.

Case 1: already have estimate of p̂

Take the formula for margin of error E, and solve it for n:

E = zα/2 · √( p̂(1 – p̂) / n )

Case 2: have no estimate of p̂

Use the same exact formula for n that we just derived. Since we have no idea

what p̂ is, we put in

p̂ =

1 – p̂ =

Simplify:

n =

(Why do we use these two values for p̂ and 1 – p̂ ?)

When you use EITHER of these formulas: round up to the nearest integer.
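Both cases can be sketched with one function. The function name and the example numbers (E = 0.05, a prior estimate of 0.30) are assumptions for illustration. When no estimate is available, the sketch substitutes p̂ = 0.5, the value that makes p̂(1 – p̂) as large as possible, so the resulting n is large enough whatever the true proportion turns out to be.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_hat=None):
    """Smallest n that gives the requested margin of error for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    if p_hat is None:
        p_hat = 0.5   # Case 2: no estimate, so use the most conservative value
    n = z ** 2 * p_hat * (1 - p_hat) / margin ** 2
    return math.ceil(n)   # always round UP to the next integer

# Case 1: hypothetical prior estimate of 0.30
print(sample_size_for_proportion(margin=0.05, p_hat=0.30))
# Case 2: no prior estimate
print(sample_size_for_proportion(margin=0.05))
```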



Example: What proportion of CBC students have tattoos?

How many students do we need to sample if we want to be 95% confident that

the sample percentage is within 5 percentage points of the true population

percentage?

Case 1: Use our sample data as an estimate of p̂ .

From our survey the first night of class:

n =

x = “success” = yes, have a tattoo =

p̂ =

1 – p̂ =

Case 2: Assume we have no estimate of p̂ (how will the sample size compare?)

Either way it’s a pretty large sample size. What could I change if I don’t want to

take so many samples?



Sample Size for a Proportion Using Technology

In STATDISK:

Analysis/Sample Size Determination/Estimate Proportion



Summary of Methods for Estimating a Population Proportion


From Tri-City Herald, Nov. 2007:

Few days after article:

…rest of article omitted, until the very end:



Few days after letter

from Larry Bafus:


c. What size sample should be obtained to ensure a margin of error of at most 1% for a 95% confidence interval, using the given sample data as a known estimate?



Section 9-2 Estimating a Population Mean

As with the proportions, we want to estimate the value of a population mean, µ:

Using sample data, in this case the sample mean x̄ as the best point estimate of the population mean, µ.

Determine a Confidence Interval that will give us a range of values for the

mean, as opposed to just relying on the single point estimate.

Do this by calculating a margin of error E and applying it on both sides of

the point estimate:

To determine the margin of error, E, we need to describe the sampling

distribution of the mean, just like we did with proportions:

The distribution of x̄ will be approximately _____ if the population is normal or if the sample size is large.

The mean of the sampling distribution will equal the mean of the population, or _____.

The standard deviation of the sampling distribution will equal:

We want to calculate the margin of error, just like we did for proportions:

E =

Here is the problem: we do not know σ!

Have to estimate it using _____.

Because of this, and based on the statistical methods, do not use the _____ distribution to find the critical value zα/2. Instead, use a different distribution called the _____.

Margin of Error: E =



What the t-curve looks like:

It looks like a normal curve, but it is _____!

The t-curve has a completely different mathematical model:

Characteristics of t distribution

It has the same general _____ as the standard normal z distribution.

It is symmetric about _____

The total area under the t curve = _____

It has the same _____ behavior: the curve approaches, but never touches, the horizontal axis.



Differences from the Standard Normal Curve (z distribution)

The t distribution is _____ with heavier or fatter _____ on either side, because it has more probability in the tails than the z distribution does.

The t distribution varies with the _____: as the sample size n gets larger, the t distribution gets _____ to the z distribution.

Standard deviation of t distribution also varies with sample size, but is always _____ (vs. z dist. where standard deviation always _____).

Because of this, the t distribution has more _____ in it, the standard deviation is _____.

Reasons for using the t distribution instead of the z distribution:

In general when using this method:

Have higher level of _____ because we don’t know _____

Have to rely on the standard deviation from sample.

t distribution compensates for this higher unreliability by making the confidence interval _____.

To make the confidence interval wider, we have to have a larger value for the _____.

Therefore, critical values of tα/2 are _____ critical values of zα/2



How to Look Up Critical ‘t’ Values in Table VII

Depends on two values:

1. Degrees of _____ =

2. Area in the tail

Based on the confidence level, you have to determine how much area is in the

right tail!

Example: for n = 10 and a 99% confidence level, tα/2 = ______

1. df =

2. Area in right tail =

Example demonstrates:

1. tα/2 > zα/2

2. tα/2 → zα/2 as the sample size gets bigger

Confidence level    zα/2    tα/2 (n=6)    tα/2 (n=21)    tα/2 (n=101)

95%
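Critical t values are normally read from the table or from software, but they can also be approximated from the shape of the t curve itself. The sketch below is one possible approach using only the Python standard library, not the method behind the table: it integrates the t density numerically (Simpson's rule) and searches for the value with the required area to its left. All function names are invented for this illustration, and `math.gamma` overflows for very large df, so treat it as a classroom sketch only.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=1000):
    """Area to the left of t (for t >= 0): 0.5 plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, n):
    """Approximate critical value t_(alpha/2) for sample size n (df = n - 1)."""
    df = n - 1
    target = 1 - (1 - confidence) / 2     # area to the left of the critical value
    low, high = 0.0, 100.0
    for _ in range(60):                   # bisection search
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

z = NormalDist().inv_cdf(0.975)
print(f"z     (95%)          = {z:.3f}")
for n in (6, 21, 101):
    print(f"t     (95%, n = {n:3d}) = {t_critical(0.95, n):.3f}")
```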



Summary of Methods for Estimating a Population Mean

Solve the E equation for n to determine the sample size necessary:

Because we cannot determine tα/2 without knowing the sample size (df), substitute the critical value of zα/2 into the equation:



Example: Sample of CBC student ages, Linda’s F19 M146 class

n =

x̄ =

s =

Use this sample to construct a 95% CI estimate of the mean age in years for all

CBC students. (Assume that the sample was simple random, and is

representative of the population.)

df =          tα/2 =

E =

lower bound = x̄ – E =

upper bound = x̄ + E =

Confidence interval is:



Interpreting the Results:

“We are 95% confident that the true value of the population mean lies within the interval _____.”

If we repeatedly took RANDOM samples of _____ students and constructed the 95% CI for each, then in the long run, about _____ of them would contain the true mean.

Sampling Distribution of the Means, CBC Student Ages, n = _____



Confidence Intervals for a Mean Using Technology

In STATDISK:

Analysis/Confidence Interval/Mean-One Sample



Determining Sample Size Needed to Estimate Mean

Example: Sample size for CBC student ages

Say we want to be 95% confident that the sample mean is within 2 years of the

true population mean. How many students must be randomly selected?

zα/2 =

s = years (from our class sample data)

E =

Calculate: n =

Remember – always round UP to the next integer.

KEY: the equation for sample size (for either a proportion or mean) does NOT depend on the size of the _____.
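The sample-size calculation for a mean can be sketched as follows, using n = (zα/2 · s / E)² and rounding up. The function name and the numbers (s = 8 years, E = 2 years) are hypothetical and chosen only for illustration; they are not the class data.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(s, margin, confidence=0.95):
    """Smallest n that gives the requested margin of error for a mean.

    s is an estimate of the standard deviation (for example, from a prior sample).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z is used in place of t
    n = (z * s / margin) ** 2
    return math.ceil(n)   # always round UP to the next integer

# Hypothetical values: s = 8 years, want to be within 2 years at 95% confidence
print(sample_size_for_mean(s=8, margin=2))
```

Notice that no population size appears anywhere in the function.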



One data value that I collect from my M146 students on the survey each quarter

is how many hours they are willing to devote to school work per week. The data

that I have collected over several quarters suggests that the population is

approximately normally distributed.

The following data is a random sample of 14 values representing how many

hours my students are willing to spend on school work each week. Construct

and interpret a 90% confidence interval for the mean number of hours that

CBC students are willing to spend on school work each week. (Assume that

the sample was random and representative). The sample standard deviation is

s = 9.364417 hours.

4 10 15 25 30

9 14 20 25 35

10 15 24 30

How many students should I survey to estimate the number of hours they are

willing to devote to school work each week within 2 hours at 90% confidence?
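One way to check the hand calculation for this problem is sketched below, using the 14 data values given above. Two assumptions to note: the critical value tα/2 = 1.771 is typed in from a standard t table (df = 13, area 0.05 in the right tail) rather than computed, and the sample-size step uses the z critical value together with the sample s, as in the summary of methods.

```python
import math
import statistics
from statistics import NormalDist

# The 14 sample values from the problem (hours per week)
hours = [4, 10, 15, 25, 30, 9, 14, 20, 25, 35, 10, 15, 24, 30]

n = len(hours)
x_bar = statistics.mean(hours)      # sample mean
s = statistics.stdev(hours)         # sample standard deviation (divides by n - 1)

# 90% confidence: df = 13, area in right tail = 0.05
t_crit = 1.771                      # assumed value, read from a standard t table
margin = t_crit * s / math.sqrt(n)  # margin of error E
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar}, s = {s:.6f}")
print(f"E = {margin:.3f}, 90% CI = ({lower:.2f}, {upper:.2f})")

# Sample size needed to be within 2 hours at 90% confidence (z replaces t)
z = NormalDist().inv_cdf(0.95)
n_needed = math.ceil((z * s / 2) ** 2)
print(f"sample size needed: {n_needed}")
```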

