statistical methods 1

Statistical Methods 1 Lecture Notes

Chapter 7: Inferences Based on a Single Sample: Estimation with Confidence Intervals

A survey by the Roper Organization found that 45% of the people who were offended by a television program would change the channel, while 15% would turn off their television sets. The survey further stated that the margin of error is 3 percentage points and that 4000 adults were interviewed.

1. How do these estimates compare with the true population percentages?

2. Is the sample of 4000 large enough to represent the population of all adults who watch television in the United States?

Introductory Example

Objectives

1. Find the confidence interval for the mean when σ is known or n ≥ 30.

2. Determine the minimum sample size for finding a confidence interval for the mean.

3. Find the confidence interval for the mean when σ is unknown and n < 30.

4. Find the confidence interval for a proportion.

5. Determine the minimum sample size for finding a confidence interval for a proportion.

Target Parameter

7.1 Identifying and Estimating the Target Parameter

Key words associated with parameters:

Parameter    Key Words or Phrases

μ            Mean; average

p            Proportion; percentage; fraction; rate

Recall that a statistic such as the sample mean x̄ is a point estimator of the population mean μ.

Confidence Interval

The goal is to determine how to estimate the population mean and assess the estimate’s reliability.

Example: Find such that .

Solution:

7.2 Confidence Interval for a Population Mean: Normal (z) Statistic

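According to the Central Limit Theorem, the sampling distribution of x̄ will be approximately normal, and 95% of all x̄ values from samples of size n lie within $\pm 1.96\,\sigma_{\bar{x}}$ of the mean, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Equivalently, there is a probability of .95 that x̄ will lie in the interval $\mu \pm 1.96\,\sigma_{\bar{x}}$.

The interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ is called a large-sample 95% confidence interval for the population mean μ.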

Large sample means the sample must be large enough so that the Central Limit Theorem can be applied. (Rule of thumb, n ≥ 30)

Example (cont.)

Calculating $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ requires knowing x̄, n, and σ. We usually don't know σ, but with a large sample s is a good estimator of σ.

Calling the interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ a 95% confidence interval for the population mean μ means that if a large number of samples were taken and this interval calculated each time, 95% of the intervals would contain μ.
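The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the population values (μ = 50, σ = 10), the sample size, and the number of trials are hypothetical choices, not values from these notes.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and sampling plan (assumptions for illustration only).
MU, SIGMA, N = 50.0, 10.0, 36
TRIALS = 2000

z = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5    # 1.96 * sigma / sqrt(n)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    # Count the interval as a "hit" when it encloses the true mean.
    if x_bar - half_width <= MU <= x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f} of the {TRIALS} intervals contain mu")
```

The printed fraction should land close to .95; any single interval either contains μ or it does not.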

The probability, .95, that measures the confidence we can place in the interval estimate is called a confidence coefficient. The percentage, 95%, is called the confidence level for the interval estimate.

The margin of error = $1.96\,\sigma/\sqrt{n}$ (this is the maximum error of estimate for a 95% confidence interval).

Example (cont.)

Confidence coefficient – probability that a randomly selected confidence interval encloses the population parameter

Confidence level – Confidence coefficient expressed as a percentage

Definition

The confidence coefficient is equal to 1 - α, where α is called the significance level. It is the total area assigned to the tails of the sampling distribution and is split equally between the two tails, α/2 in each.

The confidence coefficient

John says he is 90% confident that the population mean is contained within the interval I when the values of the population are normally distributed.

Example: 90% confidence interval

A publishing company has just published a new textbook. Before the company decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 comparable textbooks and collected information on their prices. This information produces a mean of $70.50 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50.

A) What is the point estimate of the mean price of all such college textbooks?

B) What is the margin of error of this estimate?

C) Construct the 90% confidence interval for the mean price of all such college textbooks.
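One way to work parts A to C is sketched below in Python. It assumes the usual z-interval with known σ, x̄ ± z_{α/2} σ/√n; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the textbook-price example.
x_bar, sigma, n = 70.50, 4.50, 36
confidence = 0.90

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.645 for 90%

point_estimate = x_bar                    # part A
margin = z * sigma / sqrt(n)              # part B: z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin   # part C

print(f"point estimate: {point_estimate:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

The margin of error is about $1.23, so the 90% interval runs from roughly $69.27 to $71.73.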

Example:

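Large-Sample 100(1 - α)% Confidence Interval for μ

$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$

where $z_{\alpha/2}$ is the z-value with an area of α/2 to its right. When σ is unknown and n is large, s is used in place of σ.

Conditions required for a Valid Large-Sample Confidence Interval for μ

1. A random sample is selected from the target population.

2. The sample size n is large, n ≥ 30.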

p. 306 # 11

A random sample of 100 observations from a normally distributed population possesses a mean equal to 83.2 and standard deviation equal to 6.4.

a. Find a 95% confidence interval for μ.

b. What do you mean when you say that a confidence coefficient is .95?

c. Find a 99% confidence interval for μ.

d. What happens to the width of a confidence interval as the value of the confidence coefficient is increased while the sample size is held fixed?

e. Would your confidence intervals of parts a and c be valid if the distribution of the original population were not normal? Explain.
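Parts a, c, and d can be checked numerically. The sketch below assumes the large-sample interval x̄ ± z_{α/2} s/√n with s standing in for σ; the helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Large-sample interval x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Summary statistics given in the exercise.
x_bar, s, n = 83.2, 6.4, 100

ci95 = z_interval(x_bar, s, n, 0.95)   # part a
ci99 = z_interval(x_bar, s, n, 0.99)   # part c

for label, (lo, hi) in (("95%", ci95), ("99%", ci99)):
    print(f"{label} CI: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

With n held fixed, the 99% interval comes out wider than the 95% interval, which is the pattern part d asks about.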

Example: p. 306 #11.

Interpretation

Many times, inferences must be made from small samples. But the Central Limit Theorem does not guarantee that the sampling distribution of x̄ will be normal for small sample sizes. The sampling distribution of x̄ will be normal if the population is normal.

Also, the population standard deviation σ may not be known, and the sample standard deviation s may not provide a good approximation for σ. Instead of using the statistic $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, which requires a good approximation of σ, the statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ is used.

7.3 Confidence Interval for a Population Mean: Student's t-Statistic

The t-statistic is very much like the z-statistic. It is mound shaped, symmetric, and has mean 0. The t-statistic is different in that it has two random quantities (x̄ and s) while the z-statistic only has one (x̄).

The variability of t depends on the sample size n. Variability is expressed as (n-1) degrees of freedom (df). As df gets smaller, variability increases.

t-statistic vs z-statistic

The table for the t-distribution contains t-values for various combinations of degrees of freedom and $t_\alpha$. $t_\alpha$ is the point where the upper tail of the t-distribution contains an area of α.

The last row, where df = ∞, contains the standard normal z-values.
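The way t-values approach the z-value as df grows can be seen in a short comparison. In this sketch the t-values are hard-coded from a standard t table (rounded to three decimals, an assumption of the example), because the Python standard library has no t-quantile function.

```python
from statistics import NormalDist

# t_{.025} values copied from a standard t table, keyed by degrees of freedom.
T_025 = {5: 2.571, 8: 2.306, 24: 2.064, 120: 1.980}

z_025 = NormalDist().inv_cdf(0.975)  # standard normal value, about 1.960

for df, t_value in T_025.items():
    print(f"df = {df:>3}: t = {t_value:.3f}")
print(f"df = inf: z = {z_025:.3f}")
```

Every t-value exceeds the z-value, and the gap shrinks as the degrees of freedom increase.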

Table for t-distribution: Table IV

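Small-Sample 100(1 - α)% Confidence Interval for μ

$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$

where $t_{\alpha/2}$ is based on (n - 1) degrees of freedom.

Conditions required for a Valid Small-Sample Confidence Interval for μ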

1. A random sample is selected from the target population.

2. The population has a relative frequency distribution that is approximately normal.

The data below represent a sample of the number of home fires started by candles in each of the past several years. Find the 99% confidence interval for the mean number of home fires started by candles each year.

Example:

5460 5900 6090 6310 7160 8440 9930
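A sketch of the calculation for the candle-fire data follows. It assumes the small-sample interval x̄ ± t_{α/2} s/√n with df = n - 1 = 6, and it takes t_{.005} = 3.707 from a standard t table rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]
n = len(fires)

# t_{alpha/2} for 99% confidence with df = 6, copied from a standard t table.
T_005_DF6 = 3.707

x_bar = mean(fires)
s = stdev(fires)                      # sample standard deviation (n - 1 divisor)
margin = T_005_DF6 * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
print(f"99% CI: ({lower:.1f}, {upper:.1f})")
```

The interval is wide because n is small and the confidence level is high; it is valid only if the yearly counts are approximately normally distributed.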

p. 317 # 33ab,39

The following random sample was selected from a normal distribution: 4, 6, 3, 5, 9, 3.

a. Construct a 90% confidence interval for the population mean μ.

b. Construct a 95% confidence interval for the population mean μ.

c. Construct a 99% confidence interval for the population mean μ.

d. Assume that the sample mean x̄ and sample standard deviation s remain exactly the same as those you just calculated, but that they are based on a sample of n = 25 observations rather than n = 6 observations. Repeat parts a-c. What is the effect of increasing the sample size on the width of the confidence intervals?
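The effect asked about in part d can be tabulated directly. This sketch assumes the small-sample interval x̄ ± t_{α/2} s/√n; the t-values are hard-coded from a standard t table, and the helper name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4, 6, 3, 5, 9, 3]
x_bar, s = mean(sample), stdev(sample)

# t_{alpha/2} values copied from a standard t table, keyed by (confidence, df).
T_TABLE = {
    (0.90, 5): 2.015, (0.95, 5): 2.571, (0.99, 5): 4.032,
    (0.90, 24): 1.711, (0.95, 24): 2.064, (0.99, 24): 2.797,
}

def t_interval(x_bar, s, n, confidence):
    """Small-sample interval x_bar +/- t_{alpha/2} * s / sqrt(n), df = n - 1."""
    t = T_TABLE[(confidence, n - 1)]
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin

for confidence in (0.90, 0.95, 0.99):
    lo6, hi6 = t_interval(x_bar, s, 6, confidence)
    lo25, hi25 = t_interval(x_bar, s, 25, confidence)
    print(f"{confidence:.0%}: n=6 ({lo6:.3f}, {hi6:.3f})  n=25 ({lo25:.3f}, {hi25:.3f})")
```

At every confidence level the n = 25 interval is narrower, both because √n is larger and because the t-value is smaller.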

Example: p. 317 #33

Example: p. 318 #39

Radioactive lichen. Refer to the Lichen Radionuclide Baseline Research project at the University of Alaska, presented in Exercise 2.36 (p. 47). Recall that the researchers collected 9 lichen specimens and measured the amount (in microcuries per milliliter) of the radioactive element cesium-137 for each. (The natural logarithms of the data values are saved in the LICHEN file.) A MINITAB printout with summary statistics for the actual data is shown below.

Variable N Mean StDev SE Mean 95% CI

CESIUM 9 0.009027 0.004854 0.001618 (0.005296, 0.012759)

a. Give a point estimate for the mean amount of cesium in lichen specimens collected in Alaska.

b. Give the t-value used in a small-sample 95% confidence interval for the true mean amount of cesium in Alaskan lichen specimens.

c. Use the result you obtained in part b and the values of x̄ and s shown on the MINITAB printout to form a 95% confidence interval for the true mean amount of cesium in Alaskan lichen specimens.

d. Check the interval you found in part c with the 95% confidence interval shown on the MINITAB printout.

e. Give a practical interpretation for the interval you obtained in part c.
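Parts b to d can be checked from the printed summary statistics alone. The sketch assumes the small-sample t interval with df = 8 and takes t_{.025} = 2.306 from a standard t table.

```python
from math import sqrt

# Summary statistics read from the MINITAB printout.
n, x_bar, s = 9, 0.009027, 0.004854

T_025_DF8 = 2.306            # t_{.025} with df = n - 1 = 8, from a standard t table

se_mean = s / sqrt(n)        # should match the printout's "SE Mean"
margin = T_025_DF8 * se_mean
lower, upper = x_bar - margin, x_bar + margin

print(f"SE Mean = {se_mean:.6f}")
print(f"95% CI: ({lower:.6f}, {upper:.6f})")
```

The result agrees with the printed interval (0.005296, 0.012759) up to rounding in the last digit, since the printout's inputs are themselves rounded.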

Confidence intervals around a proportion are confidence intervals around the probability of success in a binomial experiment.

The sample statistic of interest is p̂, where $\hat{p} = x/n$ (the number of successes x divided by the sample size n).

7.4 Large-Sample Confidence Interval for a Population Proportion

Sampling Distribution of p̂

Large-Sample Confidence Interval for p

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.

Conditions required for a Valid Large-Sample Confidence Interval for p

1. A random sample is selected from the target population.

2. The sample size n is large. (A sample size is considered large if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$; $n\hat{p}$ and $n\hat{q}$ are the number of successes and the number of failures in the sample.)

A sample of 500 nursing applications included 60 from men. Find the 90% confidence interval of the true proportion of men who applied to the nursing program.
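A sketch of this calculation is shown below. It assumes the large-sample interval p̂ ± z_{α/2} √(p̂q̂/n) and treats a sample as large when it has at least 15 successes and at least 15 failures; that threshold is the rule this example assumes.

```python
from math import sqrt
from statistics import NormalDist

x, n = 60, 500            # applications from men, total applications
confidence = 0.90

p_hat = x / n
q_hat = 1 - p_hat

# Assumed large-sample check: at least 15 successes and 15 failures.
large_enough = x >= 15 and (n - x) >= 15

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645
margin = z * sqrt(p_hat * q_hat / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, large sample: {large_enough}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

The interval is roughly .096 to .144, so between about 9.6% and 14.4% of applicants are estimated to be men.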

Example

p. 325 # 51,55

Example: p. 325#51

A random sample of size n = 196 yielded p̂ = .64.

a. Is the sample size large enough to use the methods of this section to construct a confidence interval for p? Explain.

b. Construct a 95% confidence interval for p.

c. Interpret the 95% confidence interval.

d. Explain what is meant by the phrase “95% confidence interval.”

Example: p. 325 #55

Is Starbucks coffee overpriced? The Minneapolis Star Tribune (August 12, 2008) reported that 73% of Americans say that Starbucks coffee is overpriced. The source of this information was a national telephone survey of 1,000 American adults conducted by Rasmussen Reports.

a. Identify the population of interest in this study.

b. Identify the sample for the study.

c. Identify the parameter of interest in the study.

d. Find and interpret a 95% confidence interval for the parameter of interest.

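Summary of Confidence Intervals

A. Large-Sample Confidence Interval for a Population Mean

For random samples of size n ≥ 30, the confidence interval is expressed as

$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (with s substituted for σ when σ is unknown).

B. Small-Sample Confidence Interval for a Population Mean

The small-sample confidence interval will be

$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$

where $t_{\alpha/2}$ is based on (n - 1) degrees of freedom.

C. Large-Sample Confidence Interval for a Population Proportion

The large-sample confidence interval for p is

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$

where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.

A sample size is considered large if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.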

7.5 Determining the Sample Size

A. Estimating a Population Mean

The width of a confidence interval depends on the sample size: as the sample size increases, the width of the interval decreases for any given confidence coefficient.

The reliability associated with a confidence interval for the population mean μ is expressed using the sampling error within which we want to estimate μ with 100(1 - α)% confidence.

When we want to estimate μ to within a given number of units with a (1 - α) level of confidence, we can calculate the sample size needed by solving the equation

$z_{\alpha/2}\,\sigma/\sqrt{n} = SE$

for n, where SE is the desired half-width of the interval. This gives $n = (z_{\alpha/2})^2\,\sigma^2/(SE)^2$.

Sampling Error

Example: p. 332 #71

If you wish to estimate a population mean to within .2 with a 95% confidence interval and you know from previous sampling that σ² is approximately equal to 5.4, how many observations would you have to include in your sample?
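The required sample size can be worked out as sketched below, assuming the formula n = (z_{α/2})² σ² / SE² with the result rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist

sampling_error = 0.2      # estimate the mean to within .2
variance = 5.4            # sigma^2 from previous sampling
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96

n_exact = z ** 2 * variance / sampling_error ** 2
n_required = ceil(n_exact)   # always round up so the target SE is met

print(f"n = {n_exact:.1f}, so sample at least {n_required} observations")
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the target.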

The Sampling Error (SE) is half the width of the confidence interval.

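The sampling error SE of a confidence interval for a population proportion p is $SE = z_{\alpha/2}\sqrt{pq/n}$, where q = 1 - p; solving for n gives $n = (z_{\alpha/2})^2\,pq/(SE)^2$.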

#71 p.

B. Estimating a Population Proportion

#78, 81 p.

Example: p. 332 #81

Scanning errors at Wal-Mart. Refer to the National Institute for Standards and Technology (NIST) study of the accuracy of checkout scanners at Wal-Mart stores in California, presented in Exercise 3.52 (p. 132). NIST sets standards so that no more than 2 of every 100 items scanned through an electronic checkout scanner can have an inaccurate price. Recall that in a sample of 60 Wal-Mart stores, 52 violated the NIST scanner accuracy standard (Tampa Tribune, Nov. 22, 2005). Suppose you want to estimate the true proportion of Wal-Mart stores in California that violate the NIST standard.

a. Explain why the large-sample methodology of Section 7.4 is inappropriate for this study.

b. Determine the number of Wal-Mart stores that must be sampled in order to estimate the true proportion to within .05 with 90% confidence, using the large-sample method.
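Both parts can be sketched in a few lines. The code assumes the large-sample rule of at least 15 successes and 15 failures for part a, and for part b it uses n = (z_{α/2})² p̂q̂ / SE² with the sample proportion 52/60 as the planning value for p (an assumption of this sketch).

```python
from math import ceil
from statistics import NormalDist

stores, violations = 60, 52
p_hat = violations / stores
q_hat = 1 - p_hat

# Part a: assumed large-sample check (15 or more successes and failures).
failures = stores - violations
large_sample_ok = violations >= 15 and failures >= 15

# Part b: sample size for SE = .05 at 90% confidence.
sampling_error, confidence = 0.05, 0.90
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645
n_exact = z ** 2 * p_hat * q_hat / sampling_error ** 2
n_required = ceil(n_exact)

print(f"failures in sample: {failures}, large sample: {large_sample_ok}")
print(f"n = {n_exact:.1f}, so sample at least {n_required} stores")
```

Only 8 of the 60 sampled stores met the standard, which is why the original sample is too small for the large-sample interval.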
