stats chapter 7: sampling distributions - kevin · pdf filestats chapter 7: sampling ... also...


Upload: hoangtu

Post on 06-Feb-2018


STATS Chapter 7: Sampling Distributions

Section 7.1: Sampling Distribution

Terms:
Parameter: A number that describes an aspect of a population.
Statistic: A number that is computed from sample data; often used to estimate an unknown parameter.

Example: A census of all DHS seniors found that 10% got into college early. An SRS of 30 seniors was also taken and in that sample 12% got into college early. The 10% is a parameter while the 12% is a statistic.

Notation      Parameter   Statistic
Proportion    p           p̂
Mean          µ           x̄

Sampling Distributions and Sampling Variability: If we take repeated samples from the NWHS senior population and measure the proportion of seniors in each sample who got into college early, we will undoubtedly get different numbers for the different samples. This is referred to as sampling variability. We can create a distribution of the proportions from all the samples we took and draw a histogram. I created a simulation that selected repeated samples (100 to be exact) of size 30 from a population in which the proportion of seniors who got into college early was p = 0.10. I then took the proportions that I got from these samples and created a histogram. The histogram looks like this:

NOTE: We will use p to represent the population proportion. We will use p̂ to represent the sample proportion which in turn is used to estimate p if p is unknown.

[Figure: histogram titled "100 Sample Proportions". Horizontal axis: Proportion, from 0 to .267 in steps of about .033. Vertical axis: Count, from 0 to 25.]
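A simulation like the one described above can be sketched in a few lines of Python. This is only a sketch, not the simulation used to make the histogram: the function name, the seed, and the text-based histogram are my own choices, and each senior is treated as an independent draw with probability p, which approximates an SRS only when the population is much larger than the sample.

```python
import random
from collections import Counter

def simulate_sample_proportions(p=0.10, n=30, num_samples=100, seed=7):
    """Return one sample proportion (p-hat) per simulated sample."""
    rng = random.Random(seed)  # the seed is arbitrary; it only makes reruns repeatable
    proportions = []
    for _ in range(num_samples):
        # Assumption: each senior independently "got in early" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

p_hats = simulate_sample_proportions()
counts = Counter(p_hats)

# Text histogram: one row per observed value of p-hat, one star per sample.
for value in sorted(counts):
    print(f"{value:.3f} {'*' * counts[value]}")
```

Because n = 30, every p̂ is a multiple of 1/30, which is why the histogram bars sit at 0, .033, .067, and so on.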


Sampling Distribution: (from here on, this phrase is underlined to emphasize the ALL samples idea) If we were to take all possible samples of the same size from the population, compute the sample proportion, p̂, of each sample, and then create a distribution of those values, it would be called the sampling distribution of p̂. The following properties generally describe a sampling distribution created from samples with a large size:

• The overall shape of the distribution is symmetric and approximately normal. The larger the sample size the closer the shape is to a normal distribution.

• There are no outliers or other important deviations from the main pattern.

• The mean (center) of the distribution is equal to the true population parameter.

• The variability (spread) of the sampling distribution depends on the sample size. The larger the sample size, the smaller the variability of the sampling distribution.

Activity 1 – Batteries
Work with a partner on the battery example from page 413.
What are the parameters?
What are the statistics?
Make a histogram of the results:

Describe the shape of the distribution of the lifetimes:

HW A: 1, 3, 5, 7


Bias: When a sampling distribution does not have its center equal to the true population parameter (the statistic is consistently too high or too low), the statistic used to create that sampling distribution is said to be biased.

Unbiased:
Proportion: the mean of the sampling distribution of p̂ is equal to p
Mean: the mean of the sampling distribution of x̄ is equal to µ

The goal when creating a sampling distribution is to have no bias and low variability. Here is how bias and variability are related: [Figure: bias and variability]

Note: If the sampling method is biased, the size of the sample does not matter, so increasing the sample size will NOT improve the results.

• The variability of a sampling distribution is determined by the sampling design and the sample size used to create the sampling distribution. As long as the population is much larger than the sample (at least 10 times as large), the spread of the sampling distribution is approximately the same for any population size.

• Contrary to popular belief and intuition, the behavior of a statistic from random samples is not influenced by the size of the population. To see why, think of taking a sample scoop of M&Ms from a well-shuffled 1-pound bag. If the M&Ms are well shuffled, does the scoop really know whether it was surrounded by a one-pound bag of M&Ms or a huge bin of M&Ms? Clearly it does not.

• The above realization, that the variability of a sampling distribution is controlled by the size of the sample, not the size of the population, has major implications for sampling design. It means that a survey of, say, 2000 people is just as accurate when the sample is taken from the population of a small state like Rhode Island as when it is taken from the population of the entire United States. As long as the sample is an SRS, it can predict some aspect of the US population just as well as it could for the much smaller Rhode Island population. In other words, the ratio of the sample to the population is NOT important. As a matter of fact, we actually want the ratio of the population size to the sample size to be large (more than 10 to 1) in order to be able to conduct most of the statistical analyses we'll be learning about.
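To see the "scoop of M&Ms" idea in numbers, here is a rough simulation sketch. Everything in it is an assumption made for illustration: the population sizes (10,000 and 1,000,000), the true proportion p = 0.30, the sample size n = 100, and the number of repetitions. It draws SRSs without replacement from each population and compares the spread of p̂.

```python
import random
import statistics

def sd_of_sample_proportions(pop_size, p=0.30, n=100, reps=2000, seed=1):
    """Estimate the standard deviation of p-hat for SRSs of size n."""
    rng = random.Random(seed)
    # Individuals are numbered 0..pop_size-1; the first num_successes are "successes".
    num_successes = round(p * pop_size)
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)  # SRS without replacement
        p_hats.append(sum(i < num_successes for i in sample) / n)
    return statistics.stdev(p_hats)

small = sd_of_sample_proportions(pop_size=10_000)
large = sd_of_sample_proportions(pop_size=1_000_000)
print(f"SD of p-hat, population of 10,000:    {small:.4f}")
print(f"SD of p-hat, population of 1,000,000: {large:.4f}")
```

Both standard deviations should land near the same value even though one population is 100 times larger than the other, because both populations are far more than 10 times the sample size.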


German Tank Problem – similar to the activity on page 414.

1. You and a partner have been given the task of estimating the number of tanks made by the Germans during World War II based on the serial numbers of the tanks captured by the Allies.

2. Draw a sample of 5 serial numbers from the bag that will represent the captured tanks and record the numbers in the first row of the table.

3. Describe the statistics for the first 5 serial numbers:
   Min:   Max:   Range:   Mean:   Samp St. Dev:   Q1:   Median:   Q3:   IQR:

4. Select one of the following methods to calculate the number of tanks. The real formula used by the Allied statisticians is somewhere in the following list. Feel free to try a couple to see which you want to use. Circle your method of choice.
   A. Range + Min
   B. Median + IQR
   C. IQR × 2
   D. Q3 + Q1
   E. Mean + 2SD
   F. Mean + 3SD
   G. Max + Max÷n – 1
   H. Max + Min
   I. Median × 2
   J. Mean × 2
   K. Other method of your choice.

5. Repeat steps 2-4 until you have repeated the process 10 times. Use the same method/formula for every sample.

6. Create a histogram of the estimates.

7. Describe the shape of the histogram:

8. Find the statistics for the estimates:
   Min:   Max:   Range:   Mean:   Samp St. Dev:   Q1:   Median:   Q3:   IQR:
   Is your method biased or accurate? Why?

9. State your FINAL ESTIMATE for the number of German Tanks:

HW B: 9, 11, 13, 17-20
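If you want to check a method for bias without drawing from the bag thousands of times, a simulation can do the repetition for you. The sketch below compares two of the listed methods (G and J) only as examples; the true number of tanks (342), the function names, and the number of repetitions are all made up for illustration, and any other method from the list could be swapped in.

```python
import random
import statistics

def estimate_max_plus_gap(serials):
    """Method G from the list above: Max + Max/n - 1."""
    m, n = max(serials), len(serials)
    return m + m / n - 1

def estimate_twice_mean(serials):
    """Method J from the list above: Mean * 2."""
    return 2 * statistics.mean(serials)

def simulate(estimator, true_count=342, n=5, reps=5000, seed=3):
    """Return (mean, standard deviation) of the estimates over many samples.

    true_count is a hypothetical number of tanks, numbered 1..true_count.
    """
    rng = random.Random(seed)
    estimates = [
        estimator(rng.sample(range(1, true_count + 1), n))  # 5 captured serial numbers
        for _ in range(reps)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)

mean_g, sd_g = simulate(estimate_max_plus_gap)
mean_j, sd_j = simulate(estimate_twice_mean)
print(f"Method G: mean estimate {mean_g:.1f}, SD {sd_g:.1f}")
print(f"Method J: mean estimate {mean_j:.1f}, SD {sd_j:.1f}")
```

A method whose mean estimate sits close to the true count has little bias; among methods with little bias, the one with the smaller SD has lower variability.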


Section 7.2: Sample Proportions

The following properties generally describe a sampling distribution of p̂ created from samples with a large size (usually n ≥ 30):

• The overall shape of the distribution is symmetric and approximately normal. The larger the sample size, the closer the shape is to a normal distribution.

• A rule of thumb used to determine whether a normal curve can be used to approximate the sampling distribution of the sample proportion p̂ is:
   a) np ≥ 10 (check #1)
   b) n(1 − p) ≥ 10 (check #2)

• There are no outliers or other important deviations from the main pattern.

• The mean (center) of the distribution is equal to the true population parameter, p. The mean of p̂ is $\mu_{\hat{p}} = p$.

• The variability (spread) of the sampling distribution depends on the sample size. The larger the sample size, the smaller the variability of the sampling distribution.

• Check that the population is at least ten times larger than the sample size (check #3). If this condition is met, then the standard deviation of the sampling distribution is

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

Example: An SRS of 1500 high school seniors in CT was asked whether they applied to college early. Let's assume that there are 100,000 high school seniors in the state of Connecticut, and that in fact 35% of them apply to college early.
a) Perform checks 1-3 to verify that we can assume a normal distribution and use the standard deviation formula.
b) State the mean and standard deviation of p̂.
c) What is the probability that your sample of 1500 seniors will give a result within 2 percentage points of the true value of 35%?
d) Draw a normal curve that approximates the sampling distribution of p̂. Mark the mean and standard deviations, and shade the area representing the answer to the question in part c.

On the AP Formula Page:


Here is the Normal Curve for the last Problem:

e) How large would the sample size have to be to decrease the standard deviation of p̂ to ½ of 1 percent (0.005)?

Continue with the activity on the following page.
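One way to check your work on parts a, b, c, and e of the Connecticut example is to let Python do the arithmetic. This is a sketch of the calculation, not an official solution; the variable names and the helper normal_cdf (built from the standard library's math.erf) are my own.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

N, n, p = 100_000, 1500, 0.35

# a) Checks 1-3
check1 = n * p >= 10            # np = 525
check2 = n * (1 - p) >= 10      # n(1 - p) = 975
check3 = N >= 10 * n            # 100,000 >= 15,000

# b) Mean and standard deviation of p-hat
mean_p_hat = p
sd_p_hat = math.sqrt(p * (1 - p) / n)

# c) P(0.33 < p-hat < 0.37) using the normal approximation
z = 0.02 / sd_p_hat
prob_within_2_points = normal_cdf(z) - normal_cdf(-z)

# e) Solve sqrt(p(1 - p)/n) = 0.005 for n.
# Round first so floating-point noise cannot push the ceiling up by one.
target_sd = 0.005
n_needed = math.ceil(round(p * (1 - p) / target_sd ** 2, 6))

print(check1, check2, check3)
print(f"mean = {mean_p_hat}, sd = {sd_p_hat:.4f}")
print(f"P(within 2 points) = {prob_within_2_points:.4f}")
print(f"n needed for sd of 0.005: {n_needed}")
```

Notice in part e that cutting the standard deviation from about 0.0123 to 0.005 takes roughly six times as many seniors, because the sample size sits under a square root.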


HW C: 21-24, 27, 29, 33, 35, 37, 41


Section 7.3: Sample Means

Mean and Standard Deviation of a Sample Mean
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ.

The mean of the sampling distribution of x̄ is $\mu_{\bar{x}} = \mu$.

The standard deviation is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Note: In order to use the standard deviation formula, the sample size must be no more than 10% of the population size, so N must be at least 10 times larger than n: n ≤ N/10.

Sampling Distribution of a Sample Mean from a Normal Population
Draw an SRS of size n from a population that has a normal distribution with mean µ and standard deviation σ.

Then the sample mean x̄ has a normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$. Since the mean of the sampling distribution is equal to µ, this makes x̄ an unbiased estimator of µ.

Summary:
1.) Averages are less variable than individual observations.
2.) Averages are more normal than individual observations.

If we look at a histogram of averages, we will get a histogram that is more normal and less spread out than a histogram of individual observations. Data are much easier to work with if they are normal and have a small spread, so it is to our advantage to look at a distribution of averages.

Examples:
Suppose the heights of young women are normally distributed with µ = 64.5 inches and σ = 2.5 inches. What is the probability that the mean height of an SRS of 10 young women is greater than 66.5 inches?

Suppose the heights of young women are normally distributed with µ = 64.5 inches and σ = 2.5 inches. What is the probability that one randomly selected woman has a height greater than 66.5 inches?
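The two height questions differ only in which standard deviation goes into the z-score, and a short calculation makes that contrast concrete. The sketch below is one way to check the arithmetic; the helper normal_cdf and the variable names are my own, not part of the textbook.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 64.5, 2.5
cutoff = 66.5
n = 10

# Mean of an SRS of 10 women: use the standard deviation of x-bar, sigma / sqrt(n).
sd_mean = sigma / math.sqrt(n)
p_mean_above = 1 - normal_cdf((cutoff - mu) / sd_mean)

# One randomly selected woman: use the population standard deviation sigma.
p_one_above = 1 - normal_cdf((cutoff - mu) / sigma)

print(f"sd of x-bar         = {sd_mean:.4f}")
print(f"P(x-bar > 66.5)     = {p_mean_above:.4f}")
print(f"P(one woman > 66.5) = {p_one_above:.4f}")
```

The probability for the sample mean comes out far smaller than the probability for one woman, which is Summary point 1 in action: averages are less variable than individual observations.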

On the AP Formula Page:


HW D: 43-46, 49, 51, 53, 55


Sampling Distribution of a Sample Mean from a Non-Normal Population
The distribution of the sample means will still be approximately normal as long as the sample size is large enough. This idea is called the Central Limit Theorem.

The Central Limit Theorem
Draw an SRS of size n from any population whatsoever with mean µ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is close to the normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$.

The sample size n required to achieve an approximately normal distribution depends on the population distribution. More observations are required if the shape of the population distribution is far from normal. The rule of thumb: we expect an approximately normal distribution if n ≥ 30.

Example: The number of flaws per square yard in a type of carpet material varies with mean 1.6 flaws per square yard and a standard deviation of 1.2 flaws per square yard. The population distribution cannot be normal because a count takes only whole number values. An inspector studies 200 square yards of the material, records the number of flaws found in each square yard, and calculates x̄, the mean number of flaws per square yard inspected. What is the probability the mean number of flaws exceeds 2 per square yard?

A Twist of Old and New: Nationwide, the GPA of members of the Delta Tau Chi Fraternity is normally distributed with a mean of 2 and a standard deviation of .5.
A. What is the probability that a randomly selected member of ∆TX has a GPA over 2.5?
B. Jack is a member of ∆TX and feels the numbers are inaccurate. He surveys an SRS of 50 ∆TX members at Ohio universities. What is the probability that the sample mean is over 2.5?
C. What is the probability that fewer than 10 ∆TX members in the survey have a GPA over 2.5?
D. Jill is a member of Delta Delta Delta (May we help you, help you, help you?), which has a nationwide GPA that is normally distributed with mean 2.2 and standard deviation of .7. Jill surveys an SRS of 40 ∆∆∆ members at Ohio universities. What is the probability that the sample mean for Jack is greater than the sample mean for Jill?
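The carpet example and the four fraternity questions can all be checked with the same small helper. Treat this as a sketch under stated assumptions rather than an answer key: it assumes the Ohio samples behave like SRSs from the nationwide populations, that "the survey" in part C means Jack's sample of 50 with independent members (so the count is binomial), and that Jack's and Jill's samples are independent (so the variances of the two sample means add in part D). The helper upper_tail and the variable names are my own.

```python
import math

def upper_tail(x, mean, sd):
    """P(X > x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # erfc stays accurate for tiny tails

# Carpet: n = 200 is large, so by the CLT x-bar is approximately normal.
p_carpet = upper_tail(2, 1.6, 1.2 / math.sqrt(200))

# A. One member: use the population standard deviation.
p_a = upper_tail(2.5, 2.0, 0.5)

# B. Mean of 50 members: use sigma / sqrt(n).
p_b = upper_tail(2.5, 2.0, 0.5 / math.sqrt(50))

# C. Count of members over 2.5 out of 50 is binomial(50, p_a); "fewer than 10" is 0..9.
p_c = sum(math.comb(50, k) * p_a**k * (1 - p_a) ** (50 - k) for k in range(10))

# D. Difference (Jack's mean - Jill's mean) is normal with mean 2.0 - 2.2;
#    variances add because the samples are assumed independent.
sd_diff = math.sqrt(0.5**2 / 50 + 0.7**2 / 40)
p_d = upper_tail(0, 2.0 - 2.2, sd_diff)

for label, value in [("carpet", p_carpet), ("A", p_a), ("B", p_b), ("C", p_c), ("D", p_d)]:
    print(f"{label}: {value:.3g}")
```

Compare A with B: the same cutoff of 2.5 is only one standard deviation above the mean for an individual, but about seven standard deviations above the mean for an average of 50.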


HW E: 57, 59, 61, 63, 65-68


Chapter 7 Review

Conditions to be Normal
For proportions: np ≥ 10 and n(1 − p) ≥ 10
For means: n ≥ 30

Means
For proportions: $\mu_{\hat{p}} = p$
For means: $\mu_{\bar{x}} = \mu$

Standard Deviation
For proportions: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
For means: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Both formulas require that the population is at least 10 times larger than the sample.

Concept Questions:
What is a sampling distribution? (The key word is ALL possible samples.)
What is the difference between a sampling distribution and a distribution of sample data?
What is the difference between $\mu_{\hat{p}}$ and p, or $\mu_{\bar{x}}$ and µ?
What is the difference between $\sigma_{\hat{p}}$ and σ, or $\sigma_{\bar{x}}$ and σ?

AP REVIEW ON PAGE 459-461