sampling distributions

Sampling Distributions: Presentation 7

Upload: uriel-woodard

Post on 31-Dec-2015


DESCRIPTION

Presentation 7. Sampling Distributions. Statistics vs. parameters. A statistic is a numerical value computed from a sample. A parameter is a numerical value associated with a population.

TRANSCRIPT

Page 1: Sampling Distributions

Sampling Distributions

Presentation 7

Page 2: Sampling Distributions

Statistics vs. Parameters

Statistic: a numerical value computed from a sample.

Parameter: a numerical value associated with a population.

Ideally, we would like to know the parameter. In most cases, however, it cannot be measured directly because the population is too large, so we estimate it with a suitable statistic computed from a sample.

Page 3: Sampling Distributions

Quick Review

p = population proportion

p̂ = sample proportion (it is called p-hat)

μ = population mean

x̄ = sample mean

Empirical rule: for variables with a normal (bell-shaped) distribution,

~68% of the values fall within +/- 1 standard deviation of the mean.

~95% of the values fall within +/- 2 standard deviations of the mean.

Page 4: Sampling Distributions

Sampling Distribution of the Sample Proportion

Situation 1: A survey is undertaken to determine the proportion of PSU students who engage in under-age drinking. The survey asks 200 random under-age students (assume no problems with bias). Suppose the true population proportion of those who drink is 60%, or p = .6.

p̂ is the proportion in the sample who drink.

Page 5: Sampling Distributions

Repeated Samples

Imagine repeating this survey many times, and each time we record the sample proportion of those who have engaged in under-age drinking. What would the sampling distribution of p̂ look like?

Sample (n=200)    Sample Proportion
1                 p̂ for sample 1
2                 p̂ for sample 2
3                 p̂ for sample 3
4                 p̂ for sample 4
5                 p̂ for sample 5
…                 …
150,000           p̂ for sample 150,000

p̂ is a random variable assigning a value to each sample!

Page 6: Sampling Distributions

Histogram of p̂ for 150k samples.

[Figure: histogram of p̂ for 150,000 samples. Horizontal axis: p̂, 0.4 to 0.8; vertical axis: 0 to 10.]
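The repeated-sampling idea can be tried out with a short simulation. The following is a minimal sketch, not the code behind the slide: the function name, the seed, and the 2,000 repetitions (instead of 150,000, to keep the run fast) are assumptions made for illustration.

```python
import random
import statistics

def simulate_sample_proportions(n=200, p=0.6, repetitions=2000, seed=7):
    """Return one sample proportion (p-hat) per simulated survey."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(repetitions):
        # Each respondent independently answers "yes" with probability p.
        yes_count = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(yes_count / n)
    return p_hats

p_hats = simulate_sample_proportions()
print("mean of p-hat:", round(statistics.mean(p_hats), 3))
print("std. dev. of p-hat:", round(statistics.pstdev(p_hats), 3))
```

The simulated values of p̂ should cluster around p = .6, which is the bell shape the histogram shows.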

Page 7: Sampling Distributions

Sampling Distribution of p̂ Derived from the Binomial Distribution

Let X be the number of respondents who say they engage in under-age drinking. What is the probability distribution of X?

X is binomial with n = 200 and p = .6, so we can calculate the probability of each possible outcome of X (0 to 200). The distribution is plotted below:

[Figure: binomial probabilities for X with n = 200 and p = .6. Horizontal axis: X, 69 to 169; vertical axis: Probability, 0.00 to 0.06.]
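The binomial probabilities in the plot can be computed directly. This is a small sketch using only the standard library; the helper name `binomial_pmf` is an assumption introduced here, not something from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 200, 0.6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The outcome with the highest probability sits at the peak of the plot.
most_likely = max(range(n + 1), key=lambda k: pmf[k])
print("most likely count:", most_likely)
print("P(X = most likely count):", round(pmf[most_likely], 4))
print("total probability:", round(sum(pmf), 6))
```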

Page 8: Sampling Distributions

Sampling Dist. of p̂

Since p̂ is simply X/n, it follows that the sampling distribution of p̂ is the same as that of the binomial distribution divided by n.

Mean of p̂: $E(\hat{p}) = E(X/n) = \frac{np}{n} = p$

Std. Dev. of p̂: $sd(\hat{p}) = sd(X/n) = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$

Standard Error of p̂: $se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Page 9: Sampling Distributions

Normal Approximation for Sample Proportions

The sampling distribution of p̂ is approximately normal with mean p and standard deviation $\sqrt{\frac{p(1-p)}{n}}$ if the following conditions are satisfied:

1. A random sample is selected from the population. Even if the sample is not perfectly random, as long as it is free from bias it will be okay.

2. The sample must be large enough: np and n(1-p) MUST both be greater than 5, and should be greater than 10.
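The sample-size condition and the quality of the approximation can both be checked numerically. The sketch below uses the Situation 1 values (n = 200, p = .6); the helper names and the cutoff 0.55 are illustrative assumptions, not values from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def conditions_met(n, p, threshold=10):
    """Large-sample check: both np and n(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

def exact_prob_at_most(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 200, 0.6
approx = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

print("conditions met:", conditions_met(n, p))
# p-hat <= 0.55 is the same event as X <= 110 when n = 200.
print("exact  P(p-hat <= 0.55):", round(exact_prob_at_most(110, n, p), 4))
print("normal P(p-hat <= 0.55):", round(approx.cdf(0.55), 4))
```

The two probabilities will not match exactly, since the normal curve is only an approximation to the discrete binomial distribution.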

Page 10: Sampling Distributions

Example: Problem 9.11

Recent studies have shown that about 20% of American adults fit the medical definition of being obese. A large medical clinic would like to estimate what percent of their patients are obese, so they take a random sample of 100 patients and find that 18 percent are obese. Suppose in truth, the same percentage holds for the patients of the medical clinic as for the general population, 20%. Give a numerical value for each of the following.

Page 11: Sampling Distributions

Problem 9.11 Cont.

a. The population proportion of obese patients in the medical clinic: p = .2

b. The proportion of obese patients in the sample of 100 patients: p̂ = 18/100 = 0.18

c. The standard error of p̂: $se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.18)(0.82)}{100}} = 0.0384$

d. The mean of the sampling distribution of p̂: p = .2

e. The standard deviation of the sampling distribution of p̂: $sd(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.2)(0.8)}{100}} = .04$
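The arithmetic in Problem 9.11 can be reproduced in a few lines. This is a sketch with variable names chosen here for illustration; the only difference between the two formulas is whether the population value p or the sample estimate p̂ goes under the square root.

```python
from math import sqrt

n = 100
p = 0.20          # assumed true proportion for the clinic
p_hat = 18 / n    # observed sample proportion

sd_p_hat = sqrt(p * (1 - p) / n)          # uses the population value p
se_p_hat = sqrt(p_hat * (1 - p_hat) / n)  # uses the sample estimate p-hat

print(f"p-hat = {p_hat:.2f}")
print(f"standard deviation of p-hat = {sd_p_hat:.4f}")
print(f"standard error of p-hat = {se_p_hat:.4f}")
```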

Page 12: Sampling Distributions

Sampling Distribution of the Sample Mean

Situation 2: The height of women age 20 to 30 is normally distributed (bell-shaped) with a mean of 65 inches and a standard deviation of 3 inches. A random sample of 200 women was taken and the sample mean x̄ recorded.

Now IMAGINE taking MANY samples of size 200 from the population of women. For each sample we record the x̄. What is the sampling distribution of x̄?

Page 13: Sampling Distributions

Histograms for the Distribution of X and X-Bar

[Figure: two histograms. Left panel: X, horizontal axis 50 to 80. Right panel: x̄, horizontal axis 62 to 68.]

Original Population of Women: X = height of random woman

Distribution of Sample Means: X-bar = mean of random sample of size 200.

Page 14: Sampling Distributions

For Normal Data:

Consider a random variable X with mean μ and standard deviation σ.

The sampling distribution of the sample mean, for a sample of size n, is normal with:

Mean of x̄: $E(\bar{x}) = \mu$

Std. Dev. of x̄: $sd(\bar{x}) = \frac{\sigma}{\sqrt{n}}$

What about for skewed or non-normal data?
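For the normal case, the heights in Situation 2 give a concrete check of these two formulas. The sketch below is a simulation under stated assumptions: the seed and the 1,000 repetitions are arbitrary choices made here, and `random.gauss` stands in for drawing a woman at random from the population.

```python
import random
import statistics
from math import sqrt

mu, sigma, n = 65, 3, 200          # values from Situation 2
theoretical_sd = sigma / sqrt(n)   # std. dev. of the sample mean

rng = random.Random(1)             # fixed seed for a repeatable run
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(1000)           # 1,000 repetitions (arbitrary choice)
]

print("theoretical sd of x-bar:", round(theoretical_sd, 3))
print("simulated mean of x-bar:", round(statistics.fmean(sample_means), 2))
print("simulated sd of x-bar:", round(statistics.pstdev(sample_means), 3))
```

The sample means vary far less than individual heights do, which is why the x̄ histogram covers a much narrower range than the X histogram.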

Page 15: Sampling Distributions

CD Data from the Class Survey

[Figure: histogram of CDs. Horizontal axis: CDs, 0 to 600; vertical axis: 0 to 40.]

Situation 3: Clearly, the CD data set is right-skewed. Suppose our population looked something like this. Let us take repeated samples from this population and see what the distribution of the sample mean looks like.

Page 16: Sampling Distributions

Suppose we take repeated samples of size 4, 8, 16, 32

[Figure: four histograms of the sample mean, with panel titles "Sample Mean for n=4", "Sample Mean for n=8", "Sample Mean for n=16", and "Sample Mean for n=32".]

Page 17: Sampling Distributions

Statistics From Skewed Data

Using that CD sample as the population, µ = 87.6, σ = 87.8

The sample means from the previous slide had the following summary statistics:

Sample Size    Mean    Std. Deviation
n = 4          86.6    43.2
n = 8          86.8    30.9
n = 16         86.7    21.9
n = 32         86.6    15.6

Note that the mean remains roughly constant, and the std. deviation decreases as the sample size increases!
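The pattern in the table (stable mean, shrinking standard deviation) can be compared with σ/√n. The CD data itself is not available here, so the sketch below substitutes an exponential population with mean 87.6 as a right-skewed stand-in; that substitution, the function name, the seed, and the repetition count are all assumptions made for illustration.

```python
import random
import statistics
from math import sqrt

MU, SIGMA = 87.6, 87.8   # population values quoted on the slide

def simulate_sample_means(n, repetitions=2000, seed=3):
    """Means of repeated size-n samples from a stand-in skewed population."""
    rng = random.Random(seed)  # fixed seed for a repeatable run
    return [
        # expovariate takes a rate, so 1 / MU gives a population mean of MU.
        statistics.fmean([rng.expovariate(1 / MU) for _ in range(n)])
        for _ in range(repetitions)
    ]

for n in (4, 8, 16, 32):
    means = simulate_sample_means(n)
    print(f"n={n:2d}  sigma/sqrt(n)={SIGMA / sqrt(n):5.1f}  "
          f"simulated mean={statistics.fmean(means):5.1f}  "
          f"simulated sd={statistics.pstdev(means):5.1f}")
```

The σ/√n column works out to roughly 43.9, 31.0, 21.9, and 15.5, close to the standard deviations reported in the table.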

Page 18: Sampling Distributions

Conclusions and Conditions for the Sample Mean

For non-normal data, the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation $\frac{\sigma}{\sqrt{n}}$.

Conditions!

The above is true if the sample size is large enough; usually n greater than 30 is sufficient.

Page 19: Sampling Distributions

What next?

We have shown that the sampling distribution of the sample proportion and the sampling distribution of the sample mean are both approximately normal under certain conditions.

Now we can use what we know about normal distributions to draw conclusions about p̂ and x̄!

Situation 4 demonstrates how to use the sampling distribution of p-hat to draw conclusions.

Page 20: Sampling Distributions

Situation 4: A certain antibiotic is known to cure 85% of strep bacteria infections. A scientist wants to make sure the drug does not lose its potency over time. He treats 100 strep patients with a 1-year-old supply of the antibiotic. Let p̂ be the proportion of individuals who are cured. ASSUME the drug has NOT lost potency, and answer the following questions.

1. What is the sampling distribution of p̂?

2. If we repeated this study many times, we would expect 95% of p̂ to fall within what interval?

3. What is the probability that more than 90% in the sample are cured?

4. Suppose the scientist observed a cure rate of only 75%. Would he be justified in concluding the 1-year-old drug is less effective?

Page 21: Sampling Distributions

1. What is the sampling distribution of p̂?

Since both np = 85 and n(1-p) = 15 are greater than 10, and if we assume the sample is random/representative, then the sampling distribution of p̂ is approximately normal with mean p = .85 and standard deviation $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.85)(.15)}{100}} = .036$.

Page 22: Sampling Distributions

2. If we repeated this study many times, we would expect 95% of p̂ to fall within what interval?

The empirical rule states that for a normally distributed variable ~95% of the values fall within +/- 2 standard deviations of the mean.

So 95% of the p̂ should fall within

.85 +/- 2*.036

or

there is about a 95% probability that the sample proportion cured will be between 78% and 92%.
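The interval can be reproduced numerically, along with the probability the normal model actually assigns to it. This is a sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.85
sd = sqrt(p * (1 - p) / n)

# Empirical-rule interval: mean plus or minus 2 standard deviations.
low, high = p - 2 * sd, p + 2 * sd

# Probability the normal model places inside that interval.
model = NormalDist(mu=p, sigma=sd)
coverage = model.cdf(high) - model.cdf(low)

print(f"sd = {sd:.4f}")
print(f"interval = ({low:.3f}, {high:.3f})")
print(f"coverage = {coverage:.4f}")
```

The coverage comes out slightly above 95%, which is why the empirical rule is stated with a "~".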

Page 23: Sampling Distributions

3. What is the probability that more than 90% in the sample are cured?

In other words, what is P(p̂ > .9)?

First calculate a z-score.

Z-score = [value - mean]/StdDev

Z-score = [.9 - .85]/.036 = 1.4

P(p̂ > .9) = P(Z > 1.4) = 1 - P(Z < 1.4)

= 1 - .9192 = .0808
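The table lookup can be replaced by a direct calculation. The sketch below computes the z-score without rounding the standard deviation and also evaluates the tail at the rounded value 1.4 used above; the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.85
sd = sqrt(p * (1 - p) / n)

z = (0.90 - p) / sd                          # z-score for a 90% cure rate
prob_unrounded = 1 - NormalDist().cdf(z)     # P(Z > z) with no rounding
prob_rounded_z = 1 - NormalDist().cdf(1.4)   # z rounded to 1.4, as in the table lookup

print(f"z = {z:.3f}")
print(f"P(p-hat > .9), unrounded z: {prob_unrounded:.4f}")
print(f"P(p-hat > .9), z = 1.4:     {prob_rounded_z:.4f}")
```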

Page 24: Sampling Distributions

4. Suppose the scientist observed a cure rate of only 75%. Would he be justified in concluding the 1-year-old drug is less effective?

In other words, assuming the cure rate is actually 85%, what is the chance he would observe a sample proportion less than or equal to 75%? What is P(p̂ ≤ .75)?

Z-score = [.75 - .85]/.036 = -2.80

P(p̂ ≤ .75) = P(Z < -2.80) = .0026
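The lower-tail probability can be checked the same way. As a cross-check, the sketch below also computes the exact binomial probability of 75 or fewer cures out of 100; that extra comparison and the helper name are additions made here for illustration, not part of the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.85
sd = sqrt(p * (1 - p) / n)

# Normal approximation, as in the z-score calculation above.
z = (0.75 - p) / sd
normal_tail = NormalDist().cdf(z)

def exact_prob_at_most(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# A 75% cure rate in 100 patients means X <= 75 cures.
exact_tail = exact_prob_at_most(75, n, p)

print(f"z = {z:.2f}")
print(f"normal approximation: {normal_tail:.4f}")
print(f"exact binomial:       {exact_tail:.4f}")
```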

We will see some examples of how to use the sampling distribution of the sample mean in class activities, but it is a similar idea.