Post on 18-Oct-2020


Page 1: Sampling Distributions - University of Notre Damekkelley/Teaching/Lectures/Lecture__Sampling_Distributi… · Sampling and Variability The Central Limit Theorem Assumptions What You

Sampling and VariabilityThe Central Limit Theorem

Assumptions

Sampling Distributions

Ken Kelley’s Class Notes


Lesson Breakdown by Topic

1 Sampling and Variability: Sampling/Variability; Demonstration; Standard Deviation of the Mean; Sampling Variability Example

2 The Central Limit Theorem: Visualization

3 Assumptions: Notation Glossary; What You Learned; A Worked Example


What You Will Learn from this Lesson

You will learn:

The idea of sampling from a population to obtain point estimates.

The idea of the variability of point estimates.

How the standard deviation of a mean decreases as the sample size increases (specifically at a rate of 1/√n).

How to find probability values for a mean (rather than a single score, as was done when discussing the normal distribution).

How the Central Limit Theorem allows the normal distribution to be used when interest concerns a mean (even if the distribution of scores is not normal).


Motivation

If the mean effectiveness of a blood pressure medication is estimated to be a 15-point reduction in blood pressure after a week (i.e., X̄ = −15), does this mean that µ = −15?

What if you discovered that X̄ = −15 when n = 20?

What if three other studies, each with n = 20, reported the following: X̄₁ = −1, X̄₂ = 3, and X̄₃ = −5?

What if you discovered that X̄ = −4.5 when n = 250,000?

The estimated mean is sample specific, but it represents a fixed population value... understanding the variability of estimates is important for making decisions based on a sample.


Sampling

Simple random sample: when all members of a population have the same probability of being selected.

Unless otherwise stated, we will assume simple random sampling.

This is not the way all samples are collected, so be aware of data collection methods.

There are other sampling methods, which we will discuss at a later time.


Estimation

Point estimate: the estimated value of a population parameter of interest based on a sample from the population of interest.

Point estimates are (essentially) always wrong.

A sample estimate will (usually) differ from the population parameter it estimates.

Correspondingly, if a new or different sample had been collected, a different point estimate would likely result.


Sampling Variability

Multiple samples can be taken and a point estimate calculated in each instance.

The sample-to-sample variability of the point estimates is sampling variability.

Sampling error is the random difference between an estimate and the parameter it estimates.
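
Sampling variability and sampling error can be seen in a small simulation. The sketch below is illustrative only: the population values (µ = −5, σ = 20) and the normal shape are assumptions chosen for the demonstration, not values from these notes.

```python
import random
import statistics

# Hypothetical population values (assumptions for illustration only).
MU, SIGMA, N = -5.0, 20.0, 20

rng = random.Random(1)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))

# Each new sample gives a different point estimate of the same fixed mu.
means = [sample_mean(N) for _ in range(5)]
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:6.2f}, sampling error = {m - MU:6.2f}")

# Across many samples, the SD of the estimates is the sampling variability.
many = [sample_mean(N) for _ in range(10_000)]
sd_of_means = statistics.stdev(many)
print(f"SD of 10,000 sample means = {sd_of_means:.2f} "
      f"(sigma / sqrt(n) = {SIGMA / N ** 0.5:.2f})")
```

No single sample mean equals µ exactly, yet the spread of the means is predictable, which is what the rest of the lesson builds on.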


Sampling Variability: A Demonstration

The effects of sampling variability on a normal distribution: A Demonstration (Link).



Thought Question

Does Variability Matter in the Market? Yes!


Does Variability Matter in the Market?

Volatility: the standard deviation of the continuously compounded returns of a financial instrument (over a specified period of time).

Volatility is the standard deviation!

Implied volatility, an annualized standard deviation.

The VIX (S&P 500 market volatility index).

The VXN (Nasdaq 100 volatility index).

The VXD (DJIA volatility index).
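
As a rough illustration of the definition, the sketch below computes the standard deviation of continuously compounded (log) returns. The prices are made up, and the annualization by √252 assumes 252 trading days per year and independent daily returns. This is historical volatility; implied volatility (as in the VIX) is instead backed out of option prices and is not computed here.

```python
import math
import statistics

# Hypothetical daily closing prices (made up for illustration).
prices = [100.0, 101.5, 100.8, 102.3, 101.9, 103.4, 102.7]

# Continuously compounded (log) return between consecutive days.
log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Volatility over the period is the standard deviation of those returns.
daily_vol = statistics.stdev(log_returns)

# Assumption: 252 trading days per year, returns independent across days.
annualized_vol = daily_vol * math.sqrt(252)

print(f"daily volatility      = {daily_vol:.4f}")
print(f"annualized volatility = {annualized_vol:.4f}")
```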


Example of the effect of volatility (i.e., the standard deviation) on returns using @Risk Demonstration (Link).

Example performance measures from Fidelity (note the standard deviation) (Link).


The Standard Deviation of the Mean

Often, interest is not literally in individual scores, but rather the mean of a set of scores (e.g., months, groups, teams).

The variability of the mean is different from the variability of scores.

The mean is based on more information than individual scores and is more stable.


The population standard deviation of the mean is denoted $\sigma_{\bar{x}}$, which equals $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Because $\sqrt{n}$ is in the denominator, as the sample size increases, the variability of the mean decreases.

This fact is extremely important, as it shows how increasing sample size leads to more precise (i.e., less variable) estimates.

Increasing sample size decreases “noise” and magnifies “signal.”
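
The formula follows from the rules for variances. A short derivation, assuming the n scores are independent draws from a population with variance σ² (as under the simple random sampling assumed in these notes):

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n},
\qquad\text{so}\qquad
\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}.
```

For example, with σ = 10, quadrupling the sample size from n = 25 to n = 100 halves the standard deviation of the mean, from 10/5 = 2 to 10/10 = 1.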


Sampling Distributions

Sampling Distribution: the distribution of a statistic of interest when that statistic is calculated from random samples of data under the same set of conditions.

This can be for any statistic (e.g., x̄, p̄, s, s², r, etc.).

Each statistic has a sampling distribution with its own properties.

The sampling distribution always depends on sample size.

The larger the sample size, the less variable the statistic.


Sampling Variability: 12 Ounce Cans

Beverage manufacturers must ensure that the stated amount of liquid is close to the actual amount.

Manufacturers know that the process of filling a can is not exact (e.g., temperature, barometric pressure, viscosity).

The manufacturing specifications for a bottler are such that the machines attempt to fill each can with 12.20 ounces of liquid.

Suppose that data show that the standard deviation is 0.25 ounces of liquid across the cans.


Recall, the Standard Normal Distribution

A normal distribution with a mean of µ = 0 and a standard deviation of σ = 1 is the standard normal distribution.

Such a distribution is standardized because z-scores are formed.

Recall, a z-score for individual i is defined as

$z_i = \frac{x_i - \mu}{\sigma}$

Any normal distribution can be converted into a standard normal distribution by transforming scores into z-scores.

The distribution shape (e.g., if it is skewed) does not change by converting to z-scores (but the mean and standard deviation do).


Sampling Variability: An Example

What is the probability that a can has less than 12 ounces?

$z = \frac{x - \mu}{\sigma} = \frac{12 - 12.2}{.25} = \frac{-.2}{.25} = -.8$

Thus, P(Z ≤ −.8) = .21


What about the mean of a 6 pack being less than 12 ounces?

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{12 - 12.2}{.25/\sqrt{6}} = \frac{-.20}{.1021} = -1.96; \quad P(Z \le -1.96) = .025.$
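
The two calculations can be checked side by side. This is a minimal sketch using Python's standard-library `NormalDist` in place of the z-table; like the z-table lookup for a single can, it assumes that individual fill amounts are normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 12.20, 0.25     # fill target and SD from the can example
std_normal = NormalDist()   # standard normal: mean 0, SD 1

# One can: z = (x - mu) / sigma
z_can = (12 - mu) / sigma
p_can = std_normal.cdf(z_can)

# Mean of a six-pack: z = (xbar - mu) / (sigma / sqrt(n))
n = 6
z_pack = (12 - mu) / (sigma / sqrt(n))
p_pack = std_normal.cdf(z_pack)

print(f"one can:  z = {z_can:.2f}, P = {p_can:.4f}")
print(f"six-pack: z = {z_pack:.2f}, P = {p_pack:.4f}")
```

The only difference between the two lines of arithmetic is the denominator: σ for a single score, σ/√n for a mean.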


Using Excel Instead of the z-Table

The NORM.S.DIST formula in Excel can be used instead of the z-table.

NORM.S.DIST requires one to specify the z value and TRUE (for cumulative).

That is: NORM.S.DIST(z, TRUE)

For the example of a can being less than 12 ounces, in which z = −.80: NORM.S.DIST(-.80, TRUE), or equivalently NORM.DIST(-.80, 0, 1, TRUE).

The formula returns .2118554, which for our purposes can be rounded to four decimal places: P = .2119.

Recall that formulas in Excel require an “=” sign in order to return the results. Thus, “=NORM.S.DIST(-.80, TRUE)” would be entered in an Excel cell.


Using Excel More Generally: For Any Normal Distribution

The NORM.DIST formula in Excel can be used for any normal distribution (notice there is no “S”, which denotes standardized).

NORM.DIST requires one to specify the X value, µ, σ, and TRUE (for cumulative).

That is: NORM.DIST(X, µ, σ, TRUE)

For a standard normal distribution (i.e., z-distribution): NORM.DIST(X, 0, 1, TRUE)

For the example of a six-pack being less than 12 ounces: NORM.DIST(12, 12.20, .25/SQRT(6), TRUE).

The formula returns .0250218, which for our purposes can be rounded to four decimal places: P = .0250.


Sampling Variability: An Example

Rather than using the z-table, the NORM.DIST formula in Excel can be used:

[Figure: Excel screenshot of the NORM.DIST calculation.]


Sampling Variability Visually (Equal X-Axis)

[Figure: four density curves drawn on a common x-axis (11.2 to 13.2 ounces); y-axis: Density. Panels: Distribution of the Weight of Individual Cans (x-axis: Weight of One Can); Distribution of Sample Mean for the Weight of Cans in a 6-Pack; Distribution of Sample Mean for the Weight of Cans in a 12-Pack; Distribution of Sample Mean for the Weight of Cans in a Case.]


Sampling Variability Visually (Equal X and Y Axes)

[Figure: the same four density curves drawn on a common x-axis (11.2 to 13.2 ounces) and a common y-axis (Density, 0 to 8). Panels: Distribution of the Weight of Individual Cans (x-axis: Weight of One Can); Distribution of Sample Mean for the Weight of Cans in a 6-Pack; Distribution of Sample Mean for the Weight of Cans in a 12-Pack; Distribution of Sample Mean for the Weight of Cans in a Case.]
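
The narrowing shown across the four panels can be put into numbers. The sketch below uses µ = 12.20 and σ = 0.25 from the can example and assumes normally distributed fills; the case size of 24 cans is an assumption, since the notes do not state it.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 12.20, 0.25  # values from the can example

# Assumption: a "case" holds 24 cans; the notes do not state the size.
pack_sizes = {"one can": 1, "6-pack": 6, "12-pack": 12, "case (assumed 24)": 24}

results = {}
for label, n in pack_sizes.items():
    se = sigma / sqrt(n)                      # SD of the sample mean
    p_below_12 = NormalDist(mu, se).cdf(12)   # P(mean < 12 ounces)
    results[label] = (se, p_below_12)
    print(f"{label:18s} n = {n:2d}  SD of mean = {se:.4f}  "
          f"P(mean < 12) = {p_below_12:.4f}")
```

As n grows, the standard deviation of the mean shrinks and the chance of a mean below 12 ounces falls with it.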


The Central Limit Theorem


Random samples of size n drawn from some population will have a sampling distribution for x̄ that can be approximated by a normal distribution as the sample size becomes large.

Note that nothing is stated about the shape of the distribution in the population (beyond it having a finite standard deviation σ)!

The distribution of x̄ will have mean µ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

This is one of the most important theorems in all of statistics!


[Figure: Histogram of Uniform Distribution. x-axis: Score (0.0 to 1.0); y-axis: Density (0.0 to 1.0).]


Sampling Distribution of X̄ from a Uniform Distribution

[Figure: four density histograms titled Sample Means from Uniform Distribution when n = 2, n = 5, n = 10, and n = 25. x-axis: Value Observed for Sample Mean (0.0 to 1.0); y-axis: Density.]
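
A simulation along these lines can be sketched as follows. It is not the code used to draw the figure; the number of replications and the seed are arbitrary choices. For a uniform distribution on 0 to 1, µ = 0.5 and σ = √(1/12).

```python
import random
import statistics
from math import sqrt

rng = random.Random(42)   # fixed seed for reproducibility
REPS = 20_000             # simulated samples per n (arbitrary choice)
SIGMA = sqrt(1 / 12)      # SD of a single Uniform(0, 1) score

summary = {}
for n in (2, 5, 10, 25):
    # Sampling distribution of the mean: one mean per simulated sample.
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(REPS)]
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of means = {summary[n][0]:.3f}, "
          f"SD of means = {summary[n][1]:.4f}, "
          f"sigma/sqrt(n) = {SIGMA / sqrt(n):.4f}")
```

The simulated means center on µ = 0.5 for every n, while their standard deviation tracks σ/√n.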


[Figure: Histogram of Log Normal Distribution. x-axis: Score (0 to 15); y-axis: Density (0.0 to 0.6).]


Sampling Distribution of X̄ from Log Normal Distribution

[Figure: four density histograms titled Sample Means from Log Normal Distribution When n = 2, n = 5, n = 25, and n = 50. x-axis: Sample Mean (0 to 8); y-axis: Density.]
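
One way to quantify how the skew fades as n grows is to track the skewness of the simulated sample means, since a normal distribution has skewness 0. The sketch below is an illustration, not the code behind the figure; the log-normal parameters (log-scale mean 0, log-scale SD 1), the number of replications, and the seed are all assumptions.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed for reproducibility
REPS = 5_000             # simulated samples per n (arbitrary choice)

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

skew_by_n = {}
for n in (2, 5, 25, 50):
    # Assumption: log-scale mean 0 and SD 1; the notes give no parameters.
    means = [statistics.fmean(rng.lognormvariate(0, 1) for _ in range(n))
             for _ in range(REPS)]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

With a strongly skewed population such as this one, the sample means are still noticeably skewed at small n, which is why "large enough" depends on the shape of the population.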


[Figure: Histogram of Mixed Normal Distribution. x-axis: Score (0 to 15); y-axis: Density (0.00 to 0.15).]


Sampling Distribution of X̄ from a Mixed Normal

[Figure: four density histograms titled Sample Means from Mixed Normal Distributions When n = 2, n = 5, n = 10, and n = 25. x-axis: Sample Mean (−5 to 20); y-axis: Density.]


The Central Limit Theorem

Said another way, with regard to the sampling distribution of the sample mean, the shape of the distribution from which scores are sampled is irrelevant, because as sample size grows large the sampling distribution of the sample mean will be approximately normally distributed!

Because the distribution of the sample means is approximately normal, provided sample size is not too small, inferences about the mean can legitimately be made using a normal distribution!


The Normality Assumption

The standard normal distribution (i.e., the z-table) assumes that the scores are normally distributed.

If the normality assumption is violated, the probabilities obtained will not be correct.

If interest concerns a mean, which is often the case, the Central Limit Theorem tells us that the distribution of sample means will be approximately normally distributed (if sample size is not too small).

Thus, the z-distribution can be used when interest concerns means, regardless of the shape of the distribution of the scores.

However, the z-distribution should not be used for probabilities of individual scores if the distribution from which the scores are sampled is not normal.

We have also assumed that σ is known.

We relax this assumption in the next topic (by using a t-distribution instead of z).


Sampling Error

The random discrepancy between a statistic (from a sample) and the parameter (the population value).

Sampling error is the reason for inferential statistics.

If there were no sampling error, the estimate from a particular sample would equal the population value.

Because sampling error (essentially) always exists, we need to quantify the uncertainty of an estimate.

This is why confidence intervals and hypothesis tests are so important (much more to come on these procedures later).


Finite Sample Calculations

For small populations in which the population size (N) is known, there is a correction factor that can be used when calculating the standard error of the mean.

Rather than using $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, as would be typical, using

$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$$

provides a more appropriate value of the standard error of the sample mean in situations in which the population is small.

In general, this might be used only if the sample size is 5% or more of the population size.

Usually we assume that the data come from an infinite population or process (or we acknowledge we do not know the population size), and thus this correction is not often needed.
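As a rough illustration of the correction, the sketch below computes the standard error with and without the finite population factor. The function name `standard_error` and the numbers (σ = 15, n = 25, N = 200) are hypothetical choices made for this example only; they do not come from the slides.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) only when N is supplied."""
    se = sigma / math.sqrt(n)
    if population_size is None:
        return se  # infinite (or unknown) population: no correction
    if population_size < 2 or not (1 <= n <= population_size):
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((population_size - n) / (population_size - 1)) * se

# Hypothetical values: the sample is 12.5% of the population, above the 5% guideline.
print(standard_error(15.0, 25))                       # uncorrected
print(standard_error(15.0, 25, population_size=200))  # corrected, slightly smaller
```

When n equals N the corrected value is zero, which matches intuition: a census has no sampling error.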


Notation Glossary

$\sigma_{\bar{x}}$ - population standard deviation of the sample mean ($\sigma_{\bar{x}} = \sigma/\sqrt{n}$)

n - sample size

N - population size

x̄ - the mean of a sample

µ - the mean of a population

z - z-score for a score of interest ($z = (x - \mu)/\sigma$)

z - z-score when the sample mean is of interest ($z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$)


What You Learned from this Lesson

You learned:

The idea of sampling from a population to obtain point estimates.

The idea of the variability of point estimates.

How the standard deviation of a mean decreases as the sample size increases (i.e., $\sigma_{\bar{X}} = \sigma/\sqrt{n}$).

How to find probability values for a mean.

How the Central Limit Theorem allows the normal distribution to be used when interest concerns a mean (even if the distribution of scores is not normal).

How the Central Limit Theorem allows for the assumption of normality not to be met, but still allows the use of the normal distribution when finding probabilities for a mean.


A Worked Example

From a statement made by an automotive dealership in a court proceeding, it was stated that “40% of the time they sell plus or minus 8 vehicles per month around the population mean.” That is, $P(\mu - 8 \le X \le \mu + 8) = .40$.

Assume that the statement is based on the relevant population values and that vehicle sales follow a normal distribution.

What is σ, which you would find, as a competitor to this dealership, very valuable for comparison purposes?


A Worked Example, Continued

First, we need to determine the relevant z-values based on what is known.

If there is a 40% chance of selling within plus/minus 8 vehicles of the mean, we can find the corresponding z-scores; 60% of the time the dealership falls outside of the limits (30% less and 30% more).

With 30% of the time on either side, p(Z ≤ z) = .30 holds for z = −.524 and p(Z ≤ z) = .70 holds for z = .524; meaning for this problem the relevant quantiles (i.e., z-values) are ±.524 (which can be found using qnorm(.30) and qnorm(.70)).

The relevant z-values are thus ±.524.
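For readers without R at hand, the same quantiles can be looked up with Python's standard library, where `NormalDist().inv_cdf` plays the role of `qnorm`. This is only a sketch of the lookup; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z_lower = std_normal.inv_cdf(0.30)  # counterpart of qnorm(.30)
z_upper = std_normal.inv_cdf(0.70)  # counterpart of qnorm(.70)

# Area between the two quantiles; should recover the stated 40%.
central_area = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(round(z_lower, 3), round(z_upper, 3), round(central_area, 2))
```

The lower quantile is negative and the upper one positive, with the same magnitude because the standard normal distribution is symmetric about zero.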


A Worked Example, Continued

Recalling that $z = (X - \mu)/\sigma$, and from the problem we have $\pm .524 = \pm 8/\sigma$, we can solve for σ (because we know everything else): $\sigma = 8/.524 = 15.267$.

Thus, we have found that the population standard deviation is around 15.267 cars per month.
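The arithmetic can be double-checked by plugging the solved σ back into a normal distribution. In the sketch below the mean is set to 0 purely for convenience (an assumption made here; the dealership's actual mean is not given, and only deviations from it matter), and the rounded quantile .524 from the text is used.

```python
from statistics import NormalDist

half_width = 8.0  # vehicles on either side of the mean
z = 0.524         # rounded quantile used in the text
sigma = half_width / z

# Check: with this sigma, the area within +/- 8 of the mean should be about .40.
sales = NormalDist(mu=0.0, sigma=sigma)  # mu = 0 is a placeholder, not a real value
area = sales.cdf(half_width) - sales.cdf(-half_width)

print(round(sigma, 3), round(area, 3))
```

Using the unrounded quantile instead of .524 would change σ only in the second decimal place.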


A Worked Example, Continued

What is the probability of the dealership having a “summer mean” (i.e., of the three months of summer) that is within 10 vehicles of the overall mean?

Here we use X̄ instead of X, and thus $\sigma_{\bar{X}}$ instead of σ:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

From earlier we already know that σ = 15.267 and we know the other values:

$$z = \frac{\pm 10}{15.267/\sqrt{3}} = \frac{\pm 10}{8.81}.$$

The z-values of interest are ±1.135.

Thus, p(±10 cars around µ) = p(−1.135 ≤ Z ≤ 1.135) = .7436.

The above probability can be obtained from 1 - 2*pnorm(-1.135) or, equivalently yet more formally, 1 - (pnorm(-1.135) + 1 - pnorm(1.135)).
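The whole calculation can also be reproduced outside R. The sketch below uses Python's `NormalDist().cdf` in place of `pnorm`; it assumes, as the example implicitly does, that the three summer months behave like independent draws from the same normal population. Variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 15.267  # population SD found earlier in the example
n = 3           # three summer months
margin = 10.0   # vehicles on either side of the overall mean

standard_error = sigma / math.sqrt(n)  # SD of the summer mean
z = margin / standard_error

std_normal = NormalDist()
prob = 1 - 2 * std_normal.cdf(-z)                                # short form
prob_formal = 1 - (std_normal.cdf(-z) + 1 - std_normal.cdf(z))  # both tails written out

print(round(standard_error, 2), round(z, 3), round(prob, 4))
```

The two expressions for the probability agree because the lower and upper tail areas of the standard normal distribution are equal. The result may differ from .7436 in the fourth decimal place, since z is not rounded to 1.135 here.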