146 17 the_normal_distribution online

MATH& 146

Lesson 17

Sections 2.5 and 2.6

The Normal Distribution


Sampling Distribution of

the Mean

Here's a simple

simulation. Let's start

with one fair die. If we

toss this die many times,

what should the dotplot of

the numbers on the face

of the die look like?

To help you out, consider

the results of 500

simulated tosses.
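The 500 tosses can be simulated in a few lines. This is a minimal sketch in Python, not the tool used to make the slides. It assumes the standard library's random module and an arbitrary seed (42), chosen only so the run is repeatable. The text dotplot is a rough stand-in for the figure.

```python
import random
from collections import Counter

random.seed(42)  # arbitrary seed (an assumption), only for repeatability

# Toss one fair die 500 times.
tosses = [random.randint(1, 6) for _ in range(500)]

# Count how often each face came up.
counts = Counter(tosses)

# Crude text dotplot: one dot per 5 tosses of each face, followed by the count.
for face in range(1, 7):
    print(face, "." * (counts[face] // 5), counts[face])
```

Each face is expected to come up about 500/6, or roughly 83, times, so the six rows should be of similar length.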

Sampling Distribution of

the Mean

Now let's toss a pair of

dice and record the

average of the two.

If we repeat this (or at

least simulate it) 500

times, recording the

average of each pair,

what will the dotplot of

these 500 averages look

like?

Sampling Distribution of

the Mean

We're much more likely to

get an average near 3.5

than we are to get one near

1 or 6.

After all, the only way to get

an average of 1 (or 6) is to

roll 1's (or 6's) with both

dice.

An average of 3.5, however,

can happen in many ways

(1 and 6, 2 and 5, 3 and 4, and so on).
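The counting argument can be checked by listing every outcome for a pair of dice. This is a minimal sketch, assuming fair dice so that all 36 ordered outcomes are equally likely. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# All 36 equally likely ordered outcomes for a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Number of outcomes that produce each possible average.
ways = Counter((a + b) / 2 for a, b in outcomes)

for average in sorted(ways):
    print(f"average {average}: {ways[average]} of {len(outcomes)} outcomes")
```

Only 1 of the 36 outcomes gives an average of 1 (and likewise for 6), while 6 of them give an average of 3.5.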

Sampling Distribution of

the Mean

What if we average 3

dice? We'll simulate

500 tosses of 3 dice

and take their average:

Sampling Distribution of

the Mean

Note that it's getting

harder to have averages

near the ends, since

getting an average of 1 or

6 requires all three to

come up 1 or 6,

respectively.

That's less likely than

getting both of 2 dice to

come up 1 (or both 6).
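The comparison between 2 and 3 dice can be made exact. This is a minimal sketch, assuming fair and independent dice. The helper name prob_all_same_face is hypothetical.

```python
from fractions import Fraction

def prob_all_same_face(n_dice):
    """Probability that n fair dice all show one particular face (e.g. all 1's)."""
    return Fraction(1, 6) ** n_dice

for n_dice in (2, 3):
    print(n_dice, "dice:", prob_all_same_face(n_dice))
```

With 2 dice the chance of an average of exactly 1 is 1/36. With 3 dice it drops to 1/216.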

Sampling Distribution of

the Mean

Let's continue this

simulation to see what

happens with larger

samples.

Here's a dotplot of the

averages for 500 tosses

of 5 dice:

Sampling Distribution of

the Mean

The pattern is becoming

clearer. Three things

continue to happen.

1) The shape is unimodal

and symmetric.

2) The shape remains

centered at 3.5.

3) The shape is

tightening.
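The center and the tightening can be put into numbers by repeating the simulation for several sample sizes. This is a minimal sketch, assuming Python's standard library and an arbitrary seed. The function name simulate_averages is hypothetical, and the standard deviation of the 500 averages is used here as the measure of how tight the shape is.

```python
import random
import statistics

random.seed(146)  # arbitrary seed (an assumption), only for repeatability

def simulate_averages(n_dice, n_repeats=500):
    """Toss n_dice fair dice and record their average, n_repeats times."""
    return [
        sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(n_repeats)
    ]

results = {}
for n_dice in (1, 2, 3, 5, 20):
    averages = simulate_averages(n_dice)
    center = statistics.mean(averages)
    spread = statistics.stdev(averages)
    results[n_dice] = (center, spread)
    print(f"{n_dice:>2} dice: center = {center:.2f}, spread (SD) = {spread:.2f}")
```

The center should stay close to 3.5 in every row, while the spread shrinks as more dice are averaged.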

Sampling Distribution of

the Mean

Not convinced? Let's skip

ahead and try 20 dice.

The dotplot of averages

for 500 throws of 20 dice

looks like this:

Sampling Distribution of

the Mean

At this point, you should

be asking if this only

works for dice throws. In

fact, this shape shows up

amazingly often when we

use sample means or

sample proportions.

We even have a name for

this shape: the normal

distribution.



Among all the distributions we see in practice, the

normal distribution is overwhelmingly the most

common. The symmetric, unimodal, bell-shaped curve

is pervasive throughout statistics.

Variables such as SAT scores and heights of US adult

males and females closely follow the normal

distribution.



Technically, while many variables are nearly normal,

none are exactly normal.

However, the normal distribution, while not perfect for

any single problem, is very useful for a variety of

problems. In fact, we will use it many times for the rest

of the course.


What Is Normal?

The normal distribution model always describes a

symmetric, unimodal, bell-shaped curve. However,

these curves can look different depending on the

details of the model. Specifically, the normal

distribution model can be adjusted using two

parameters: mean and standard deviation.


What Is Normal?

As you might guess, changing the mean shifts the bell

curve to the left or right, while changing the standard

deviation stretches or constricts the curve.
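The effect of the two parameters can be seen by evaluating the height of the bell curve directly. This is a minimal sketch that uses the standard formula for the normal density, which the slides do not state and which is included here only as background. The function name normal_pdf and the parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Changing the mean shifts the curve: the peak moves with the mean, same height.
print(normal_pdf(0, mean=0, sd=1), normal_pdf(19, mean=19, sd=1))

# Changing the standard deviation stretches the curve: a larger SD gives a
# lower, wider bell.
print(normal_pdf(19, mean=19, sd=1), normal_pdf(19, mean=19, sd=4))
```

The peak height is the same for both means when the standard deviation is 1 (about 0.399). It drops to a quarter of that when the standard deviation is 4, because the same total area is spread over a wider curve.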

[Figure: two normal curves. One is labeled Mean = 0, Standard Deviation = 1; the other is labeled Mean = 19.]


Example 1

Consider the following sets of three distributions,

all of which are drawn to the same scale. Identify

the two distributions that are normal. Of the two

normal distributions, which one has the larger

standard deviation?


Notation

Because the mean and standard deviation describe a

normal distribution exactly, they are called the

distribution's parameters. These are not the same

things as sample statistics.

                      Sample Statistic    Distribution Parameter

Mean                  x̄ (x-bar)           μ (mu)

Standard Deviation    s                   σ (sigma)

Notation

If X is a quantity to be measured that has a normal

distribution with mean μ and standard deviation σ,

we write the distribution as N(μ,σ)


Notation

For example, the two distributions below can be

written as

N(μ = 0, σ = 1) and N(μ = 19, σ = 4)

[Figure: two normal curves, N(μ = 0, σ = 1) and N(μ = 19, σ = 4)]


Example 2

Write down each normal distribution using the

short-hand notation, and sketch its shape.

a) mean 5 and standard deviation 3

b) mean –100 and standard deviation 10

c) mean 2 and variance 9.


z-Scores

The z-score of an observation is the number of

standard deviations it falls above or below the

mean. We compute the z-score for an observation

x that follows a distribution with mean μ and

standard deviation σ using

$$z = \frac{x - \mu}{\sigma} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

z-Scores

If X is normal, then the z-scores (also called

standard scores) will also be normal, but with a

mean of 0 and standard deviation of 1. That is,

N(μ = 0, σ = 1).

This distribution even has a special name: the

standard normal distribution.
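This claim can be checked with a quick simulation. This is a minimal sketch, assuming the standard library's random.gauss as the source of normal values, an arbitrary seed, and N(μ = 19, σ = 4) as the starting distribution. The sample size of 10,000 is also an assumption.

```python
import random
import statistics

random.seed(17)  # arbitrary seed (an assumption), only for repeatability

mu, sigma = 19, 4  # starting distribution N(mu = 19, sigma = 4)

# Simulate 10,000 observations from N(19, 4).
xs = [random.gauss(mu, sigma) for _ in range(10_000)]

# Convert each observation to a z-score: (value - mean) / standard deviation.
zs = [(x - mu) / sigma for x in xs]

print(f"x: mean = {statistics.mean(xs):.2f}, SD = {statistics.stdev(xs):.2f}")
print(f"z: mean = {statistics.mean(zs):.2f}, SD = {statistics.stdev(zs):.2f}")
```

The z-scores should come out with a mean close to 0 and a standard deviation close to 1, up to simulation noise.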


Example 3

Suppose a student had taken 2 exams, getting 60

in a verbal test and 80 in a numerical reasoning

test. The class scores for each exam are normally

distributed. For the verbal test, the mean is 50 and

standard deviation 10; for the numerical test, the

mean is 70 and standard deviation is 12. Relative

to the rest of the class, which was the student's

better score?
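One way to compare the two scores is to convert each to a z-score, using the class means and standard deviations given in the example. The subscripts are labels introduced here for convenience.

$$z_{\text{verbal}} = \frac{60 - 50}{10} = 1.00, \qquad z_{\text{numerical}} = \frac{80 - 70}{12} \approx 0.83$$

In standard deviation units, the verbal score sits farther above its class mean than the numerical score does.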


Example 4

Over the last few classes, we have run simulations on

several case studies, including the opportunity cost

study (Lesson 13) and the medical consultant study

(Lesson 16).

Suppose we had run 10,000 simulations on each

study.


Example 4 continued

The two graphs show the

null distribution for both of

these case studies.

Using these graphs,

describe the shape of the

distributions and note

anything that you find

interesting.


Central Limit Theorem

It is common for distributions in general to be

skewed or contain outliers.

However, the null distributions we've so far

encountered have all looked somewhat similar

and, for the most part, symmetric. They all

resemble the normal distribution. This is not a

coincidence, but rather, is guaranteed by

mathematical theory.
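The effect can be seen even when the underlying data are far from symmetric. This is a minimal sketch, not one of the case studies from class. It assumes a right-skewed exponential population with mean 1, samples of size 50, and an arbitrary seed, all chosen only for illustration. The helper skewness is a hypothetical name for the usual moment-based skewness measure, which is near 0 for a symmetric shape.

```python
import random
import statistics

random.seed(25)  # arbitrary seed (an assumption), only for repeatability

def skewness(values):
    """Moment-based skewness: near 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# 5,000 single observations from a strongly right-skewed population
# (exponential with mean 1, a hypothetical choice).
single_draws = [random.expovariate(1.0) for _ in range(5_000)]

# 5,000 sample means, each from a sample of 50 observations
# drawn from that same skewed population.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(50)])
    for _ in range(5_000)
]

print(f"skewness of single observations: {skewness(single_draws):.2f}")
print(f"skewness of sample means:        {skewness(sample_means):.2f}")
```

The single observations should show a large positive skewness, while the sample means should be much closer to 0, that is, much closer to a symmetric bell shape.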


Central Limit Theorem

If we look at a proportion (or difference in proportions)

and the scenario satisfies certain conditions, then the

sample proportion (or difference in proportions) will

appear to follow a bell-shaped curve called the normal

distribution.

Though the conditions are slightly different, the sample

mean (or difference in means) will also appear to

follow a normal distribution. However, we'll save the

details for later in the course (Lessons 27 – 30).


Conditions for Proportions

Mathematical theory guarantees that a sample

proportion or a difference in sample proportions will

follow something that resembles a normal distribution

when certain conditions are met. These conditions fall

into two categories:

• Observations in the sample are independent.

• The sample is large enough.


Conditions for Proportions

Observations in the sample are independent.

Independence is guaranteed when we take a random

sample of less than 10% of the population. It can also

be guaranteed if we randomly divide individuals into

treatment and control groups.

The sample is large enough. To be reasonably

certain of a unimodal, symmetric distribution, the

sample should be at least a minimum size, though

what qualifies as "minimum" differs from one context to

the next. Suitable guidelines will be given in later

lessons.
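The "large enough" condition can be explored by simulation. This is a minimal sketch, assuming independent observations, an arbitrary seed, and 2,000 simulated samples per sample size. The true proportion 0.95 and the sample sizes 20, 100, and 500 are the same values used in Example 5. The function name simulate_sample_proportions is hypothetical.

```python
import random
import statistics

random.seed(28)  # arbitrary seed (an assumption), only for repeatability

def simulate_sample_proportions(p, n, n_samples=2_000):
    """Proportion of successes in n independent trials, repeated n_samples times."""
    return [
        sum(random.random() < p for _ in range(n)) / n
        for _ in range(n_samples)
    ]

p = 0.95  # true proportion, as in Example 5
summary = {}
for n in (20, 100, 500):
    p_hats = simulate_sample_proportions(p, n)
    # Share of samples whose proportion is exactly 1: a sign that the
    # distribution is piled up against the upper boundary.
    at_boundary = sum(ph == 1.0 for ph in p_hats) / len(p_hats)
    summary[n] = (statistics.mean(p_hats), statistics.stdev(p_hats), at_boundary)
    print(f"n = {n:>3}: mean = {summary[n][0]:.3f}, "
          f"SD = {summary[n][1]:.3f}, share at 1.0 = {at_boundary:.3f}")
```

For n = 20, a sizable share of the samples land exactly on 1, so the distribution cannot be symmetric. For larger n that share shrinks toward 0 and the spread narrows.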


Example 5

Suppose the true population proportion were p = 0.95.

The figure shows what the distribution of a sample

proportion looks like when the sample size is n = 20,

n = 100, and n = 500.

[Figure: distribution of the sample proportion in three panels: n = 20, n = 100, n = 500]

Example 5 continued

What does each point (observation) in each of the

samples represent? Describe how the distribution of

the sample proportion, p̂, changes as n becomes

larger.

[Figure: distribution of the sample proportion in three panels: n = 20, n = 100, n = 500]


So far we've had no need for the normal distribution.

We've been able to answer our questions somewhat

easily using simulation techniques.

This will soon change, however, since simulating data

can be non-trivial (very, very difficult).

Instead, the normal distribution (and other distributions

like it) offers a general framework that applies to a very

large number of settings.


Opportunity Cost

For one example, the opportunity cost study

determined that students are thriftier if they are

reminded that saving money now means they can

spend the money later. The study's point estimate

for the impact was 20%, meaning 20%

fewer students would move forward with a DVD

purchase in the study scenario. However, as

we've learned, point estimates aren't perfect – they

only provide an approximation of the truth.


Opportunity Cost

It would be useful if we could provide a range of

plausible values for the impact, more formally

known as a confidence interval. In many situations,

it is difficult to construct a reliable confidence interval

using simulations. However,

doing so is reasonably straightforward using the

normal distribution (Lesson 21).
