Post on 07-Feb-2017
MATH& 146
Lesson 17
Sections 2.5 and 2.6
The Normal Distribution
1
Sampling Distribution of
the Mean
Here's a simple
simulation. Let's start
with one fair die. If we
toss this die many times,
what should the dotplot of
the numbers on the face
of the die look like?
To help you out, consider
the results of 500
simulated tosses.
2
Sampling Distribution of
the Mean
Now let's toss a pair of
dice and record the
average of the two.
If we repeat this (or at
least simulate it) 500
times, recording the
average of each pair,
what will the dotplot of
these 500 averages look
like?
3
Sampling Distribution of
the Mean
We're much more likely to
get an average near 3.5
than we are to get one near
1 or 6.
After all, the only way to get
an average of 1 (or 6) is to
roll 1's (or 6's) with both
dice.
An average of 3.5, however,
can happen in several ways:
1 and 6, 2 and 5, or 3 and 4,
in either order.
4
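The counting argument above can be checked by listing all 36 equally likely outcomes for a pair of fair dice. Here is a minimal Python sketch; the variable names are our own and are not part of the lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes for two fair six-sided dice
# and tally the average of each pair.
counts = Counter((a + b) / 2 for a, b in product(range(1, 7), repeat=2))

# Convert the tallies to exact probabilities.
probabilities = {avg: Fraction(n, 36) for avg, n in sorted(counts.items())}

for avg, prob in probabilities.items():
    print(f"average {avg}: probability {prob}")
```

In this enumeration, an average of 3.5 has probability 1/6, while an average of 1 (or of 6) has probability only 1/36.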
Sampling Distribution of
the Mean
What if we average 3
dice? We'll simulate
500 tosses of 3 dice
and take their average:
5
Sampling Distribution of
the Mean
Note that it's getting
harder to have averages
near the ends, since
getting an average of 1 or
6 requires all three to
come up 1 or 6,
respectively.
That's less likely than
both of 2 dice coming up 1
(or both coming up 6).
6
Sampling Distribution of
the Mean
Let's continue this
simulation to see what
happens with larger
samples.
Here's a dotplot of the
averages for 500 tosses
of 5 dice:
7
Sampling Distribution of
the Mean
The pattern is becoming
clearer. Three things
continue to happen.
1) The shape is unimodal
and symmetric.
2) The shape remains
centered at 3.5.
3) The shape is
tightening.
8
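If you'd like to reproduce this pattern yourself, here is a rough Python sketch of the dice simulation. The function name `simulate_averages`, the seed, and the choice of Python's standard `random` and `statistics` modules are our assumptions; the lesson's dotplots may have been produced with different software.

```python
import random
import statistics

def simulate_averages(num_dice, num_trials=500, seed=0):
    """Return num_trials averages, each from tossing num_dice fair dice."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [
        statistics.mean(rng.randint(1, 6) for _ in range(num_dice))
        for _ in range(num_trials)
    ]

results = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_averages(num_dice)
    # Center (mean) and spread (standard deviation) of the 500 averages.
    results[num_dice] = (statistics.mean(averages), statistics.stdev(averages))
    center, spread = results[num_dice]
    print(f"{num_dice:>2} dice: center = {center:.2f}, spread = {spread:.2f}")
```

The center should stay close to 3.5 for every number of dice, while the spread shrinks as more dice are averaged.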
Sampling Distribution of
the Mean
Not convinced? Let's skip
ahead and try 20 dice.
The dotplot of averages
for 500 throws of 20 dice
looks like this:
9
Sampling Distribution of
the Mean
At this point, you should
be asking if this only
works for dice throws. In
fact, this shape shows up
amazingly often when we
use sample means or
sample proportions.
We even have a name for
this shape: the normal
distribution.
10
The Normal Distribution
Among all the distributions we see in practice, the
normal distribution is overwhelmingly the most
common. The symmetric, unimodal, bell-shaped curve
is pervasive throughout statistics.
Variables such as SAT scores and heights of US adult
males and females closely follow the normal
distribution.
11
The Normal Distribution
Technically, while many variables are nearly normal,
none are exactly normal.
However, the normal distribution, while not perfect for
any single problem, is very useful for a variety of
problems. In fact, we will use it many times for the rest
of the course.
12
What Is Normal?
The normal distribution model always describes a
symmetric, unimodal, bell-shaped curve. However,
these curves can look different depending on the
details of the model. Specifically, the normal
distribution model can be adjusted using two
parameters: mean and standard deviation.
13
What Is Normal?
As you might guess, changing the mean shifts the bell
curve to the left or right, while changing the standard
deviation stretches or constricts the curve.
14
[Figure: two normal curves, one with mean 0 and standard deviation 1, the other with mean 19 and standard deviation 4]
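To see the effect of the two parameters numerically, here is a small Python sketch. It uses the standard formula for the height of the normal curve, which this lesson does not state; the function name `normal_pdf` is our own.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Changing the mean moves the peak: each curve is tallest at its own mean.
peak_standard = normal_pdf(0, mu=0, sigma=1)
peak_wide = normal_pdf(19, mu=19, sigma=4)

# Changing the standard deviation stretches the curve: a standard
# deviation 4 times larger gives a peak one quarter as tall.
print(f"peak of N(0, 1):  {peak_standard:.4f}")
print(f"peak of N(19, 4): {peak_wide:.4f}")
```

Since the area under each curve is the same, the wider curve has to be shorter.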
Example 1
Consider the following sets of three distributions,
all of which are drawn to the same scale. Identify
the two distributions that are normal. Of the two
normal distributions, which one has the larger
standard deviation?
15
Notation
Because the mean and standard deviation describe a
normal distribution exactly, they are called the
distribution's parameters. These are not the same
things as sample statistics.
                      Sample Statistic    Distribution Parameter
Mean                  x̄ (x-bar)           μ (mu)
Standard Deviation    s                   σ (sigma)
16
Notation
If X is a quantity to be measured that has a normal
distribution with mean μ and standard deviation σ,
we write the distribution as N(μ,σ)
17
Notation
For example, the two distributions below can be
written as
N(μ = 0, σ = 1) and N(μ = 19, σ = 4)
18
[Figure: two normal curves, one with mean 0 and standard deviation 1, the other with mean 19 and standard deviation 4]
Example 2
Write down each normal distribution using the
short-hand notation, and sketch its shape.
a) mean 5 and standard deviation 3
b) mean –100 and standard deviation 10
c) mean 2 and variance 9.
19
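One thing to watch for in part (c): the notation N(μ, σ) uses the standard deviation, so a variance has to be converted by taking its square root. As a rough illustration, Python's standard-library `statistics.NormalDist` class takes the same two parameters; using it here is our choice, not something the lesson requires.

```python
import math
from statistics import NormalDist

# NormalDist takes the mean and the standard deviation, like N(mu, sigma).
part_a = NormalDist(mu=5, sigma=3)
part_b = NormalDist(mu=-100, sigma=10)
# Part (c) gives a variance, so take the square root to get sigma.
part_c = NormalDist(mu=2, sigma=math.sqrt(9))

for label, dist in (("a", part_a), ("b", part_b), ("c", part_c)):
    print(f"{label}) N(mu = {dist.mean}, sigma = {dist.stdev})")
```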
z-Scores
The z-score of an observation is the number of
standard deviations it falls above or below the
mean. We compute the z-score for an observation
x that follows a distribution with mean μ and
standard deviation σ using
$$z = \frac{x - \mu}{\sigma} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
20
z-Scores
If X is normal, then the z-scores (also called
standard scores) will also be normal, but with a
mean of 0 and standard deviation of 1. That is,
N(μ = 0, σ = 1).
This distribution even has a special name: the
standard normal distribution.
21
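The claim that z-scores from a normal distribution have mean 0 and standard deviation 1 can be checked with a quick simulation. This is only a sketch: the choice of N(μ = 19, σ = 4), the 10,000 draws, and the seed are arbitrary assumptions of ours.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed so the run is repeatable
mu, sigma = 19, 4

# Draw observations from N(mu = 19, sigma = 4), then convert each to a z-score.
observations = [rng.gauss(mu, sigma) for _ in range(10_000)]
z_scores = [(x - mu) / sigma for x in observations]

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)
print(f"mean of z-scores: {z_mean:.3f}")
print(f"standard deviation of z-scores: {z_sd:.3f}")
```

The two printed values should land close to 0 and 1, though not exactly on them, since this is a random sample.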
Example 3
Suppose a student took 2 exams, scoring 60
on a verbal test and 80 on a numerical reasoning
test. The class scores for each exam are normally
distributed. For the verbal test, the mean is 50 and
the standard deviation is 10; for the numerical test, the
mean is 70 and the standard deviation is 12. Relative
to the rest of the class, which was the student's
better score?
22
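One way to compare the two scores is to convert each to a z-score with the formula from the z-Scores slide. Here is a minimal Python sketch; the function name `z_score` is our own.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

verbal_z = z_score(60, mean=50, standard_deviation=10)
numerical_z = z_score(80, mean=70, standard_deviation=12)

print(f"verbal test:    z = {verbal_z:.2f}")
print(f"numerical test: z = {numerical_z:.2f}")
```

The larger z-score marks the score that sits further above the class mean, relative to how spread out that exam's scores are.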
Example 4
Over the last few classes, we have run simulations on
several case studies, including the opportunity cost
study (Lesson 13) and the medical consultant study
(Lesson 16).
Suppose we had run 10,000 simulations on each
study.
23
Example 4 continued
The two graphs show the
null distributions for
these two case studies.
Using these graphs,
describe the shape of the
distributions and note
anything that you find
interesting.
24
Central Limit Theorem
It is common for distributions in general to be
skewed or contain outliers.
However, the null distributions we've encountered
so far have all looked somewhat similar
and, for the most part, symmetric. They all
resemble the normal distribution. This is not a
coincidence, but rather, is guaranteed by
mathematical theory.
25
Central Limit Theorem
If we look at a proportion (or difference in proportions)
and the scenario satisfies certain conditions, then the
sample proportion (or difference in proportions) will
appear to follow a bell-shaped curve called the normal
distribution.
Though the conditions are slightly different, the sample
mean (or difference in means) will also appear to
follow a normal distribution. However, we'll save the
details for later in the course (Lessons 27 – 30).
26
Conditions for Proportions
Mathematical theory guarantees that a sample
proportion or a difference in sample proportions will
follow something that resembles a normal distribution
when certain conditions are met. These conditions fall
into two categories:
• Observations in the sample are independent.
• The sample is large enough.
27
Conditions for Proportions
Observations in the sample are independent.
Independence is guaranteed when we take a random
sample of less than 10% of the population. It can also
be guaranteed if we randomly divide individuals into
treatment and control groups.
The sample is large enough. To be reasonably
certain of a unimodal, symmetric distribution, the
sample should be at least a minimum size, though
what qualifies as "minimum" differs from one context to
the next. Suitable guidelines will be given in later
lessons.
28
Example 5
Suppose the true population proportion were p = 0.95.
The figure shows what the distribution of a sample
proportion looks like when the sample size is n = 20,
n = 100, and n = 500.
29
[Figure: distribution of the sample proportion in three panels: n = 20, n = 100, and n = 500]
Example 5 continued
What does each point (observation) in each of the
samples represent? Describe how the distribution of
the sample proportion, p̂, changes as n becomes
larger.
30
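If you want to experiment with Example 5 yourself, here is a rough Python sketch that simulates sample proportions when p = 0.95. The function name `simulate_sample_proportions`, the 1,000 samples per sample size, and the seed are our assumptions; the lesson does not say how its figure was generated.

```python
import random
import statistics

def simulate_sample_proportions(n, p=0.95, num_samples=1000, seed=0):
    """Return num_samples sample proportions, each from a sample of size n."""
    rng = random.Random(seed)  # arbitrary seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n  # share of successes in one sample
        for _ in range(num_samples)
    ]

summary = {}
for n in (20, 100, 500):
    p_hats = simulate_sample_proportions(n)
    summary[n] = (statistics.mean(p_hats), statistics.stdev(p_hats), max(p_hats))
    center, spread, largest = summary[n]
    print(f"n = {n:>3}: center = {center:.3f}, spread = {spread:.4f}, largest = {largest:.3f}")
```

Each simulated value is one sample's proportion. With n = 20, many samples hit the ceiling of 1.0, so the distribution is bunched up on the right and skewed left; larger samples tighten around 0.95 and look more symmetric.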
The Normal Distribution
So far we've had no need for the normal distribution.
We've been able to answer our questions somewhat
easily using simulation techniques.
This will soon change, however, since simulating data
can be non-trivial (very, very difficult).
Instead, the normal distribution (and other distributions
like it) offers a general framework that applies to a very
large number of settings.
31
Opportunity Cost
For one example, the opportunity cost study
determined that students are thriftier if they are
reminded that saving money now means they can
spend the money later. The study's point estimate
of the impact was 20%, meaning 20%
fewer students would move forward with a DVD
purchase in the study scenario. However, as
we've learned, point estimates aren't perfect – they
only provide an approximation of the truth.
32
Opportunity Cost
It would be useful if we could provide a range of
plausible values for the impact, more formally
known as a confidence interval. It is often
difficult to construct a reliable confidence interval
using simulations. However,
doing so is reasonably straightforward using the
normal distribution (Lesson 21).
33