Chapter 7: Data for Decisions
Lesson Plan
Sampling
Bad Sampling Methods
Simple Random Samples
Cautions About Sample Surveys
Experiments
Experiments Versus Observational Studies
Inference: From Sample to Population
Confidence Intervals
For All Practical Purposes: Mathematical Literacy in Today’s World, 9th ed.
© 2013 W. H. Freeman and Company
Sampling
Statistics – The science of collecting, organizing, and interpreting data.
How are the data produced? Through sampling and experiments.
Sampling – Gathering information about a large group of individuals.
Time, cost, and inconvenience forbid contacting every individual.
Instead, gather information about only part of the group in order to draw conclusions about the whole.
Population – The entire group of individuals about whom we want information.
Sample – Part of the population from which we actually collect information used to draw conclusions about the whole.
Bad Sampling Methods
If personal choice is involved in selecting the sample, the following
could happen:
Results could become biased.
The sample may not be a true representation of the population.
Bias – Occurs when the design of a statistical study systematically favors a certain outcome.
1. Convenience Samples
Interviewer chooses the sample from individuals close at hand (easiest to reach).
Example: Mall surveys
2. Voluntary Response Samples
People choose themselves by responding to a general appeal.
People with strong opinions are the most likely to respond, which can cause bias.
Examples: Online polls, call-ins, write-ins.
Chapter 7: Data for Decisions
Simple Random Samples
Simple Random Sample (SRS)
An SRS of size n consists of n individuals from the population
chosen in such a way that every set of n individuals has an equal
chance to be the sample actually selected.
Choosing a sample by chance avoids bias by giving all individuals
an equal chance to be chosen (a good sampling method).
Examples of SRS
Draw names from a hat: Place all the names of the people in the
population into a hat and draw out a handful (the sample).
Slow and inconvenient
Use the table of random digits: A more efficient way of randomly
selecting the sample without bias.
For smaller samples, tables of random digits are used.
For larger samples, computers do the random digit sampling.
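As a minimal sketch of how a computer might draw an SRS, the following Python uses the standard library's `random.sample`, which selects without replacement so that every set of n labels is equally likely. The population of 70 labels and the sample size of 3 are illustrative assumptions, and `draw_srs` is a hypothetical helper name.

```python
import random

def draw_srs(population_size, n, seed=None):
    """Return an SRS of n labels from 1..population_size (no repeats)."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    labels = range(1, population_size + 1)
    return rng.sample(labels, n)

# Hypothetical population of 70 labeled individuals; choose 3 of them.
sample = draw_srs(population_size=70, n=3, seed=7)
print(sample)
```

Omitting the seed gives a different sample on each run, which is the behavior wanted in practice.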
Simple Random Sample
Two Steps in Choosing a Simple Random Sample
1. Give each member of the population a numerical label of the same length.
Example: 100 items can be labeled with two digits: 01, 02, …, 99, 00.
2. To choose the random sample, select a line in the table of random digits. For a sample of size n, read off successive groups of digits of the same length as the labels until n individuals have been selected from the population.
When selecting the n individuals for the sample from the random digits table:
1. Skip any group of digits that was not used as a label.
2. Skip any repeats.
A table of random digits – A list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
1. Each entry in the table is equally likely to be any of the 10 digits from 0 through 9.
2. The entries are independent of one another. That is, knowledge of one part of the table gives no information about any other part.
Simple Random Sample
Using the Random Digit Table
Example:
A group of 70 people was labeled 01, 02, 03, …, 69, 70.
Line 104 of the random digits table was selected in order to choose three lucky winners.
Reading off two-digit labels from line 104…
52 was selected first; 71 was skipped over (because it is not in the range of labels); 13 was chosen; then 88, 89, and 93 were skipped over (out of range); and 07 was chosen. Winners: 52, 13, and 07.
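The reading procedure in this example can be sketched in Python. The digit string below is only the portion of line 104 implied by the walk-through (52, 71, 13, 88, 89, 93, 07), not the full table line, and `select_from_digits` is a hypothetical helper name.

```python
def select_from_digits(digits, label_width, valid_labels, n):
    """Read fixed-width groups from a digit string, keeping the first n
    groups that are valid labels and have not already been chosen."""
    chosen = []
    for i in range(0, len(digits) - label_width + 1, label_width):
        group = digits[i:i + label_width]
        # Skip groups that are not labels, and skip repeats.
        if group in valid_labels and group not in chosen:
            chosen.append(group)
        if len(chosen) == n:
            break
    return chosen

# Digits implied by the walk-through: 52, 71, 13, 88, 89, 93, 07.
line_digits = "52711388899307"
labels = {f"{k:02d}" for k in range(1, 71)}  # "01" through "70"
winners = select_from_digits(line_digits, 2, labels, 3)
print(winners)  # ['52', '13', '07']
```

Labels are kept as strings so that leading zeros such as "07" are preserved.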
Cautions About Sample Surveys
Sample surveys of large populations require the following:
A good sampling design (can be done with SRS)
An accurate and complete list of the population
Participation of all individuals selected for the sample
A question posed that is neutral and clear
Bias can occur due to the following:
Problems with obtaining an accurate and complete population list
Undercoverage – Occurs when some groups in the population are left
out of the process of choosing the sample.
Example: Homeless people, prison inmates, and students in dormitories.
Problems with getting 100% participation of sampled people
Nonresponse – Occurs when an individual chosen for the sample
cannot be contacted or refuses to participate.
Problems with posing a misleading or confusing question
Experiments
Observation Versus Experiments
Observational Study (example: a sample survey)
Observes individuals and measures variables of interest but does not attempt to influence the responses.
Purpose is to describe some group or situation.
Experiment
Deliberately imposes some treatment on individuals in order to observe their responses.
Purpose is to study whether the treatment causes a change in the response.
Experiments
Examining Cause and Effect Between Variables
Experiments are the preferred method for examining the effect of one variable on another.
By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect.
Uncontrolled Experiment
An experiment in which it is not possible to control outside factors that can influence the outcome.
Confounding – Variables, whether part of a study or not, are said to be confounded when their effects on the outcome cannot be distinguished from each other.
Experiments
Example: GMAT Prep Class
A college now offers its GMAT exam preparation course only online, whereas in the past it was offered only live. The students who take the online course score an average of 10% higher on the GMAT exam than those who took the live course in the past. Can we conclude that the online course is more effective?
No, there may be confounding variables. For example, the students who enroll now may differ from those who enrolled in the past.
Experiments
Randomized Comparative Experiment (helps control confounding)
An experiment to compare two or more treatments in which people, animals, or things are assigned to treatments by chance.
Because assignment is by chance, outside effects and confounding variables act on all groups alike.
Randomized – The subjects are assigned to treatments by chance.
Comparative – Compares two or more treatments.
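Assignment by chance can be sketched in a few lines of Python. The 20 subject labels and the two treatment names (borrowed from the GMAT example) are illustrative assumptions, and `randomize` is a hypothetical helper name.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn
    so that the groups end up equal (or nearly equal) in size."""
    rng = random.Random(seed)  # the seed only makes the result repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects S01..S20 split between two treatments.
subjects = [f"S{k:02d}" for k in range(1, 21)]
groups = randomize(subjects, ["online", "live"], seed=1)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Because chance alone decides who lands in each group, outside influences should be spread across the groups rather than concentrated in one.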
Experiments versus Observational Studies
Placebo Effect
The effect of a dummy treatment (such as an inert pill in a
medical experiment) on the response of the subjects.
The tendency to respond favorably to any treatment.
Double-Blind Experiments
An experiment in which neither the experimental subjects nor the persons who interact with them know which treatment each subject received. This helps to eliminate possible influences or biases between the subjects and workers; everyone is kept “blind.”
Observational Study
Does not try to manipulate the environment (such as assigning
treatments to people); it simply observes the measurements of
variables of interest that result from people’s free choices.
Inference: From Sample to Population
Statistical Inference
When the sample was chosen at random from a population, we
can infer conclusions about the wider population from these data.
Statistical inference works only if the data comes from random
samples or a randomized comparative experiment.
A parameter is a number that describes the population.
A parameter is a fixed number (in practice we do not know its value).
A statistic is a number that describes a sample.
The value of a statistic is known when we have taken a sample, but it
can change from sample to sample.
Inference: From Sample to Population
Example:
A random sample of 2500 people was chosen from the
population and asked a question: “Do you like getting new
clothes but find shopping for clothes frustrating and time
consuming?” 1650 people agreed.
Infer that 66% of the population agrees.
$\hat{p} = 1650/2500 = 0.66 = 66\%$
Inference: From Sample to Population
Sampling Distribution
The distribution of values taken by the statistic in all possible
samples of the same size from the same population.
For a fixed number of repeated samples, the sampling distribution for a larger sample size has less variation: the values of the statistic lie closer to the mean.
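A small simulation can illustrate this. The sketch below assumes a population proportion of p = 0.6 and approximates sampling from a large population with independent random draws. The sample sizes (100 and 2500), the 500 repetitions, and the name `simulate_p_hats` are illustrative choices.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

small = simulate_p_hats(p=0.6, n=100, num_samples=500)
large = simulate_p_hats(p=0.6, n=2500, num_samples=500)

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print(f"n = 100:  mean {statistics.mean(small):.3f}, spread {spread_small:.4f}")
print(f"n = 2500: mean {statistics.mean(large):.3f}, spread {spread_large:.4f}")
```

Both sets of sample proportions should center near 0.6, with the n = 2500 values clustered much more tightly.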
Inference: From Sample to Population
Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of successes. Then the sample proportion of successes is:
$\hat{p} = \dfrac{\text{count of successes in the sample}}{n}$
Then…
Shape: For large sample sizes ($n \ge 30$), the sampling distribution of $\hat{p}$ is approximately normal.
Center: The mean of the sampling distribution of $\hat{p}$ is p.
Spread: The standard deviation of the sampling distribution is:
$\sqrt{\dfrac{p(1-p)}{n}}$
For the shopping example…
With a mean p = 0.6 and n = 2500, the standard deviation is
$\sqrt{\dfrac{0.6(1-0.6)}{2500}} \approx 0.0098$
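The arithmetic for the shopping example can be checked with a short Python sketch of the formula $\sqrt{p(1-p)/n}$; the function name `sd_of_sample_proportion` is a hypothetical helper.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)

sd = sd_of_sample_proportion(p=0.6, n=2500)
print(round(sd, 4))  # 0.0098
```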
Confidence Intervals
The 68-95-99.7 Rule (for a normal distribution)
68% of the observations fall within ± 1 standard deviation of the mean.
95% of the observations fall within ± 2 standard deviations of the mean.
99.7% of the observations fall within ± 3 standard deviations of the mean.
95% Confidence Interval
An interval obtained from the sample data by a method that in 95% of all samples will produce an interval containing the true population parameter.
A 95% confidence interval for p is approximately:
$\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
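As a worked sketch, the approximate interval $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$ can be applied to the shopping survey, where 1650 of 2500 people agreed. The function name `approx_95_ci` is a hypothetical helper.

```python
import math

def approx_95_ci(successes, n):
    """Approximate 95% confidence interval: p-hat plus or minus two
    estimated standard deviations of the sample proportion."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = approx_95_ci(successes=1650, n=2500)
print(f"{low:.3f} to {high:.3f}")  # 0.641 to 0.679
```

Under these numbers the margin of error is about 1.9 percentage points on either side of the 66% estimate.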