Collecting Data
Understanding Random Sampling
Objectives: To understand the basic properties of collecting an unbiased sample.
To learn to recognize flaws in biased sampling.
Intro…
Do you know what it means when something occurs randomly?
Randomly select a number from the next slide. Ready…
1 2 3 4
Question:
What would you expect to happen if we collected data on this simple task?
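One useful benchmark: if every person truly picked at random, each of the four numbers would come up about a quarter of the time. The minimal Python sketch below simulates that benchmark; the function name `simulate_picks` and the class size of 10,000 are illustrative assumptions, not part of the slides.

```python
import random
from collections import Counter

def simulate_picks(n_people, choices=(1, 2, 3, 4), seed=None):
    """Simulate n_people each choosing uniformly at random from `choices`."""
    rng = random.Random(seed)
    return Counter(rng.choice(choices) for _ in range(n_people))

counts = simulate_picks(10_000, seed=1)
for number in sorted(counts):
    # Under true randomness each proportion should be close to 0.25.
    print(number, counts[number] / 10_000)
```

Comparing real class data against these simulated proportions is one way to judge whether people's "random" choices behave like a random process.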
How do we gather data?
• Surveys
  - Opinion polls
  - Interviews
• Studies
  - Observational
  - Retrospective (past)
  - Prospective (future)
• Experiments
Population – the entire group of individuals we want information about.
Census – a complete count of the entire population
Why would we not use a census all the time?
1) Often not accurate (it is hard to count every member of a large population correctly)
2) Very expensive
3) Perhaps impossible
4) With destructive sampling, a census would destroy the entire population
• Breaking strength of soda bottles
• Lifetime of flashlight batteries
• Safety ratings for cars
Sample – a part of the population that we examine in order to gather information
Used to generalize information about a population
Sampling design – the method used to choose the sample from the population
Sampling frame – a list of every individual in the population
Simple Random Sample (SRS)
Consists of n individuals from the population chosen in such a way that
• every individual has an equal chance of being selected, and
• every set of n individuals has an equal chance of being selected
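In software, an SRS is a draw without replacement in which every subset of size n is equally likely. A minimal Python sketch using the standard library's `random.sample`, which has exactly that property; the sampling frame of 100 "students" and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return an SRS of size n drawn without replacement from the sampling frame.

    random.sample makes every subset of size n equally likely, so every
    individual also has the same chance (n / len(frame)) of being chosen.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = [f"Student {i}" for i in range(1, 101)]  # hypothetical sampling frame
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Note that the code needs the full list `frame`, which is the "must have a sampling frame" requirement of an SRS.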
SRS
Advantages
• Unbiased
• Easy
Disadvantages
• Large variance
• May not be representative
• Must have a sampling frame (list of the population)
Systematic Random Sample
• Select the sample by following a systematic approach (for example, every kth individual)
• Randomly select where to begin
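The two steps above (random start, then a fixed interval) can be sketched in a few lines of Python. This is a minimal illustration, assuming the population is already in a list in some natural order; the function name `systematic_sample` and the list of 100 customers are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k individuals, then take every kth one."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random starting index 0, 1, ..., k-1
    return frame[start::k]

customers = list(range(1, 101))  # hypothetical: customers numbered 1-100 in order
print(systematic_sample(customers, 10, seed=7))
```

With k = 10 this follows the same pattern as the restaurant example: a random start between 1 and 10, then every 10th customer.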
Systematic Random Sample
Advantages
• Unbiased
• Ensures that the sample is spread across the population
• More efficient, cheaper, etc.
Disadvantages
• Large variance
• Can be confounded by a trend or cycle in the population
• Formulas are complicated
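To see how a cycle can confound a systematic sample, consider a small hypothetical Python example (the sales figures and the weekly pattern are invented for illustration). When the sampling interval k matches the length of a cycle in the data, every sampled unit falls at the same point in the cycle, so the sample mean misses the population mean no matter which start is chosen.

```python
from statistics import mean

# Hypothetical population: 10 weeks of daily sales with a weekly cycle
# (days 5 and 6 of each week are busy weekend days).
sales = [150 if day % 7 in (5, 6) else 100 for day in range(70)]

def systematic_sample(values, k, start):
    """Take every kth value beginning at index `start`."""
    return values[start::k]

print("population mean:", round(mean(sales), 2))
for start in range(7):
    # k = 7 equals the cycle length, so each sample sees only one day of the week.
    print(start, mean(systematic_sample(sales, 7, start)))
```

The population mean is about 114.29, but every possible systematic sample with k = 7 estimates either 100 or 150.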
A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to the customer in that position and to every 10th customer after, asking each to fill it out before leaving.
Identify the sampling design
Systematic random sampling
Bias – an error that favors certain outcomes
Note: We can never draw conclusions from biased data. Throw it out and start over!
Voluntary response – people choose to respond
• Usually only people with very strong opinions respond
• Produces biased results
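A small simulation shows why voluntary response is biased. The numbers below are hypothetical assumptions chosen for illustration: 20% of a population is unhappy, unhappy people respond with probability 0.5, and satisfied people respond with probability 0.05. The function name `voluntary_response_demo` is also illustrative.

```python
import random

def voluntary_response_demo(n=10_000, p_unhappy=0.2, respond_unhappy=0.5,
                            respond_satisfied=0.05, seed=0):
    """Compare the true share of unhappy people with the share among volunteers."""
    rng = random.Random(seed)
    population = [rng.random() < p_unhappy for _ in range(n)]  # True = unhappy
    respondents = [u for u in population
                   if rng.random() < (respond_unhappy if u else respond_satisfied)]
    true_share = sum(population) / n
    observed_share = sum(respondents) / len(respondents)
    return true_share, observed_share

true_share, observed_share = voluntary_response_demo()
print(f"unhappy in population:    {true_share:.2f}")
print(f"unhappy among volunteers: {observed_share:.2f}")
```

Under these assumptions about 0.10 / (0.10 + 0.04) ≈ 71% of the volunteers are unhappy, even though only 20% of the population is.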
Convenience sampling – ask people who are easy to ask
• Produces biased results
Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at Rice. You collect register receipts for students as they leave the bookstore during lunch one day.
Source of bias?
Convenience sampling – the receipts are an easy way to collect data, but students who buy books at other times or places are left out.
1970 Draft Lottery and the Role of Randomization
In that first draft lottery (conducted on December 1, 1969), a large, deep, cylindrical bowl was filled with 366 dates, one for each day of the year (including February 29, of course). The dates were placed inside small capsules (balls about the size of a pecan), added to the bowl, and then mixed. After mixing, the capsules were selected, one by one, and assigned a draft priority. Draft registrants whose birthdays matched the first 100 or so dates selected were likely to be called for induction. However, the bowl's small diameter and height (nearly arm's length) made the mixing less than random because each month's dates had been added sequentially in the yearly order of months.
January's capsules were dumped in first, followed by February's and so on until December.
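For comparison, a properly randomized lottery amounts to a uniform random permutation of the draft numbers 1 to 366 over the days of the year. In that case the correlation between day of year and draft number should be near 0, apart from chance variation. A minimal Python sketch; the function names and the seed are illustrative, and `correlation` is written out so the Pearson formula is visible.

```python
import random

def fair_lottery(seed=None):
    """Assign draft numbers 1-366 to days of the year by a uniform random shuffle."""
    rng = random.Random(seed)
    draft_numbers = list(range(1, 367))
    rng.shuffle(draft_numbers)   # thorough "mixing": any ordering can occur
    return draft_numbers         # draft_numbers[d - 1] is the number for day d

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

days = list(range(1, 367))
numbers = fair_lottery(seed=1970)
print(round(correlation(days, numbers), 3))  # typically close to 0
```

Running the shuffle many times gives correlations that cluster near 0, much closer than the -0.198 observed for the 1970 lottery.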
Set of Data for 1970 Draft Lottery
[Figure: 1970 Draft Number by Day of Year. Scatter plot of DraftNo vs. DayOfYear; correlation r = -0.198; fitted line DraftNo = -0.197 × DayOfYear + 220]
[Figure: Mean Draft Number by Month. Scatter plot of mean draft number vs. Month_No; fitted line Mean_Draft_No = -7.06 × Month_No + 230; r² = 0.75]
How did the nonrandomness of the draft affect the casualties (deaths) during the Vietnam War?
This was studied by Paul Sommers in "The Writing on the Wall", Chance, Vol, 1, 2003, pp. 35-38.
He examined the names of the casualties on the Vietnam Memorial (available online at thewall-usa.com) together with other sources and found the number of casualties by birth month:
Selecting an SRS
For the AP exam: “Knowledgeable users of statistics need to be able to perform your sample exactly using the described method.”
Methods: we can “pick samples from a hat”, use a random number generator, or use a table of random digits to derive our sample.
SRS by picking out of a hat: say that the items in the hat are “mixed thoroughly” and state whether or not slips of paper are put back in the hat (for an SRS they are not, so no one can be chosen twice).
Random digit table
• Each entry is equally likely to be any of the 10 digits
• Digits are independent of each other
Suppose your population consisted of these 20 people:
1) Aidan     6) Fred      11) Kathy     16) Paul
2) Bob       7) Gloria    12) Lori      17) Shawnie
3) Chico     8) Hannah    13) Matthew   18) Tracy
4) Doug      9) Israel    14) Nancy     19) Uncle Sam
5) Edward    10) Jung     15) Opus      20) Vernon
Use the following random digits to select a sample of five from these people.
We will need to use two-digit random numbers, ignoring 00, any number greater than 20, and any repeats. Start with Row 1 and read across.
Row 1: 4 5 1 8 0 5 1 3 7 1 2 0 1 5 5 8 0 1 5 7 0 3 8 9 9 3 4 3 5 0 6 3
Reading two digits at a time:
45: ignore
18: Tracy
05: Edward
13: Matthew
71: ignore
20: Vernon
15: Opus
Stop when five people are selected. So my sample would consist of:
Edward, Matthew, Opus, Tracy, and Vernon
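The table-reading procedure can be checked mechanically. The minimal Python sketch below reads two-digit numbers left to right, skips 00, numbers larger than the population size, and repeats, and stops after k picks. The helper name `srs_from_digits` is hypothetical, and `row1` is Row 1 as printed in the slide with its spacing repaired (assumed to be one digit per entry). Note that 20 is a valid label (Vernon) under this rule, so it is selected rather than ignored.

```python
def srs_from_digits(digits, labels, k):
    """Pick an SRS of size k by reading two-digit numbers from a random digit row.

    Skips 00, numbers larger than len(labels), and repeats; stops after k picks.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= len(labels) and number not in chosen:
            chosen.append(number)
            if len(chosen) == k:
                break
    return [labels[number - 1] for number in chosen]

people = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria", "Hannah",
          "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nancy", "Opus", "Paul",
          "Shawnie", "Tracy", "Uncle Sam", "Vernon"]
row1 = "4 5 1 8 0 5 1 3 7 1 2 0 1 5 5 8 0 1 5 7 0 3 8 9 9 3 4 3 5 0 6 3"
print(srs_from_digits(row1, people, 5))
```

This prints the five names in the order they are reached: Tracy, Edward, Matthew, Vernon, Opus.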