chapter 11
Post on 03-Nov-2014
To Design a Sample for Survey or Other Quantitative Research …
Specify the population: humans are but one type of population
Locate a sampling frame: populations are rarely directly accessible; the frame is what actually leads to contacts
Select a sampling procedure
  For quantitative research, it will generally be best to seek a probability sample
  Qualitative research will mostly use judgment and quota samples
Determine the needed sample size: by mathematical formula or rule of thumb
Select the elements to be sampled: by objective or subjective procedure, depending
Collect the data
Types of Sampling Procedures
Probability samples: each population member has a calculable probability of appearing in the sample
  Simple random samples
  Stratified random samples
  Systematic samples
Non-probability samples: there is no way to know the probability that any given population element will enter the sample
  Convenience samples
  Quota samples
  Judgment samples
Observations on Probability Samples
It isn’t a simple random sample unless all population members can be identified and a random number table or equivalent is used
If a random sample will be drawn from a list, a systematic sample could probably be drawn instead
  Generally easier
  May even be more representative, particularly in small samples, especially if the population has small numbers of certain types of elements or if the list can be ordered on a factor of interest
If a list is not available, random sampling may still be possible, but cost and difficulty may escalate
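The difference between drawing a simple random sample and a systematic sample from a list can be sketched in a few lines of Python. This is only an illustration: the customer list, the function names, and the sample size of 50 are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the list after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the length of the frame")
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: a list of 1000 customer IDs.
frame = [f"customer_{i:04d}" for i in range(1000)]
srs = simple_random_sample(frame, 50, seed=1)
sys_sample = systematic_sample(frame, 50, seed=1)
print(srs[:3], sys_sample[:3])
```

If the list is sorted on a factor of interest (say, account size), the systematic draw spreads the sample evenly across that factor, which is one reason it may be more representative in small samples.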
Observations on Probability Samples II
Stratified samples improve the efficiency of estimates
  Recommended when meaningfully different sub-populations can be identified
  Still must be able to list/identify (sub)population members
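A stratified draw might look like the following sketch. It assumes proportional allocation (each stratum contributes in proportion to its share of the frame), which is one common choice but not one the slides specify; the "small"/"large" account frame and the function name are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportional allocation: sample each stratum in proportion to its size.

    Because each stratum's share is rounded, the total can differ
    slightly from n when the shares are not whole numbers.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample

# Hypothetical frame: 800 small accounts and 200 large accounts.
frame = [("small", i) for i in range(800)] + [("large", i) for i in range(200)]
sample = stratified_sample(frame, lambda element: element[0], 100, seed=7)
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

Note that the function still needs the full list of members in every stratum, which is the point of the last bullet above.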
Summary: Probability Samples
Differences among probability methods are less important than the trait they share
  Logic: known probability of entering the sample
  Mathematical estimation of required sample size is possible
Systematic sampling is probably more common than true random sampling
  If the list is large enough and heterogeneous enough to lack structure, or the structure can be identified, it is generally an effective substitute for random sampling
Stratified samples acknowledge ‘lumpiness’ in the population, and achieve cost or accuracy advantages thereby
Computing Sample Size
A computation of sample size requires only:
  Required degree of precision
  Desired level of confidence
  Estimate of population variability
Precision
How much are you willing to be wrong by? Or, what’s the largest gap between obtained result and true result that you can tolerate?
Examples
“Results are accurate to within ± .05”
  So, “the research showed that 32% of customers had experienced system downtime within the last 6 months” indicates that the true number is expected to be between 27% and 37%
“Results are accurate to within $100”
  So, “the research showed that customers saved an average of $768 with CarsDirect.com” indicates that the true average savings can be expected to fall between $668 and $868
Confidence Level or Interval
In the statistical sense, i.e., a 95% confidence level, or a 90% or a 99%, to use the most common values
“Expected to be”, on the previous slide, was a fudge
‘Precision’ means ‘precision at a given confidence level’
‘Precision of ± .05 [at a 95% confidence level]’ indicates that if the sample were drawn an infinite number of times, in 95% of these samples the true value would be within .05 either side of that sample’s estimate [here, 32%]
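The "drawn an infinite number of times" reading can be approximated by simulation. The sketch below assumes a true proportion of 50%, a sample of 400, and 2000 repeated draws; all three numbers, and the function name, are choices made for illustration rather than values taken from this slide.

```python
import random

def coverage(true_p, n, precision, trials=2000, seed=0):
    """Fraction of repeated samples whose estimate lands within `precision` of true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent agrees with probability true_p.
        agree = sum(rng.random() < true_p for _ in range(n))
        estimate = agree / n
        # Tiny tolerance so estimates exactly on the boundary count as hits.
        if abs(estimate - true_p) <= precision + 1e-12:
            hits += 1
    return hits / trials

share = coverage(true_p=0.5, n=400, precision=0.05)
print(f"{share:.1%} of samples fell within .05 of the true value")
```

Under these assumptions the share should come out close to 95%, which is what "± .05 at a 95% confidence level" is meant to convey.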
Population Variability
Defined in statistical terms, i.e., as the variance
  When working with means, the variance is equal to the average of the squared deviations of each population element from the population mean
  A different formula is used for percentages, but the same statistical concept applies
The greater the variability in the population, the larger the sample size needed to obtain a given level of precision at a specified confidence level
Keep in mind: the ‘standard deviation’ is the square root of the variance
The Formula
The number of sample elements (sample size) required to achieve a given precision at a specified confidence is n = Z^2 * S^2 / D^2
Where
  Z is a function of the confidence level: 95% ~ 2 (more precisely 1.96); 99% ~ 2.6 (more precisely 2.58)
  S^2 is the estimated population variance
  D is the precision, i.e., if ‘± .05’, or ‘accurate to within 5 percentage points’, then D = .05
Computing Sample Size for Proportions
Example: Customers appear to be split on an issue; want to know whether a majority (> 50%) agree with a certain view.
Precision, or acceptable error, is .05, so D = .05
Confidence level is 95%, so Z ~ 2 and Z^2 = 4
For proportions, population variability (S^2) is strictly a function of the proportions themselves: S^2 = (proportion #1) * (1 - proportion #1); for a 50/50 split, S^2 = .50 * .50 = .25
Sample size formula is: Z^2 * S^2 / (precision)^2
4 * .25 / .0025 = 400
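The calculation above can be written as a small Python function. This is a minimal sketch: the function name is made up for the example, Z defaults to the slide's rounded value of 2, and rounding fractional results up to the next whole respondent is an assumption.

```python
import math

def sample_size_for_proportion(p, precision, z=2.0):
    """n = Z^2 * S^2 / D^2, with S^2 = p * (1 - p) for a proportion."""
    variance = p * (1 - p)
    raw = z**2 * variance / precision**2
    # round() removes floating-point noise before rounding up.
    return math.ceil(round(raw, 6))

n = sample_size_for_proportion(p=0.50, precision=0.05)
print(n)
```

A 50/50 split is the worst case, since p * (1 - p) is largest at p = .50, so using it when the true split is unknown gives a conservative sample size.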
Effect on Sample Size of Changes in Variability and Desired Confidence
Variability: want to confirm that a certain view is held by 80% of customers
  S^2 now = .8 * .2 = .16
  Confidence and precision are unchanged: Z still = 2 and D still = .05
  Sample size needed now is 4 * .16 / .0025 = 256
Suppose we also relax our confidence requirements (to 90%)
  Might be appropriate, if the question has changed from ‘what is the percentage?’ to ‘agreement is 80% or more, true?’
  Z for a 90% confidence level is ~ 1.6 (more precisely 1.645)
  Needed sample size is now (1.6)^2 * .16 / .0025 = 164
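The rounded Z values used on these slides (2 and 1.6) can be compared with exact values from the normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the function names and the scenario list are assumptions made for the example.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def sample_size(p, precision, z):
    """n = Z^2 * p * (1 - p) / D^2, rounded up to a whole respondent."""
    raw = z**2 * p * (1 - p) / precision**2
    return math.ceil(round(raw, 6))

# (label, assumed proportion, confidence level, rounded Z from the slides)
scenarios = [
    ("50/50 split, 95%", 0.50, 0.95, 2.0),
    ("80/20 split, 95%", 0.80, 0.95, 2.0),
    ("80/20 split, 90%", 0.80, 0.90, 1.6),
]
for label, p, confidence, slide_z in scenarios:
    rounded = sample_size(p, 0.05, slide_z)
    exact = sample_size(p, 0.05, z_for_confidence(confidence))
    print(f"{label}: slide Z gives {rounded}, exact Z gives {exact}")
```

The rounded values are close enough for planning; Z = 2 slightly overstates the 95% requirement and Z = 1.6 slightly understates the 90% one.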
Effect on Sample Size of Changes in Precision
Now want a precision of .03
  Revert to assumed 50/50 split and 95% confidence
  4 * (.50 * (1 - .50)) / .0009 ≈ 1111
  Results would now be reported as ‘50%, ± .03’
How about a precision of .015: 4 * .25 / .000225 ≈ 4445
BTW, if many elections are decided by margins such as 50.6% to 49.4% … how are we to regard polling results where the margin of error is .03?
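To see how quickly sample size grows as precision tightens, the formula can be run in both directions: from precision to sample size, and from sample size back to margin of error. This sketch assumes a 50/50 split and Z = 2 as on the slides; the function names are made up, and fractional results are rounded up, so a precision of .03 comes out as 1112 rather than 1111.

```python
import math

def sample_size(precision, p=0.5, z=2.0):
    """n = Z^2 * p * (1 - p) / D^2, rounded up."""
    raw = z**2 * p * (1 - p) / precision**2
    return math.ceil(round(raw, 6))

def margin_of_error(n, p=0.5, z=2.0):
    """The same formula solved for D: D = Z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

for precision in (0.05, 0.03, 0.015):
    print(f"precision {precision}: n = {sample_size(precision)}")

# Telling 50.6% apart from 50% needs a margin of error below .006.
print("n for a .006 margin:", sample_size(0.006))
print("margin with n = 1111:", round(margin_of_error(1111), 4))
```

Halving the margin of error quadruples the sample, which is why a poll with a .03 margin cannot settle a race decided by roughly half a percentage point either side of 50%.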
Reminder
We never said anything about the size of the population that we were sampling from
  We simply assumed that the population was ‘large’
Thus, if we were estimating support for a presidential candidate (% in favor), sample size would not be affected whether we defined the voter population as San Francisco, California, or the United States
However, if we were examining ‘the faculty at SCU’ (~400), a ‘finite population correction factor’ might enter into the formula
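The slides do not give the correction itself. One standard textbook form adjusts the large-population sample size n0 to n = n0 / (1 + (n0 - 1) / N), where N is the population size; the sketch below assumes that form, and the function name and the one-million population figure are hypothetical.

```python
import math

def adjust_for_finite_population(n_large, population_size):
    """Shrink a large-population sample size for a population of size N.

    Uses n = n0 / (1 + (n0 - 1) / N), one common form of the finite
    population correction (an assumption; the slides give no formula).
    """
    adjusted = n_large / (1 + (n_large - 1) / population_size)
    return math.ceil(round(adjusted, 6))

n0 = 400  # 50/50 split, precision .05, 95% confidence
small = adjust_for_finite_population(n0, 400)         # e.g. a faculty of ~400
large = adjust_for_finite_population(n0, 1_000_000)   # hypothetical large electorate
print(small, large)
```

With a population of about 400 the requirement drops to roughly half, while for a large population the correction makes no practical difference, consistent with the ‘large’ assumption above.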
Sample Size for Estimating Means
Remember that population variability for means is the average squared deviation from the mean:
  Population of 4 with incomes of $10,000, $8000, $4000, and $2000
  Mean is $6000; variance is (4000^2 + 2000^2 + 2000^2 + 4000^2)/4 = 40,000,000/4 = 10,000,000 (in squared dollars)
Watch what happens as we add individuals close to the mean
  Add 2 individuals with incomes of $7000 and $5000 and mean stays the same but variance drops to 42,000,000/6, or 7,000,000
A normal distribution means we have lots of observations like $5000 and $7000, so that the variance is reduced, relative to the more rectangular distribution with which we began
Thus, the shape of the distribution of the population elements matters, because variance varies with distribution shape, and variance is a key input to computing sample size
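The income example can be checked with Python's standard `statistics` module, where `pvariance` is the population variance (average squared deviation from the mean). The variable names are made up for the example.

```python
from statistics import mean, pstdev, pvariance

incomes = [10_000, 8_000, 4_000, 2_000]
print(mean(incomes), pvariance(incomes))

# Add two people close to the mean: the mean is unchanged, the variance drops.
with_middle = incomes + [7_000, 5_000]
print(mean(with_middle), pvariance(with_middle))

# The standard deviation is the square root of the variance.
print(round(pstdev(incomes), 2))
```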
Sample Size for Estimating Means II
So, let’s assume that in a city the variance in income is 225,000,000
How might we get this number?
  It corresponds to a standard deviation of $15,000, which we might expect if most people had incomes between $10,000 and $100,000
  Because ($100,000 - $10,000)/6 = $15,000 [about 99.7% of normally distributed population elements will fall within 3 standard deviations either side of the mean; the standard deviation is the square root of the variance]
We desire a sampling error of no more than $1000, at 95% confidence (Z^2 = 4), in estimating the mean income of some group of respondents.
4 * 225,000,000 / 1000^2 = sample of 900
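The two steps above (guess the variance from the range, then apply the formula) can be sketched as follows. The function names are invented for the example, Z defaults to the slide's rounded value of 2, and rounding up to a whole respondent is an assumption.

```python
import math

def variance_from_range(low, high):
    """Range/6 rule: treat the range as about 6 standard deviations wide."""
    return ((high - low) / 6) ** 2

def sample_size_for_mean(variance, precision, z=2.0):
    """n = Z^2 * S^2 / D^2, rounded up to a whole respondent."""
    raw = z**2 * variance / precision**2
    return math.ceil(round(raw, 6))

variance = variance_from_range(10_000, 100_000)
n = sample_size_for_mean(variance, precision=1_000)
print(variance, n)
```

Note that the precision and the standard deviation must be in the same units (here, dollars) for the ratio to make sense.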
Rules of Thumb
If you want to draw multiple conclusions about a group of people, include 100+
  At least, percentages and raw numbers correspond here
  This also corresponds to a desired error of .10 for 50/50 proportions
For means, may estimate the population variance as ([range, min to max] / 6)^2
  Minimum and maximum values may often be guesstimated
All bets are off if the population is not approximately normally distributed
  But, can guesstimate a larger variance if distribution is rectangular
If actual variance is less than estimated, then the sample gives more (tighter) precision than required
  Not a bad thing, except for lack of cost efficiency
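How much larger should the guess be for a rectangular distribution? One option, assumed here rather than taken from the slides, is the variance of a uniform distribution over the same range, which is range^2 / 12. The sketch compares it with the range/6 rule using the $10,000 to $100,000 income range from the earlier example; the function names are hypothetical.

```python
import math

def normal_guess(low, high):
    """Range/6 rule, suited to roughly normal populations."""
    return ((high - low) / 6) ** 2

def rectangular_guess(low, high):
    """Variance of a uniform distribution on [low, high]: range^2 / 12."""
    return (high - low) ** 2 / 12

def sample_size_for_mean(variance, precision, z=2.0):
    raw = z**2 * variance / precision**2
    return math.ceil(round(raw, 6))

low, high = 10_000, 100_000
n_normal = sample_size_for_mean(normal_guess(low, high), 1_000)
n_flat = sample_size_for_mean(rectangular_guess(low, high), 1_000)
print(n_normal, n_flat)
```

Under these assumptions the rectangular guess is three times the normal one, so the required sample triples for the same precision.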
Rules of Thumb 2
For rating scales, typical variances have been tabled (see McQuarrie text).
  Need only decide whether distribution is relatively flat (typical of 5 pt agree/disagree scales and 4 pt high/low & often/never & excellent/poor scales), or relatively humped/peaked (typical of 7 pt semantic differential scales rating intangibles)
  If humped, use the lower value in the table
  If flat, use the higher value
Historical information on variances in past studies, or from secondary sources (US census), is often available.
Pilot studies, if the research is important enough, can clear up many confusions
Sampling Summary
Specifying populations and discovering a frame is an art, especially when sampling something other than people from a standard list
  So also with deciding among simple random sampling, systematic sampling, stratified sampling, etc.
By contrast, setting a sample size is surprisingly straightforward in many cases
  Must know the formulae and grasp the logic
  Ad hoc rules and historical practice solve many problems
High stakes problems, unusual sampling plans, and specialized analytic strategies may call for a professional statistician
Expectation
That you know and can use the basic formulae for determining sample size for proportions and means
That you will select an appropriate sample of the right size, using an appropriate sampling procedure, for your client, in Part II.
That you will be appropriately skeptical of sampling plans found in cases (that you encounter as a student) and in secondary research (that you encounter as a manager)