Chapter 11


Upload: cheryl-lawson

Post on 03-Nov-2014


TRANSCRIPT

Page 1: Chapter 11

To Design a Sample for Survey or Other Quantitative Research …

- Specify the population
  - Humans are but one type of population
- Locate a sampling frame
  - Populations are rarely directly accessible; the frame is what actually leads to contacts
- Select a sampling procedure
  - For quantitative research, it will generally be best to seek a probability sample
  - Qualitative research will mostly use judgment and quota samples
- Determine the needed sample size
  - By mathematical formula or rule of thumb
- Select the elements to be sampled
  - By objective or subjective procedure, depending on the sampling procedure chosen
- Collect the data

Page 2: Chapter 11

Types of Sampling Procedures

- Probability samples
  - Each population member has a calculable probability of appearing in the sample
  - Simple random samples
  - Stratified random samples
  - Systematic samples
- Non-probability samples
  - No way to know the probability that any given population element will enter the sample
  - Convenience samples
  - Quota samples
  - Judgment samples

Page 3: Chapter 11

Observations on Probability Samples

- It isn’t a simple random sample unless
  - All population members can be identified
  - A random number table or equivalent is used
- If a random sample will be drawn from a list, a systematic sample could probably be drawn instead
  - Generally easier
  - May even be more representative
    - In small samples, especially if the population has small numbers of certain types of elements
    - or if the list can be ordered on a factor of interest
- If a list is not available, random sampling may still be possible, but cost and difficulty may escalate
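As a rough illustration of the difference between the two list-based procedures, here is a minimal Python sketch. The frame of 200 customer IDs, the sample size of 20, and the function names are all hypothetical; the slides do not prescribe any implementation.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n distinct elements; every subset of size n is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th element after a random start, with k = len(frame) // n."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start inside the first interval
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(11)  # fixed seed so the sketch is repeatable
frame = [f"customer-{i:03d}" for i in range(200)]  # hypothetical sampling frame

srs = simple_random_sample(frame, 20, rng)
systematic = systematic_sample(frame, 20, rng)
print(srs[:3])
print(systematic[:3])
```

If the list is sorted on a factor of interest, the systematic draw spreads the sample evenly across that factor, which is one reason it may be more representative in small samples.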

Page 4: Chapter 11

Observations on Probability Samples II

- Stratified samples improve the efficiency of estimates
  - Recommended when meaningfully different sub-populations can be identified
  - Still must be able to list/identify (sub)population members
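One way to picture a stratified draw is proportional allocation: split the frame into strata, give each stratum its share of the sample, and sample randomly within it. The sketch below assumes a hypothetical frame of "small" and "large" accounts; proportional allocation is only one of several allocation schemes and is not specified on the slides.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, rng):
    """Allocate n across strata in proportion to size, then sample within each."""
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        # Proportional share; rounding can make the total differ slightly from n
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample

rng = random.Random(11)
# Hypothetical frame: 150 small accounts and 50 large accounts
frame = [("small", i) for i in range(150)] + [("large", i) for i in range(50)]
sample = stratified_sample(frame, lambda e: e[0], 40, rng)
counts = {name: sum(1 for s in sample if s[0] == name) for name in ("small", "large")}
print(counts)
```

Note that the function still needs the full list of members in each stratum, which is the point of the last bullet above.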

Page 5: Chapter 11

Summary: Probability Samples

- Differences among probability methods are less important than the trait they share
  - Logic: known probability of entering the sample
  - Mathematical estimation of required sample size is possible
- Systematic sampling is probably more common than true random sampling
  - If the list is large enough and heterogeneous enough to lack structure, or the structure can be identified, it is generally an effective substitute for random sampling
- Stratified samples acknowledge ‘lumpiness’ in the population, and achieve cost or accuracy advantages thereby

Page 6: Chapter 11

Computing Sample Size

A computation of sample size requires only:

- Required degree of precision
- Desired level of confidence
- Estimate of population variability

Page 7: Chapter 11

Precision

- How much are you willing to be wrong by?
  - Or, what’s the largest gap between obtained result and true result that you can tolerate?
- Examples
  - “Results are accurate to within ± .05”
    - So, “the research showed that 32% of customers had experienced system downtime within the last 6 months” indicates that the true number is expected to be between 27% and 37%
  - “Results are accurate to within $100”
    - So, “the research showed that customers saved an average of $768 with CarsDirect.com” indicates that the true average savings can be expected to fall between $668 and $868

Page 8: Chapter 11

Confidence Level or Interval

- In the statistical sense, i.e., a 95% confidence level, or a 90% or a 99%, to use the most common values
- “Expected to be”, on the previous slide, was a fudge
- ‘Precision’ means ‘precision at a given confidence level’
- ‘Precision of ± .05 [at a 95% confidence level]’ indicates that if the sample were drawn an infinite number of times, in 95% of these samples the true value would be within .05 either side of the sample estimate [here, 32%]
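The "drawn an infinite number of times" idea can be approximated by simulation. The sketch below assumes a true proportion of .32 and a sample size of 400 (both chosen only for illustration) and counts how often a sample estimate lands within .05 of the true value. Under these assumptions the share should come out around 95% or somewhat above.

```python
import random

def coverage(true_p, n, d, trials, rng):
    """Fraction of repeated samples whose estimate lands within d of true_p."""
    hits = 0
    for _ in range(trials):
        # One simulated sample: each respondent says "yes" with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        estimate = successes / n
        if abs(estimate - true_p) <= d:
            hits += 1
    return hits / trials

rng = random.Random(11)  # fixed seed so the run is repeatable
share = coverage(true_p=0.32, n=400, d=0.05, trials=2000, rng=rng)
print(f"{share:.1%} of simulated samples were within .05 of the true value")
```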

Page 9: Chapter 11

Population Variability

- Defined in statistical terms, i.e., as the variance
  - When working with means, the variance is equal to the average of the squared deviations of each population element from the population mean
  - A different formula is used for percentages, but the same statistical concept applies
- The greater the variability in the population, the larger the sample size needed to obtain a given level of precision at a specified confidence level
- Keep in mind: the ‘standard deviation’ is the square root of the variance

Page 10: Chapter 11

The Formula

The number of sample elements (sample size) required to achieve a given precision at a specified confidence level is

\[ n = \frac{Z^2 S^2}{D^2} \]

Where
- \(Z\) is a function of the confidence level
  - 95% ~ 2
  - 99% ~ 2.6
- \(S^2\) is the estimated population variance
- \(D\) is the precision
  - I.e., if ‘± .05’, or ‘accurate to within 5 percentage points’, then \(D = .05\)
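The formula translates directly into a few lines of Python. This is a minimal sketch: the function name, the lookup table, and the illustrative variance of .25 are assumptions, and the Z values are the slide's approximations rather than the exact 1.96 and 2.576.

```python
import math

# Approximate Z values as given on the slide (exact values: 1.96 and 2.576)
Z_BY_CONFIDENCE = {0.95: 2.0, 0.99: 2.6}

def sample_size(z, variance, d):
    """n = Z^2 * S^2 / D^2, rounded up to a whole sample element."""
    n = z ** 2 * variance / d ** 2
    # round() removes floating-point noise before rounding up
    return math.ceil(round(n, 6))

# Illustrative inputs: variance of .25, precision of .05
n_95 = sample_size(Z_BY_CONFIDENCE[0.95], variance=0.25, d=0.05)
n_99 = sample_size(Z_BY_CONFIDENCE[0.99], variance=0.25, d=0.05)
print(n_95, n_99)
```

Because Z and D are both squared, halving D or moving to a stricter confidence level raises the required sample size faster than one might guess.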

Page 11: Chapter 11

Computing Sample Size for Proportions

- Example: Customers appear to be split on an issue; want to know whether a majority (> 50%) agree with a certain view.
- Precision, or acceptable error, is .05
  - So, \(D = .05\)
- Confidence level is 95%
  - So, \(Z \approx 2\)
- For proportions, population variability (\(S^2\)) is strictly a function of the proportions themselves:
  - \(S^2 = p_1 (1 - p_1)\), where \(p_1\) is proportion #1; for a 50/50 split, \(S^2 = .50 \times .50 = .25\)
- Sample size formula is: \(Z^2 S^2 / D^2\)
  - \(4 \times .25 / .0025 = 400\)
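A quick check of why the 50/50 split is the conservative assumption: the variance p(1 - p) is largest at p = .5, so any other split needs a smaller sample. The sketch below scans a few proportions; the function name and the grid of values are illustrative only.

```python
import math

def proportion_variance(p):
    """Population variance for a proportion: p * (1 - p)."""
    return p * (1 - p)

# Variance at p = 0.1, 0.2, ..., 0.9
variances = {p / 10: proportion_variance(p / 10) for p in range(1, 10)}
worst_p = max(variances, key=variances.get)

# Sample size at the worst case, with Z ~ 2 and D = .05 as in the example
n_worst = math.ceil(round(2.0 ** 2 * variances[worst_p] / 0.05 ** 2, 6))
print(worst_p, variances[worst_p], n_worst)
```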

Page 12: Chapter 11

Effect on Sample Size of Changes in Variability and Desired Confidence

- Variability: want to confirm that a certain view is held by 80% of customers
  - \(S^2\) now = .8 × .2 = .16
  - Confidence and precision are unchanged: \(Z\) still = 2 and \(D\) still = .05
  - Sample size needed now is \(4 \times .16 / .0025 = 256\)
- Suppose we also relax our confidence requirements (to 90%)
  - Might be appropriate, if the question has changed from ‘what is the percentage?’ to ‘agreement is 80% or more, true?’
  - \(Z\) for a 90% confidence level is ~ 1.6 (more exactly, 1.645)
  - Needed sample size is now \((1.6)^2 \times .16 / .0025 \approx 164\)
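The three scenarios can be compared side by side. A minimal sketch, with a hypothetical helper name and the slide's rounded Z values (2 and 1.6):

```python
import math

def proportion_sample_size(p, z, d):
    """n = Z^2 * p(1 - p) / D^2, rounded up to a whole respondent."""
    n = z ** 2 * p * (1 - p) / d ** 2
    return math.ceil(round(n, 6))  # round() removes floating-point noise

scenarios = [
    ("50/50 split, 95% confidence", 0.5, 2.0),
    ("80/20 split, 95% confidence", 0.8, 2.0),
    ("80/20 split, 90% confidence", 0.8, 1.6),
]
results = {label: proportion_sample_size(p, z, 0.05) for label, p, z in scenarios}
for label, n in results.items():
    print(f"{label}: n = {n}")
```

Using the more exact Z of 1.645 for the last scenario would give a somewhat larger figure than the slide's 164.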

Page 13: Chapter 11

Effect on Sample Size of Changes in Precision

- Now want a precision of .03
  - Revert to assumed 50/50 split and 95% confidence
  - \(4 \times (.50 \times (1 - .50)) / .0009 \approx 1111\)
  - Results would now be reported as ‘50%, ± .03’
- How about a precision of .015?
  - \(4 \times .25 / .000225 \approx 4444\)
- BTW, if many elections are decided by margins such as 50.6% to 49.4% … how are we to regard polling results where the margin of error is .03?
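The election question can be explored by running the formula in both directions. The sketch below inverts it to get the margin of error for a given sample size, and also asks what sample would be needed for a precision of .006, the gap between 50.6% and an even split. The function names and the .006 target are assumptions made for illustration. Results are rounded up to whole respondents, so 1111.1 becomes 1112.

```python
import math

def margin_of_error(n, p=0.5, z=2.0):
    """Invert n = Z^2 * p(1 - p) / D^2 to get D for a given sample size."""
    return z * math.sqrt(p * (1 - p) / n)

def proportion_sample_size(d, p=0.5, z=2.0):
    """n = Z^2 * p(1 - p) / D^2, rounded up to a whole respondent."""
    return math.ceil(round(z ** 2 * p * (1 - p) / d ** 2, 6))

moe_typical_poll = margin_of_error(1111)
n_for_03 = proportion_sample_size(0.03)
n_for_015 = proportion_sample_size(0.015)
n_for_close_race = proportion_sample_size(0.006)
print(round(moe_typical_poll, 4), n_for_03, n_for_015, n_for_close_race)
```

Halving the margin of error roughly quadruples the sample, which is why polls rarely go much below ± .03.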

Page 14: Chapter 11

Reminder

- We never said anything about the size of the population that we were sampling from
  - We simply assumed that the population was ‘large’
- Thus, if we were estimating support for a presidential candidate (% in favor), sample size would not be affected if we defined the voter population as
  - San Francisco
  - California
  - The United States
- However, if we were examining ‘the faculty at SCU’ (~400), a ‘finite population correction factor’ might enter into the formula
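The slides do not give the correction itself. One common textbook form adjusts the large-population sample size n0 to n0 / (1 + (n0 - 1) / N) for a population of size N. The sketch below applies that form; treat it as an assumption rather than the formula the author has in mind. The population figures other than the ~400 faculty are round, hypothetical numbers.

```python
import math

def finite_population_adjust(n_large, population):
    """Common textbook correction (an assumption here): n0 / (1 + (n0 - 1) / N)."""
    adjusted = n_large / (1 + (n_large - 1) / population)
    return math.ceil(round(adjusted, 6))

n0 = 400  # large-population sample size for a 50/50 split, D = .05, Z ~ 2

faculty = finite_population_adjust(n0, 400)           # ~400 faculty
city = finite_population_adjust(n0, 800_000)          # hypothetical city electorate
nation = finite_population_adjust(n0, 200_000_000)    # hypothetical national electorate
print(faculty, city, nation)
```

Under this form the correction matters only when the sample is a sizable fraction of the population, which matches the point that city, state, and national populations all call for the same sample size.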

Page 15: Chapter 11

Sample Size for Estimating Means

- Remember that population variability for means is the average squared deviation from the mean:
  - Population of 4 with incomes of $10,000, $8000, $4000, and $2000
  - Mean is $6000; variance (in squared dollars) is
    \[ (4000^2 + 2000^2 + 2000^2 + 4000^2)/4 = 40{,}000{,}000/4 = 10{,}000{,}000 \]
- Watch what happens as we add individuals close to the mean
  - Add 2 individuals with incomes of $7000 and $5000 and the mean stays the same but the variance drops to \(42{,}000{,}000/6\), or 7,000,000
- A normal distribution means we have lots of observations like $5000 and $7000, so that the variance is reduced, relative to the more rectangular distribution with which we began
- Thus, the shape of the distribution of the population elements matters, because variance varies with distribution shape, and variance is a key input to computing sample size
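The income example can be reproduced with the standard library's population variance function. A small sketch using the figures above; the variable names are illustrative.

```python
from statistics import mean, pvariance

incomes = [10_000, 8_000, 4_000, 2_000]
base_mean = mean(incomes)
base_variance = pvariance(incomes)  # average squared deviation from the mean

# Add two individuals close to the mean
more_incomes = incomes + [7_000, 5_000]
new_mean = mean(more_incomes)
new_variance = pvariance(more_incomes)

print(base_mean, base_variance)
print(new_mean, new_variance)
```

`pvariance` divides by the number of elements, matching the slide's definition; `statistics.variance` would divide by one less and give a different figure.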

Page 16: Chapter 11

Sample Size for Estimating Means II

- So, let’s assume that in a city the variance in income is 225,000,000
- How might we get this number?
  - It corresponds to a standard deviation of $15,000
  - which we might expect if most people had incomes between $10,000 and $100,000
  - Because ($100,000 – $10,000)/6 = $15,000 [about 99.7% of normally distributed population elements will fall within 3 standard deviations either side of the mean; the standard deviation is the square root of the variance]
- We desire a sample error of $1000 in estimating the income of some group of respondents, at 95% confidence (\(Z^2 \approx 4\)).
  - \(4 \times 225{,}000{,}000 / 1000^2 = 900\), i.e., a sample of 900
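The chain from guessed income range to sample size takes only a few steps. A minimal sketch with the slide's numbers; the variable names are illustrative and Z is the slide's approximation of 2.

```python
import math

low, high = 10_000, 100_000   # guessed range covering most incomes
std_dev = (high - low) / 6    # range spans roughly 6 standard deviations
variance = std_dev ** 2

z = 2.0          # ~95% confidence
d = 1_000        # desired precision in dollars
n = math.ceil(round(z ** 2 * variance / d ** 2, 6))
print(std_dev, variance, n)
```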

Page 17: Chapter 11

Rules of Thumb

- If you want to draw multiple conclusions about a group of people, include 100+
  - At least, percentages and raw numbers correspond here
  - This also corresponds to a desired error of .10 for 50/50 proportions
- For means, may estimate the population variance as \(([\text{max} - \text{min}]/6)^2\)
  - Minimum and maximum values may often be guesstimated
- All bets are off if the population is not approximately normally distributed
  - But, can guesstimate a larger variance if distribution is rectangular
- If actual variance is less than estimated, then the sample gives more (tighter) precision than required
  - Not a bad thing, except for lack of cost efficiency
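To see how much larger a rectangular guess might be, the sketch below compares the range/6 rule with the variance of a uniform distribution over the same range, which is range²/12. That uniform formula is a standard result, not something stated on the slides. The income range is reused from the earlier example purely for illustration. The last lines check the 100+ rule against a .10 error for a 50/50 split.

```python
import math

def sample_size(z, variance, d):
    """n = Z^2 * S^2 / D^2, rounded up to a whole sample element."""
    return math.ceil(round(z ** 2 * variance / d ** 2, 6))

value_range = 100_000 - 10_000

normal_guess = (value_range / 6) ** 2      # rule of thumb for a roughly normal shape
rectangular_guess = value_range ** 2 / 12  # variance of a uniform distribution (assumption)

n_normal = sample_size(2.0, normal_guess, 1_000)
n_rectangular = sample_size(2.0, rectangular_guess, 1_000)

# The 100+ rule: margin of error for n = 100 with a 50/50 split and Z ~ 2
moe_100 = 2.0 * math.sqrt(0.25 / 100)
print(n_normal, n_rectangular, moe_100)
```

Under these assumptions the rectangular guess triples the variance, and therefore the sample size.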

Page 18: Chapter 11

Rules of Thumb 2

- For rating scales, typical variances have been tabled (see McQuarrie text).
  - Need only decide whether distribution is relatively flat (typical of 5 pt agree/disagree scales and 4 pt high/low & often/never & excellent/poor scales), or relatively humped/peaked (typical of 7 pt semantic differential scales rating intangibles)
    - If humped, use the lower value in the table
    - If flat, use the higher value
- Historical information on variances in past studies, or from secondary sources (US census), is often available.
- Pilot studies, if the research is important enough, can clear up many confusions

Page 19: Chapter 11

Sampling Summary

- Specifying populations and discovering a frame is an art, especially when sampling something other than people from a standard list
  - So also with deciding among simple random sampling, systematic sampling, stratified sampling, etc.
- By contrast, setting a sample size is surprisingly straightforward in many cases
  - Must know the formulae and grasp the logic
  - Ad hoc rules and historical practice solve many problems
  - High stakes problems, unusual sampling plans, and specialized analytic strategies may call for a professional statistician

Page 20: Chapter 11

Expectation

That you know and can use the basic formulae for determining sample size for proportions and means

That you will select an appropriate sample of the right size, using an appropriate sampling procedure, for your client, in Part II.

That you will be appropriately skeptical of sampling plans found in cases (that you encounter as a student) and in secondary research (that you encounter as a manager)