Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Page 1: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Survey Methodology
EPID 626

Sampling, Part II
Manya Magnus, Ph.D.

Fall 2001

Page 2: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Lecture overview

• Comments about Assignment I

• More sampling techniques

• Sampling error

• Sample sizes

Page 3: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Comments about Assignment I

• Late policy

• Location of mailbox

• Randomization vs. random selection

• Validity, reliability

• Sampling frames

• Physician responses =?= “gold standard”

• Research questions vs. survey questions

• Registering for class

Page 4: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Comments about Assignment I

• Grading

Looked for completeness in answering questions, care in discussion of survey, effort, basically correct information, not just cut-n-paste, synthesis.

• Questions about grade: email [email protected]

Page 5: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Comments about Assignment I

• Grading:
– ++ 90-100%
– + 80-89%
– 70-79%
– - 60-69%
– -- <60%
– 0 not turned in

Page 6: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Random digit dialing (1)

• Delineate the geographic boundaries of the sampling area

• Identify all of the exchanges used in the geographic area

• Identify the distribution of prefixes within the sampling area
– Example: There may be 8 exchanges, but you may find that 3 of them are used for nearly two-thirds of residential lines.

Page 7: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Random digit dialing (2)

• You may stratify based on the distribution of prefixes
– Ex. Take more samples from the 3 exchanges that account for the most residential lines

• Try to identify vacuous suffixes
– These are suffixes not yet assigned or assigned in large groups to a business
– Usually consider suffixes in blocks of 100 (ex. 0000-0099, 0100-0199)

Page 8: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Random digit dialing (3)

• May randomly select the four-digit suffixes
– ex. use a random-numbers table

• Alternatively, you may use a plus-one approach
– When you reach a residence, use the number as a seed, and add a fixed digit (one or two) to get the next sample
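The steps on the random digit dialing slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exchange codes, the weights, and the function names `rdd_numbers` and `plus_one` are hypothetical and do not come from the lecture.

```python
import random

def rdd_numbers(exchanges, weights, k, vacuous_blocks=(), seed=None):
    """Draw k phone numbers as 'exchange-suffix' strings.

    exchanges: three-digit prefixes used in the sampling area (hypothetical)
    weights: relative share of residential lines per exchange (stratification)
    vacuous_blocks: (exchange, hundreds_block) pairs to skip, e.g.
        ("555", 12) excludes 555-1200 through 555-1299
    """
    rng = random.Random(seed)
    vacuous = set(vacuous_blocks)
    numbers = []
    while len(numbers) < k:
        # Pick an exchange in proportion to its share of residential lines.
        exchange = rng.choices(exchanges, weights=weights, k=1)[0]
        suffix = rng.randrange(10000)  # four-digit suffix, 0000-9999
        if (exchange, suffix // 100) in vacuous:
            continue  # skip blocks of 100 known to be unassigned or business-only
        numbers.append(f"{exchange}-{suffix:04d}")
    return numbers

def plus_one(number, step=1):
    """Plus-one approach: add a fixed digit to a number that reached a residence."""
    exchange, suffix = number.split("-")
    return f"{exchange}-{(int(suffix) + step) % 10000:04d}"

sample = rdd_numbers(["555", "556"], [2, 1], k=5, vacuous_blocks=[("555", 12)], seed=1)
print(sample)
print(plus_one("555-0199"))
```

In practice the random-numbers table mentioned on the slide plays the role of `rng.randrange`.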

Page 9: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Random digit dialing (4)

• Provides a nonzero chance of reaching any household within a sampling area that has a telephone line, regardless of whether the number is listed

• Is the probability of reaching every household equal?
– No. Households with more than one phone line will have a greater probability than households with one phone line.
– Adjust for unequal probability by weighting
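One common way to do the weighting is to weight each household by the inverse of its number of phone lines. The sketch below assumes that approach; the data and the function name `weighted_proportion` are made up for illustration.

```python
def weighted_proportion(responses):
    """Estimate a proportion from (answer, n_lines) pairs.

    answer is 1 (yes) or 0 (no); n_lines is the number of phone lines
    in the household. A household with two lines is twice as likely to
    be dialed, so it gets half the weight.
    """
    total_weight = sum(1.0 / lines for _, lines in responses)
    yes_weight = sum(answer / lines for answer, lines in responses)
    return yes_weight / total_weight

# Hypothetical data: two one-line households, two two-line households.
responses = [(1, 1), (0, 1), (1, 2), (1, 2)]
unweighted = sum(a for a, _ in responses) / len(responses)
print(unweighted, weighted_proportion(responses))
```

Here the multi-line households all said yes, so the unweighted estimate overstates the proportion relative to the weighted one.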

Page 10: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Random digit dialing (5)

• Advantages: Inexpensive and easy to do

• Disadvantages:
1. Large number of unfruitful calls
2. Will exclude individuals without phones
3. May be difficult to ascertain geographic area

Page 11: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sampling distributions

• The central limit theorem: Across repeated samples from a population, a particular estimate (say a mean) will be approximately normally distributed around the true population value

• As sample size increases, the distribution of the estimate becomes increasingly normal

Page 12: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

• This variation around the true value is the sampling error—it stems from the fact that, by chance, samples may differ from the population as a whole.

Page 13: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

• The larger the sample size and the less variance of what is being measured, the more tightly the sample estimates will “bunch” around the true population value, and the more accurate the sample-based estimate will be.
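A small simulation can make the "bunching" concrete. The sketch below assumes a made-up, skewed population and compares the spread of sample means for two sample sizes; none of the numbers come from the lecture.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical skewed population (exponential), so individual values are not normal.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(20000)]
true_mean = statistics.fmean(population)

# Spread of the sample means around the true value for small and large samples.
spread_small = statistics.stdev(sample_means(population, 10, 1000))
spread_large = statistics.stdev(sample_means(population, 100, 1000))

print(f"true mean: {true_mean:.3f}")
print(f"spread of sample means, n=10:  {spread_small:.3f}")
print(f"spread of sample means, n=100: {spread_large:.3f}")
```

The spread for n=100 should come out clearly smaller than for n=10, which is the sampling error shrinking as sample size grows.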

Page 21: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Example (1) (adapted from Babbie)

• Survey at TUSPHTM

• Approval of new Lundi Gras holiday

• Dichotomous outcome: approve/disapprove

• Survey population: aggregation of students

• Sampling frame: student list

• Random sample of students; representative sample of student body

Page 22: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Example (2) (adapted from Babbie)

• Extremes and all combinations in between are possible: 100% approve, 100% disapprove, 1% approve and 99% disapprove, etc.

• First random sample: 48% approve, 52% disapprove

• Second random sample: 20% approve, 80% disapprove

• And so forth

Page 23: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Example (3) (adapted from Babbie)

• What results from this exercise is a distribution of sample statistics, or a sampling distribution.

• As more independent random samples are selected, the sample statistics obtained will be distributed around the true population value in a known way.

Page 24: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Example (4) (adapted from Babbie)

• They will be clustered about the true value within a certain range.

• The range is given by the standard error.

• We do not know if the value in our sample is within the range, just that if many similar samples were taken in the same fashion, X% would fall within the specified range; this one may or may not.

Page 25: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Example (5) (adapted from Babbie)

• Probability theory says that about 68% of sample estimates will fall within one standard error of the parameter (one standard deviation of the sampling distribution) and about 95% will fall within two standard errors of the parameter

• Increasing confidence with increasing range
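The approve/disapprove example can be simulated to see these percentages emerge. The sketch below assumes a true approval rate of 50% and samples of 1,000 students, both invented for illustration, and treats each respondent as an independent draw.

```python
import math
import random

def coverage(p_true, n, draws, seed=0):
    """Fraction of sample proportions within 1 and 2 standard errors of p_true."""
    rng = random.Random(seed)
    se = math.sqrt(p_true * (1 - p_true) / n)
    within1 = within2 = 0
    for _ in range(draws):
        # One survey: count how many of n respondents approve.
        approve = sum(rng.random() < p_true for _ in range(n))
        p_hat = approve / n
        if abs(p_hat - p_true) <= se:
            within1 += 1
        if abs(p_hat - p_true) <= 2 * se:
            within2 += 1
    return within1 / draws, within2 / draws

w1, w2 = coverage(p_true=0.5, n=1000, draws=3000)
print(f"within 1 SE: {w1:.1%}, within 2 SE: {w2:.1%}")
```

The two fractions should land near 68% and 95%. Any single sample, as the slides note, may or may not be one of those that fall inside the range.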

Page 26: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

• Note difference between standard errors & standard deviations

Page 27: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Standard error of a mean

$$SE = \sqrt{\frac{\mathrm{Var}}{n}}$$

Page 28: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Standard error of a mean

• The standard deviation of the distribution of sample estimates of the mean that would be formed if an infinite number of samples of a given size were drawn.
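As a minimal sketch of the formula, the function below estimates the standard error of a mean from a single sample, using the sample variance as the estimate of Var. The ages are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SE = sqrt(Var / n), with Var estimated by the sample variance."""
    n = len(values)
    return math.sqrt(statistics.variance(values) / n)

ages = [23, 25, 31, 28, 22, 35, 27, 30]  # hypothetical respondent ages
sd = statistics.stdev(ages)              # spread of individual values
se = standard_error_of_mean(ages)        # spread of the sample mean

print(f"standard deviation: {sd:.2f}")
print(f"standard error:     {se:.2f}")
```

The standard deviation describes how much individuals vary; the standard error describes how much the sample mean would vary across repeated samples, and it is smaller by a factor of the square root of n.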

Page 29: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Proportions

• Mean of a two-value (binomial) distribution

• Var of a proportion = p(1-p)

• So the $SE = \sqrt{\frac{p(1-p)}{n}}$

Page 30: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Table 2.1: Confidence Ranges for Variability Attributable to Sampling

• Trends

• If sample size = 75 and p = 0.20,

$$SE = \sqrt{\frac{(0.20)(0.80)}{75}} = \sqrt{\frac{0.16}{75}} = 0.046188$$

$$2 \times 0.046188 = 0.092376$$

$$95\%\ CI = (0.11,\ 0.29)$$
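The same calculation can be written as a short helper. This sketch follows the slide's convention of using two standard errors for a 95% interval (1.96 would be slightly more exact); the function name `proportion_ci` is an assumption.

```python
import math

def proportion_ci(p, n, z=2.0):
    """Return (SE, (low, high)) for a sample proportion p from n respondents.

    z=2.0 mirrors the 'two standard errors' rule of thumb for 95% confidence.
    """
    se = math.sqrt(p * (1 - p) / n)
    margin = z * se
    return se, (p - margin, p + margin)

se, (low, high) = proportion_ci(0.20, 75)
print(f"SE = {se:.6f}, 95% CI = ({low:.2f}, {high:.2f})")
# prints: SE = 0.046188, 95% CI = (0.11, 0.29)
```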

Page 31: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Confidence intervals

• In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level?

• In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes?

Page 32: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

• In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level?

• Interval is ±8 percentage points (two standard errors).

• 95% CI=(12%, 28%)

Page 33: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

• In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes?

• Interval is about ±3.8 percentage points.

• 95% CI is about (6.2%, 13.8%)

• If 50% said yes, CI is about (43.7%, 56.3%)
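For reference, the arithmetic behind these answers, using the proportion formula and the two-standard-error rule from the earlier slides, works out as follows:

```latex
% n = 100, p = 0.20
SE = \sqrt{\frac{(0.20)(0.80)}{100}} = 0.04, \qquad 2\,SE = 0.08
\;\Rightarrow\; 95\%\ CI = (0.12,\ 0.28)

% n = 250, p = 0.10
SE = \sqrt{\frac{(0.10)(0.90)}{250}} \approx 0.0190, \qquad 2\,SE \approx 0.038
\;\Rightarrow\; 95\%\ CI \approx (0.062,\ 0.138)

% n = 250, p = 0.50
SE = \sqrt{\frac{(0.50)(0.50)}{250}} \approx 0.0316, \qquad 2\,SE \approx 0.063
\;\Rightarrow\; 95\%\ CI \approx (0.437,\ 0.563)
```

Note that for the same sample size the interval is widest at p = 0.50, where p(1-p) is largest.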

Page 34: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sampling error and sampling strategy

• Sampling error for SRS is approximated by the standard error

• Systematic sampling
– If not stratified, sampling error is the same as in SRS.
– If stratified, errors are lower than those associated with an SRS of the same size for variables that differ (on average) by stratum, if rates of selection are constant across strata.

Page 35: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sampling error and sampling strategy (2)

• Unequal rates of selection decrease sampling error for oversampled groups.

• Unequal rates of selection will generally produce sampling errors for the whole sample that are higher than those associated with an SRS of the same size for variables that differ by stratum.

Page 36: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sampling error and sampling strategy (3)

• Clusters will produce sampling errors that are higher than those of an SRS of the same size for variables that are more homogeneous within clusters than in the population as a whole.

• You must look at the nature of the clusters to evaluate the effect on the sampling error.
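One widely used way to quantify the cluster effect is Kish's design effect approximation, deff = 1 + (m - 1)ρ, where m is the average cluster size and ρ is the intraclass correlation. This formula is not on the slides; it is offered here only as a sketch, and the numbers are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Kish approximation: variance inflation relative to an SRS of the same size."""
    return 1 + (cluster_size - 1) * icc

def clustered_se(p, n, cluster_size, icc):
    """SE of a proportion under cluster sampling, as SRS SE times sqrt(deff)."""
    srs_se = math.sqrt(p * (1 - p) / n)
    return srs_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical survey: 400 respondents in clusters of 20, p = 0.20.
print(clustered_se(0.20, 400, 20, icc=0.0))   # no within-cluster homogeneity: same as SRS
print(clustered_se(0.20, 400, 20, icc=0.05))  # homogeneous clusters: larger SE
```

When members of a cluster are no more alike than the population at large (ρ = 0), clustering costs nothing in precision; the more homogeneous the clusters, the larger the penalty.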

Page 37: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Caveats

• Sampling error is in no way the only source of error.

• Non-sampling error, bias, error resulting from incorrect specification of sampling frame, etc., etc., are also sources of error.

• Often the latter are more insidious as they are seldom quantifiable

• Total survey approach useful in this regard.

Page 38: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (1)

• Very important to consider prior to undertaking study

• Consult a biostatistician

• Many references in texts; spreadsheets, stat programs, EpiInfo, etc. are also available

• Never feel bad asking for assistance

Page 39: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (2)

• What not to do

1. Sample size does not rely on the fraction of the population that is sampled. Nor does it depend on the size of the population you want to describe.

2. Sample size should not be decided solely based on what others have previously done.

3. Sample size should not be based on the desired level of precision for just one estimate.

Page 40: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (3)

• What to do
– develop analysis plan
– desired precision of estimates for subgroups
– consider research questions
– affordability
– feasibility
– and to some extent, previous studies

Page 41: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (5)

• Parameters required to calculate sample size:
– Null hypothesis: what precisely are you asking/testing?
– α [Pr(type I error)]
– β [Pr(type II error)], usually included as 1 − β = power
– What difference between groups do you want to observe? (e.g., μ1 − μ2)
– What is a good estimate of variance in population?
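To show how these parameters combine, here is a sketch using the standard normal-approximation formula for comparing two group means with equal group sizes, n = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ² per group. The formula choice and the function name `n_per_group` are assumptions, not taken from the slides, and this is no substitute for consulting a biostatistician.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two means.

    delta: difference between groups you want to detect
    sigma: estimated standard deviation in the population
    alpha: Pr(type I error); power: 1 - Pr(type II error)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)

print(n_per_group(delta=0.5, sigma=1.0))   # moderate difference
print(n_per_group(delta=0.25, sigma=1.0))  # smaller difference needs more people
print(n_per_group(delta=0.5, sigma=2.0))   # more variability needs more people
```

Halving the difference to detect, or doubling the standard deviation, roughly quadruples the required sample size.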

Page 42: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (6)

• How sample size works—some examples

Page 43: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (7) sample size, power

Group A, Group B

Page 44: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (8) sample size, power

A:

B:

Page 45: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (9) variability, power

A:

B:

Page 46: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Sample size (10) variability, power

A:

B:

Page 47: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (1)

• Very big issue

• Source of non-sampling error

• Can lead to bias, uninterpretability of results

• Violates whole point of probability sample, yet unavoidable

Page 48: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (2)

• Issue in probability as well as non-probability samples

• Exists on many levels

Page 49: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (3)

Whole sample

Reached Not reached

Page 50: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (4)

Reached

Can participate

Cannot participate

Page 51: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (5)

Reached

Enrolled Refused

Page 52: Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001

Non-response (6)

Participated

Answer individual question

Did not answer individual question
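The levels of non-response in these diagrams can be tallied stage by stage. The sketch below uses invented counts and hypothetical rate names to show how losses at each level compound.

```python
def response_rates(sampled, reached, can_participate, enrolled, answered_item):
    """Rates at each level of non-response, from whole sample to a single item."""
    counts = [sampled, reached, can_participate, enrolled, answered_item]
    if min(counts) <= 0 or any(b > a for a, b in zip(counts, counts[1:])):
        raise ValueError("counts must be positive and each stage a subset of the previous one")
    return {
        "contact": reached / sampled,               # reached vs. not reached
        "eligibility": can_participate / reached,   # can vs. cannot participate
        "cooperation": enrolled / can_participate,  # enrolled vs. refused
        "item_response": answered_item / enrolled,  # answered vs. skipped the question
        "overall": answered_item / sampled,
    }

# Hypothetical counts for one survey question.
rates = response_rates(sampled=1000, reached=800, can_participate=720,
                       enrolled=540, answered_item=513)
for stage, rate in rates.items():
    print(f"{stage}: {rate:.1%}")
```

Even when each stage looks acceptable on its own, the overall rate is the product of the stage rates, so it can be much lower than any single one.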