Post on 28-Dec-2015
1
Volunteer Angler Data Collection and Methods of Inference
Kristen Olson
University of Nebraska-Lincoln
February 2, 2012
2
Two perspectives on survey statistics
• Survey quality framework (Biemer and Lyberg, 2003)
– Adopted by many national statistical organizations around the world
• Total survey error framework (Groves, 1989)
– Focus on one part of the survey quality framework
3
Survey Quality
• Accessibility
– Survey results are available to those who need them and are interpretable
• Timeliness
– Results are available when needed
• Coherence
– Related statistics can be combined
• Completeness
– Statistics are available for all needed domains
• Accuracy
– Difference between ‘truth’ and the estimate; measured by the variance and bias of estimates, or by mean square error
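The accuracy measures named above are linked: mean square error equals variance plus squared bias. A minimal sketch of that relationship, assuming a hypothetical set of estimates from repeated runs of the same survey design and a known true value (the function name and the numbers are illustrative, not from the presentation):

```python
from statistics import mean, pvariance

def error_summary(estimates, truth):
    """Return (bias, variance, mse) for repeated estimates of one true value."""
    bias = mean(estimates) - truth                   # systematic error
    variance = pvariance(estimates)                  # spread around the estimates' own mean
    mse = mean((e - truth) ** 2 for e in estimates)  # spread around the truth
    return bias, variance, mse

# Hypothetical estimates from five repetitions of one survey design.
estimates = [9, 10, 11, 12, 13]
bias, variance, mse = error_summary(estimates, truth=10)

# Mean square error decomposes into variance plus squared bias.
print(bias, variance, mse, variance + bias ** 2)
```

In practice the true value is unknown, which is why accuracy usually has to be assessed against external benchmarks.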
4
Survey Quality (2)
• Dimensions combined should yield “fitness for use”
– Assure quality through processes, as each dimension may be difficult to measure directly
• Accuracy is one dimension of quality
– But it is the “cornerstone”
– With inaccurate data, many would argue (e.g., Biemer & Lyberg, 2003, p. 24) that the other quality dimensions don’t matter
5
[Figure: Groves, et al. 2004, Survey Methodology, Figure 2.5]
Measurement: Construct, Measurement, Response, Edited Response
Errors on the measurement side: Validity, Measurement Error, Processing Error
Representation: Target Population, Sampling Frame, Sample, Respondents, Postsurvey Adjustments
Errors on the representation side: Coverage Error, Sampling Error, Nonresponse Error, Adjustment Error
Both sides end in the Survey Statistic
6
TSE in notation
$\bar{Y}_{\text{target}} \quad \bar{Y}_{\text{frame}} \quad \bar{Y}_{\text{sample}} \quad \bar{Y}_{\text{respondents}}$
7
Coverage Error
• Gap between
– The target population: who/what you want to make inference to, including definitions of time and space, and
– The sampling frame: list or set of methods and procedures used to construct a sample; want to be as complete as possible
• Example:
– Target population = All possible anglers at all possible sites for all possible species during the week containing June 1, 2012 in the state of Maryland
– Sampling frame = List of marinas, docks and shore fishing sites; method to generate phone numbers for households; list of names and phone numbers of known anglers
8
Coverage Error – Volunteer Surveys
• Who is the target population?
– Often the same as for probability-based surveys
• What is the sampling frame?
– May be difficult to define
– If website and email, then can conceptualize loosely as persons who (1) have internet access, (2) log on to website or open email, (3) visit the part of the website that contains information about the volunteer angler program
– If in-store flyers, then can conceptualize loosely as persons who (1) visit the store and (2) see the flyer
9
Coverage Error – Why does it matter?
• Potential source for bias in survey statistics
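$\bar{Y}_{\text{covered}} - \bar{Y}_{\text{target}} = \frac{N_{\text{not covered}}}{N_{\text{target}}} \left( \bar{Y}_{\text{covered}} - \bar{Y}_{\text{not covered}} \right)$
– $\bar{Y}_{\text{target}}$: what you want
– $N_{\text{not covered}} / N_{\text{target}}$: noncoverage rate = proportion of target population missing from frame
– $\bar{Y}_{\text{covered}} - \bar{Y}_{\text{not covered}}$: difference between those who are on the frame and those who are not on the frame on the statistic of interest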
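A minimal numeric sketch of the coverage bias expression, assuming made-up population counts and mean catch values (every number and name here is hypothetical). It checks that the product of the noncoverage rate and the covered/not-covered difference matches the direct gap between the covered mean and the target mean:

```python
def coverage_bias(n_covered, n_not_covered, mean_covered, mean_not_covered):
    """Noncoverage rate times the covered vs. not-covered difference."""
    n_target = n_covered + n_not_covered
    noncoverage_rate = n_not_covered / n_target
    return noncoverage_rate * (mean_covered - mean_not_covered)

# Hypothetical target population: 8,000 anglers on the frame, 2,000 missing from it.
n_covered, n_not_covered = 8000, 2000
mean_covered, mean_not_covered = 5.0, 3.0  # assumed mean catch per trip

target_mean = (n_covered * mean_covered + n_not_covered * mean_not_covered) / (
    n_covered + n_not_covered
)
bias = coverage_bias(n_covered, n_not_covered, mean_covered, mean_not_covered)
direct_gap = mean_covered - target_mean

print(round(bias, 3), round(direct_gap, 3))
```

The bias vanishes when either factor is zero: when the frame misses nobody, or when the people it misses do not differ on the statistic of interest.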
10
TSE in notation
$\bar{Y}_{\text{target}} \quad \bar{Y}_{\text{frame}} \quad \bar{Y}_{\text{sample}} \quad \bar{Y}_{\text{respondents}}$
11
Sampling Error
• Gap between
– The sampling frame: list or set of methods and procedures used to construct a sample, and
– The sample: the set of units (persons, households, businesses, etc.) that are contacted for data collection
• Example:
– Frame: List of known anglers
– Sample: Subgroup of list of known anglers, selected with known probability
12
Principles of Survey Samples
• Realism
– Sample reflects an actual population with real population parameters
• Randomization
– Chance mechanisms are used to select units, not personal judgment
• Representation
– Mirror or miniature of the population
13
Two approaches to survey sampling
• Chance based approach
– Probability sampling
– Dominates current survey practice
• Purposive selection
– Non-probability sampling
– Purely purposive selection has very limited use for making statements about a population from the sample (inference).
14
Two approaches to survey sampling (2)
• Probability samples
– All units on the frame have a known probability of selection.
– The method for selecting units from the frame involves randomness or chance.
– Any unit’s chance of selection is determined randomly using mechanical rules
– Examples: Simple random samples, cluster samples, stratified random samples, probability proportionate to size samples
• Non-probability samples
– Units on the frame have unknown probabilities of selection.
– The method for selecting units from the frame involves judgment.
– Any unit’s chance of selection is determined by a personal (researcher or participant) decision.
– Examples: Snowball samples, Quota samples, Convenience samples, Volunteer samples
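The contrast between the two approaches can be illustrated with a small simulation. This is a sketch under invented assumptions, not a description of any real angler program: the population, the catch distributions, and the opt-in rates (avid anglers volunteering ten times as often as casual ones) are all hypothetical.

```python
import random

rng = random.Random(2012)  # fixed seed so the sketch is repeatable

# Hypothetical population: half avid anglers (high catch), half casual (low catch).
population = []
for i in range(10_000):
    avid = i % 2 == 0
    catch = rng.gauss(10, 2) if avid else rng.gauss(2, 1)
    population.append((avid, catch))

true_mean = sum(c for _, c in population) / len(population)

# Probability sample: simple random sample, every unit has the same known chance.
srs = rng.sample(population, 500)
srs_mean = sum(c for _, c in srs) / len(srs)

# Volunteer sample: each person decides; assumed opt-in rates differ by avidity.
volunteers = [
    (avid, catch)
    for avid, catch in population
    if rng.random() < (0.30 if avid else 0.03)
]
vol_mean = sum(c for _, c in volunteers) / len(volunteers)

print(f"true {true_mean:.2f}  SRS {srs_mean:.2f}  volunteer {vol_mean:.2f}")
```

Under these assumptions the volunteer mean is pulled toward the avid anglers even though the volunteer sample is larger than the random sample, because the opt-in decision is related to the survey variable.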
15
Sampling Error – Volunteer Surveys
• What is the sampling frame?
– May be difficult to define
• What is the sampling mechanism?
– Out of the control of the researchers / management organization
– Probability of being selected into the sample is unknown
– Unclear what the link is between the sample and the frame
16
Sampling Error – Why does it matter?
• With probability samples, there are no biasing (systematic) errors from sampling
– That is, the sample estimates won’t be consistently too high or too low due to sampling error, although that does not rule out other error sources
• The variable errors, known as ‘standard errors,’ have known and well-defined formulas and properties to link the sample back to the frame
– They can be used to define a range of plausible values in which the ‘true value’ is likely to fall, known as a ‘confidence interval’
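As an illustration of the standard error and confidence interval idea, here is a minimal sketch for the mean of a simple random sample. It assumes a normal approximation (multiplier 1.96 for roughly 95% coverage) and ignores the finite population correction; the catch values and the function name are hypothetical.

```python
import math
from statistics import mean, stdev

def srs_mean_ci(sample, z=1.96):
    """Estimate, standard error, and approximate 95% CI for a simple random sample mean."""
    n = len(sample)
    estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # sample SD over root n
    interval = (estimate - z * standard_error, estimate + z * standard_error)
    return estimate, standard_error, interval

# Hypothetical catch counts reported by six randomly selected anglers.
catches = [2, 4, 4, 5, 7, 8]
estimate, standard_error, (low, high) = srs_mean_ci(catches)

print(f"mean {estimate:.2f}, SE {standard_error:.3f}, CI ({low:.2f}, {high:.2f})")
```

With a sample this small, a t-distribution multiplier would be more appropriate than 1.96; the point is only that the formula depends on the selection probabilities being known, which is what a volunteer sample lacks.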
17
Sampling Error – Why does it matter? (2)
• There is no uniformly accepted scientific method for linking a non-probability sample back to the sampling frame
• Many approaches have been tried, all using statistical models to try to make the non-probability method ‘look like’ the full population
• Can make the non-probability sample align with the frame on certain characteristics that are used in the model, but there is no guarantee for other characteristics
– Yeager, et al. (2011, POQ) compared adjusted estimates from 7 non-probability samples and 2 probability samples to a variety of benchmark criteria. The adjusted non-probability samples always had substantially higher error rates than the probability samples.
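One common model-based adjustment is post-stratification weighting, shown here only as a sketch of the general idea and not as the method used in any study cited above. It assumes the population shares of one characteristic (fishing mode) are known from an external benchmark; the groups, shares, and catch values are invented.

```python
from collections import Counter

def poststratify(groups, population_shares):
    """Weight each respondent by population share / sample share of their group."""
    counts = Counter(groups)
    n = len(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]

def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical volunteer sample: boat anglers are over-represented.
groups = ["boat"] * 8 + ["shore"] * 2
catch = [10, 12, 8, 10, 11, 9, 10, 10, 4, 6]
population_shares = {"boat": 0.4, "shore": 0.6}  # assumed known benchmark

weights = poststratify(groups, population_shares)
unweighted = sum(catch) / len(catch)
adjusted = weighted_mean(catch, weights)

print(unweighted, round(adjusted, 3))
```

The weights force the sample to match the benchmark on fishing mode, but nothing in the adjustment makes the volunteers within each mode resemble the non-volunteers within that mode.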
18
TSE in notation
$\bar{Y}_{\text{target}} \quad \bar{Y}_{\text{frame}} \quad \bar{Y}_{\text{sample}} \quad \bar{Y}_{\text{respondents}}$
19
Nonresponse Error
• Gap between
– The sample: the people, households, businesses, or other units selected for data collection, and
– The respondents: the people, households, businesses, or other units who actually participated in the data collection
• Example:
– Sample: Anglers randomly selected from a list of known anglers
– Respondents: Anglers who actually completed the logbooks and other questions asked
20
Nonresponse Error – Volunteer Surveys
• Who is the sample? Who are the respondents?
– Difficult to define these two groups separately, as the mechanism for selecting persons to participate is their own self-selection into the data collection effort
21
Nonresponse Error – Why does it matter?
• Potential source for bias in survey statistics
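$\bar{Y}_{\text{respondents}} - \bar{Y}_{\text{frame}} = \frac{N_{\text{nonrespondents}}}{N_{\text{frame}}} \left( \bar{Y}_{\text{respondents}} - \bar{Y}_{\text{nonrespondents}} \right)$
– $\bar{Y}_{\text{frame}}$: what you want
– $N_{\text{nonrespondents}} / N_{\text{frame}}$: nonresponse rate = proportion of frame population missing from respondents
– $\bar{Y}_{\text{respondents}} - \bar{Y}_{\text{nonrespondents}}$: difference between those who responded and those who did not respond on the statistic of interest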
22
Nonresponse Error – Why does it matter? (2)
• Potential source for bias in survey statistics
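$\text{Bias}(\bar{Y}_{\text{respondents}}) \approx \frac{\text{Cov}(p, Y)}{\bar{p}}$
– $\text{Bias}(\bar{Y}_{\text{respondents}})$: nonresponse bias of the respondent mean
– $\text{Cov}(p, Y)$: covariance between the probability of participating and the survey variable of interest
– $\bar{p}$: average probability of participating (similar to the response rate)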
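A minimal sketch of the covariance expression, assuming each person has a participation probability that is known for illustration (in a real volunteer survey it is not). The four probabilities and values are made up. The sketch compares the covariance over the average probability with the gap between the propensity-weighted respondent mean and the full-population mean:

```python
def nonresponse_bias(propensities, values):
    """Population covariance of (p, y) divided by the mean participation probability."""
    n = len(values)
    p_bar = sum(propensities) / n
    y_bar = sum(values) / n
    cov = sum(p * y for p, y in zip(propensities, values)) / n - p_bar * y_bar
    return cov / p_bar

# Hypothetical population of four anglers: those who catch more are likelier to take part.
propensities = [0.9, 0.6, 0.3, 0.2]
values = [10, 6, 4, 0]

bias = nonresponse_bias(propensities, values)
population_mean = sum(values) / len(values)
expected_respondent_mean = (
    sum(p * y for p, y in zip(propensities, values)) / sum(propensities)
)

print(round(bias, 3), round(expected_respondent_mean - population_mean, 3))
```

If participation were unrelated to catch, the covariance would be zero and the respondent mean would be unbiased whatever the participation rate.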
23
Volunteer Surveys from a Survey Quality Framework
• Accessibility
– Easily accomplished for volunteer surveys
• Timeliness
– If collected by agency who needs the information, results can be accessed at any time. Question is whether the information is ‘complete’
• Coherence
– May be difficult to compare volunteer data with official statistics
• Completeness
– May be limited, depending on characteristics of volunteers
• Accuracy
– Unknown, difficult to assess without external benchmarks
– No assurance that the sample is linked to the population through a probability mechanism