challenges in visitor-volume estimation: overview
Posted on 04-Feb-2016
Presented by: Dr. Michael Kaylen
University of Missouri
Outline
I. Statistical Background
   Estimators, Projection Weights, and Properties of Estimators
II. Sources of Error in a Survey
   Frame vs. Target Populations, Non-Response Bias
III. Travel-Related Measures
   Travel Parties, Household Trips, Person Trips, Person Nights, Traveler Expenditures
IV. A Note on Sample Size
V. A Note on Stratification
VI. Considerations
Statistical Background: Properties of Estimators for Population Parameters
• Population parameters are characteristics of populations.
Example: Missouri is interested in the population of households in the continental U.S. (excluding those in Missouri) at the start of the first quarter of 2008. A parameter of interest is the total number of trips those households took to Missouri during the first quarter of 2008. If there are N households in the population of interest and y_h is the number of trips household h made to Missouri, then the parameter is given by:
$$Y = \sum_{h=1}^{N} y_h$$
• Estimators for unknown parameters are functions of the elements of a random sample.
Example: For the Missouri case, suppose the sampling design is such that the probability of household h’s inclusion in the sample set (S) is given by π_h. The “design weight” is w_h = 1/π_h, and an estimator for Y is
$$\hat{Y} = \sum_{h \in S} w_h y_h$$
If we randomly sample one out of 1,000 households, the inclusion probability is just 1/1,000 and the design weight (“projection weight”) for every household in that stratum would be 1,000.
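A minimal Python sketch of the design-weighted estimator above, assuming each sampled household is recorded as a pair (y_h, π_h). The function names `design_weight` and `estimate_total` and the four-household sample are hypothetical, for illustration only.

```python
def design_weight(inclusion_prob: float) -> float:
    """Design ("projection") weight w_h = 1 / pi_h."""
    if not 0 < inclusion_prob <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return 1.0 / inclusion_prob


def estimate_total(sample: list[tuple[int, float]]) -> float:
    """Y_hat = sum over sampled households of w_h * y_h, from (y_h, pi_h) pairs."""
    return sum(y * design_weight(pi) for y, pi in sample)


# Hypothetical sample: each household drawn with probability 1/1,000,
# so each carries a design weight of 1,000.
sample = [(0, 1 / 1000), (2, 1 / 1000), (0, 1 / 1000), (1, 1 / 1000)]
print(estimate_total(sample))
```

Three sampled trips, each standing in for 1,000 households' worth of trips, project to an estimated 3,000 trips.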
• Three properties of estimators:
Bias: The expected (average) difference between an estimator and the parameter.
When we take a sample and calculate the value of an estimator for that sample, we have an estimate. The difference between that estimate and the true value for the parameter is referred to as sampling error. An estimator is unbiased if its average sampling error is zero.
Variance: The expected (average) squared difference between an estimator and the expected (average) value for the estimator.
Mean Squared Error: The expected (average) squared difference between an estimator and the parameter.
Note: MSE = Bias² + Variance
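To make the three properties and the identity MSE = Bias² + Variance concrete, the sketch below enumerates every possible simple random sample (without replacement) of 2 households from a hypothetical population of 6 and computes each property exactly. The population values and the deliberately biased "shrunk" estimator are assumptions for illustration, not part of the original slides.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical population: trips to Missouri by each of N = 6 households.
population = [0, 0, 1, 0, 3, 2]
N = len(population)
Y = sum(population)  # the parameter: total trips (6)
n = 2                # simple random sample without replacement (SRSWOR)


def expansion_estimator(s):
    """Unbiased: under SRSWOR every pi_h = n / N, so every w_h = N / n."""
    return (N / n) * sum(s)


def shrunk_estimator(s):
    """Deliberately biased: halves the expansion estimate."""
    return 0.5 * expansion_estimator(s)


def properties(estimator):
    """Exact bias, variance, and MSE by enumerating every possible sample."""
    # Each size-n subset is equally likely under SRSWOR.
    estimates = [estimator(s) for s in combinations(population, n)]
    expected = fmean(estimates)
    bias = expected - Y
    variance = fmean((e - expected) ** 2 for e in estimates)
    mse = fmean((e - Y) ** 2 for e in estimates)
    return bias, variance, mse


results = {name: properties(est) for name, est in
           [("expansion", expansion_estimator), ("shrunk", shrunk_estimator)]}
for name, (bias, variance, mse) in results.items():
    print(f"{name:>9}: bias={bias:+.3f}  variance={variance:.3f}  "
          f"MSE={mse:.3f}  bias^2+variance={bias ** 2 + variance:.3f}")
```

The expansion estimator comes out unbiased, the shrunk estimator has a bias of -3 but a quarter of the variance, and for both the MSE equals the squared bias plus the variance.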
SOURCES OF ERROR IN A SURVEY
[Figure, built up over several slides: Target population, Frame population, Sample, and the Sample divided into a Response set and a Nonresponse set]
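The last step in the diagram, nonresponse, can bias estimates when responding households differ from non-responding ones on the study variable. The sketch below works with expected counts rather than a random draw; the frame size, visit rate, and response rates are all hypothetical.

```python
# All sizes and rates below are hypothetical.
N = 100_000            # households in the frame
travel_rate = 0.01     # share of households that visited Missouri
n = 10_000             # simple random sample, so pi_h = n / N
resp_traveler = 0.30   # response rate among visiting households
resp_other = 0.50      # response rate among non-visiting households

# Expected counts in the sample and in the response set.
sample_travelers = n * travel_rate
sample_others = n * (1 - travel_rate)
resp_travelers = sample_travelers * resp_traveler
respondents = resp_travelers + sample_others * resp_other

# Nonresponse-adjusted weight: respondents are scaled up to the frame size.
adjusted_weight = N / respondents
estimated_visitors = resp_travelers * adjusted_weight
true_visitors = N * travel_rate

print(f"true = {true_visitors:.0f}, estimated = {estimated_visitors:.1f}")
```

With these assumed rates the adjusted estimate is about 602 visiting households against a true 1,000. If both groups responded at the same rate, the same adjustment would recover 1,000.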
Travel-Related Measures
Household 1 (members: John, Katy, Steve)
   Trip 1: travel party John, Katy, Steve; 6 nights; household expenditures $170
   Trip 2: travel party John, Steve (joined with Tom); 4 nights; household expenditures $70
Household 2 (members: Tom, Mary)
   Trip: travel party Tom (joined with John, Steve); 4 nights; household expenditures $80
Travel-Related Measures: Missouri Perspective
Travel Party Trips: (John, Katy, Steve) & (John, Steve, Tom) = 2
Household Trips: (John, Katy, Steve) & (John, Steve) & (Tom) = 3
Person Trips (# of Travelers): 3 + 3 = 6
Person Nights: 3(6) + 3(4) = 30
Traveler Expenditures: $170 + $70 + $80 = $320
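The counting rules above can be written down directly. This sketch assumes trips are recorded at the travel-party level, with each traveler tagged by household; `TravelParty` and `missouri_measures` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TravelParty:
    members: dict[str, str]     # traveler name -> household id
    nights: int
    spending: dict[str, float]  # household id -> that household's expenditures ($)


# The two Missouri trips from the example above.
parties = [
    TravelParty({"John": "HH1", "Katy": "HH1", "Steve": "HH1"}, 6, {"HH1": 170.0}),
    TravelParty({"John": "HH1", "Steve": "HH1", "Tom": "HH2"}, 4,
                {"HH1": 70.0, "HH2": 80.0}),
]


def missouri_measures(parties: list[TravelParty]) -> dict[str, float]:
    return {
        "travel_party_trips": len(parties),
        # One household trip per (party, participating household) pair.
        "household_trips": sum(len(set(p.members.values())) for p in parties),
        "person_trips": sum(len(p.members) for p in parties),
        "person_nights": sum(len(p.members) * p.nights for p in parties),
        "traveler_expenditures": sum(sum(p.spending.values()) for p in parties),
    }


print(missouri_measures(parties))
```

The two trip counts differ because the shared trip (John, Steve, Tom) is one travel party but two household trips.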
Survey Results (each Projection column equals the household's value times a projection weight of 2)

Measure             Household 1           Projection  Household 2  Projection  Households 1 & 2
# Trips to MO       2                     4           1            2           2 + 1 = 3
# Persons from HH   3 + 2 = 5             10          1            2           5 + 1 = 6
HH Person Nights    (3x6) + (2x4) = 26    52          4            8           26 + 4 = 30
HH Expenditures     $170 + $70 = $240     $480        $80          $160        $240 + $80 = $320
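The survey side of the example can be sketched the same way: each household reports only its own members, nights, and spending, and the projection applies a household weight. The weight of 2 per household is inferred from the Projection columns; the names `reports`, `weights`, `household_measures`, and `totals` are hypothetical.

```python
# Each household reports its own Missouri trips as
# (persons from the household, nights, household expenditures in $).
reports = {
    "HH1": [(3, 6, 170.0), (2, 4, 70.0)],
    "HH2": [(1, 4, 80.0)],
}
weights = {"HH1": 2.0, "HH2": 2.0}  # inferred from the Projection columns


def household_measures(trips: list[tuple[int, int, float]]) -> dict[str, float]:
    return {
        "trips": len(trips),
        "persons": sum(p for p, _, _ in trips),
        "person_nights": sum(p * nights for p, nights, _ in trips),
        "expenditures": sum(s for _, _, s in trips),
    }


def totals(reports, weights=None) -> dict[str, float]:
    """Unweighted sample totals, or weighted (projected) totals if weights given."""
    out = {"trips": 0.0, "persons": 0.0, "person_nights": 0.0, "expenditures": 0.0}
    for hh, trips in reports.items():
        w = 1.0 if weights is None else weights[hh]
        for key, value in household_measures(trips).items():
            out[key] += w * value
    return out


print(totals(reports))           # sample totals
print(totals(reports, weights))  # projected totals
```

The unweighted totals reproduce the Households 1 & 2 column (3 trips, 6 persons, 30 person nights, $320). The trip count matches household trips rather than the 2 travel-party trips, so a survey that asked each household about whole travel parties would count the shared party twice.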
A Note on Sample Size
Recently found on the Web:
For results based on this sample of 2,679 registered voters, the maximum margin of sampling error is ±2 percentage points.
Margin of Sampling Error = Radius of Confidence Interval for a Statistic from a Survey, usually referring to a 95% Confidence Interval.
Example: The 95% confidence interval for the percentage favoring Obama is 48% ± 2%.
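As a rough check on the quoted figure, the textbook margin of error for a proportion is z·sqrt(p(1 - p)/n), with z = 1.96 for 95% confidence. This sketch takes p = 0.48 from the example and n = 2,679 from the quoted poll, and assumes simple random sampling with no design effect, so it only approximates whatever method the pollster used.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion under simple
    random sampling (no design effect, no finite-population correction)."""
    return z * math.sqrt(p * (1 - p) / n)


n = 2_679
me_example = margin_of_error(0.48, n)
me_max = margin_of_error(0.50, n)  # p = 0.5 maximizes p(1 - p)
print(f"p = 0.48: ME = {me_example:.4f}")
print(f"p = 0.50: ME = {me_max:.4f}")
```

Both values come out near 0.019, consistent with the reported maximum of ±2 percentage points.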
Why do travel-volume estimates need large samples? Answer: the relative margin of error matters.
If an estimated proportion is p and the margin of error is ME, the relative margin of error is: RME = ME/p
In the example, RME = .02/.48 = 0.042, so the ME is about 4.2% of p.
Travel Example: From past studies, we know that about 1% of households in the continental U.S. (excluding MO) visit MO in any given month. If we want to estimate that percentage for a given month, we need a much narrower confidence interval than 1% ± 2%! In this case, RME = .02/.01 = 2, so the ME is 200% of the estimated percentage.
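The same formula shows why the RME, not the ME, is the binding constraint. This sketch (assuming simple random sampling and z = 1.96) keeps the poll's sample of 2,679 and compares p = 0.48 with a 1% visitation rate; `relative_margin_of_error` is a hypothetical helper.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation ME for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


def relative_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """RME = ME / p."""
    return margin_of_error(p, n, z) / p


n = 2_679
for p in (0.48, 0.01):
    print(f"p = {p:.2f}: ME = {margin_of_error(p, n):.4f}, "
          f"RME = {relative_margin_of_error(p, n):.1%}")
```

At p = 0.48 the RME is about 4%, while at p = 0.01 it is about 38%: the absolute ME shrinks as p falls, but much more slowly than p itself.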
Required sample size (N) at 95% confidence for a given relative margin of error (RME). The CI columns give the half-width of the 95% confidence interval for the number of visiting households (1M or 3M).

1% Households Visit                       3% Households Visit
ME      RME      CI: 1M +/-   N           CI: 3M +/-   N
0.20%   20.00%   200,000      9,508       600,000      3,105
0.15%   15.00%   150,000      16,903      450,000      5,521
0.10%   10.00%   100,000      38,032      300,000      12,421
0.05%   5.00%    50,000       152,127     150,000      49,685
0.04%   4.00%    40,000       237,699     120,000      77,632
0.03%   3.00%    30,000       422,576     90,000       138,013
0.02%   2.00%    20,000       950,796     60,000       310,529
0.01%   1.00%    10,000       3,803,184   30,000       1,242,117
How large an RME can be tolerated? How large a sample is needed to achieve it?
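The N values in the sample-size table are consistent with the simple-random-sampling formula n = z²p(1 - p)/ME² with z = 1.96 and ME = RME × p. The sketch below reproduces them under that assumption (no finite-population correction or design effect); `required_sample_size` is a hypothetical helper.

```python
Z_95 = 1.96  # two-sided 95% confidence


def required_sample_size(p: float, rme: float, z: float = Z_95) -> int:
    """Simple-random-sampling size giving ME = rme * p,
    from n = z^2 p (1 - p) / ME^2 (no finite-population correction)."""
    me = rme * p
    return round(z ** 2 * p * (1 - p) / me ** 2)


for rme in (0.20, 0.15, 0.10, 0.05, 0.04, 0.03, 0.02, 0.01):
    n_1pct = required_sample_size(0.01, rme)
    n_3pct = required_sample_size(0.03, rme)
    print(f"RME {rme:6.2%}: N = {n_1pct:>9,} (1% visit)   N = {n_3pct:>9,} (3% visit)")
```

Because n grows with 1/RME², halving the tolerated RME quadruples the required sample, which is why the 1% rows climb into the millions.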
A Note on Stratification
Data providers often use sampling designs based on stratification of demographic variables such as household income, region, education, etc. There are two issues the user might want to consider.
Even though the final weights balance the sample for non-response, increased variance due to non-response may be important.
For example, consider a stratum calling for 10 households to be sampled, each representing 1,000 households. If only 5 of those households respond to the survey, we end up with 5 respondents, each carrying a weight of 2,000. The net effect of this smaller sample with larger weights is that we are more likely to get estimates far from the true value. As long as nonresponse within the stratum is unrelated to travel behavior, there is no bias problem, but the variance increases.
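The stratum example can be checked with a small simulation. It assumes a hypothetical stratum of 10,000 households (10 sampled at weight 1,000), about 1% of which visited Missouri, and assumes nonresponse is random within the stratum, so 5 respondents behave like a simple random sample of 5. The seed, visit rate, and helper names are illustrative.

```python
import random
from statistics import fmean, pvariance, variance

# Hypothetical stratum: 10 sampled households x weight 1,000 = 10,000 households.
N_STRATUM = 10_000


def expansion_variance(n: int, s2: float, N: int = N_STRATUM) -> float:
    """Variance of the expansion estimator N * (sample mean) under SRS
    without replacement, where s2 is S^2 (divisor N - 1)."""
    return N ** 2 * (1 - n / N) * s2 / n


random.seed(2016)
# Hypothetical population: roughly 1% of households visited Missouri.
population = [1 if random.random() < 0.01 else 0 for _ in range(N_STRATUM)]
true_total = sum(population)
s2 = variance(population)  # divisor N - 1 over all N units, i.e. S^2


def simulate(n: int, reps: int = 20_000) -> list[float]:
    """Draw n respondents at random and expand each with weight N / n."""
    weight = N_STRATUM / n  # 1,000 with 10 respondents, 2,000 with 5
    return [weight * sum(random.sample(population, n)) for _ in range(reps)]


full = simulate(10)
half = simulate(5)
sim_ratio = pvariance(half) / pvariance(full)
formula_ratio = expansion_variance(5, s2) / expansion_variance(10, s2)

print(f"true total: {true_total}")
print(f"mean estimate: 10 respondents {fmean(full):.1f}, 5 respondents {fmean(half):.1f}")
print(f"variance ratio (5 vs 10): simulated {sim_ratio:.2f}, formula {formula_ratio:.3f}")
```

Both mean estimates should sit near the true total, while the variance roughly doubles when only 5 of the 10 respond, in line with the closed-form ratio of about 2.001.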
The strata definitions may not coincide with the user’s needs. For example, the proportion of households that visit MO over a given time period is much higher for its neighboring states than for the non-neighboring states. If a sample over- or under-represents the neighboring states and the weights do not correct for this, we are likely to over- or under-estimate household visitation to MO, respectively.
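A back-of-the-envelope sketch of the neighboring-state effect: if the weights balance the sample on other variables but not on neighbor status, the sample's share of neighboring-state households carries straight into the estimate. All shares and visit rates here are hypothetical.

```python
# All shares and visit rates below are hypothetical.
POP_SHARE_NEIGHBOR = 0.10  # share of frame households in states bordering Missouri
RATE_NEIGHBOR = 0.05       # monthly visit rate, neighboring states
RATE_OTHER = 0.005         # monthly visit rate, non-neighboring states


def visit_rate(neighbor_share: float) -> float:
    """Overall visit rate implied by a given neighbor/non-neighbor mix."""
    return neighbor_share * RATE_NEIGHBOR + (1 - neighbor_share) * RATE_OTHER


true_rate = visit_rate(POP_SHARE_NEIGHBOR)
for sample_share in (0.05, 0.10, 0.20):
    # Weights that ignore neighbor status leave the sample's mix uncorrected.
    est = visit_rate(sample_share)
    print(f"neighbor share in sample {sample_share:.0%}: "
          f"estimate {est:.4f} vs true {true_rate:.4f}")
```

With these assumed rates the true rate is 0.95%; a sample with 20% neighboring-state households gives 1.4%, and one with 5% gives about 0.73%.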
Considerations
What is the Frame Population versus the Target Population?
Is the sample size adequate? (relative margin of error is key)
What is the response rate?
Are the design (projection) weights reflective of the sampling unit? (e.g., beware of sample designs based on households with weights based on people)
Do the survey questions refer to the same unit as the sample? (e.g., beware of potential double counting from sampling at the household level but asking questions about travel parties)
Is the sample balanced for demographic variables that are likely to covary with the study variables (e.g., household income, location)?
Do you have access to all of the data (including weights, non-travelers, and non-responders)?
THANK YOU!
Questions, Comments, Slides?
www.teri.missouri.edu