internet-use propensity for matching probability and non ...non-probability seductively cheaper...
TRANSCRIPT
![Page 1: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/1.jpg)
Internet-use Propensity for
Matching Probability and
Non-Probability Samples:
the “Fac-sample”
Charles DiSograAndrew Burkey
AAPOR – Austin TXMay 14, 2016
![Page 2: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/2.jpg)
Abt SRBI | pg 2
You already have non-probability Web panel cases
You used a non-probability source because
“rare” target population
efficient reaching a large number of the target population
rapid data collection needed
cost-effective
You had no better alternative to study the target population
A Very Specific Situation
![Page 3: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/3.jpg)
Abt SRBI | pg 3
You need to compute a confidence interval for your data
But these are non-probability cases!
Dilemma
![Page 4: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/4.jpg)
Abt SRBI | pg 4
Make a facsimile of a probability sample as follows:
Treat your large number of non-probability cases as a source pool
Match cases from the pool to existing probability sample cases
Use a propensity score as the matching metric
Propensity to be a non-daily Internet user
Possible solution
![Page 5: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/5.jpg)
Abt SRBI | pg 5
Identify a probability sample from a
population which includes your target group
a domain within the larger sample
Examples: pregnant women, teachers, a
specific health condition, healthcare
personnel, LGBT
Identify eligible cases in the probability
sample that meet your survey criteria
These cases become your referent sample
Find a probability sample! This is key
General population probability sample
![Page 6: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/6.jpg)
Abt SRBI | pg 6
Examples of common variables are
age
gender
education
home ownership
children in household
income
…. etc.
Note: common variables are a constraining factor!
Identify common variables in your “sample” and the probability sample
Non-probability cases
Referent probability sample
![Page 7: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/7.jpg)
Abt SRBI | pg 7
The non-probability source is an opt-in Web panel
ALL cases have Internet access
Assumed to be Daily Internet users
Coverage error = Non-daily users and users not on panels
The probability source may be a general population sample
Cases consist of “Daily” and “Non-daily” Internet users
Step-wise regression tells us which of our common variables are
significant for predicting Non-daily Internet users
Designate a propensity variable to model – Non-daily Inernet User
![Page 8: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/8.jpg)
Abt SRBI | pg 8
Using the combined referent and non-probability cases
Compute the probability of a Non-daily Internet user (p )
Log (p/(1-p)) = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + …. + 𝛽𝑘 𝑋𝑘
Round the propensity scores to achieve a robust match rate
(>80%) of the referent cases to best approximate the probability
cases
Compute a propensity score
![Page 9: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/9.jpg)
Abt SRBI | pg 9
Make exact matches based on
the rounded propensity score
Use only the non-probability
cases that match
Note that you may find multiple
matches
Find your matched cases
Referent cases
Non-probability casesMatched
cases
![Page 10: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/10.jpg)
Abt SRBI | pg 10
Resolve multiple matches
A weight share* adjustment (𝜔𝑖) for each matched non-prob. case
(number of 𝑦𝑖 referent cases in a match)
(number of 𝑥𝑖 non-prob. cases in same match)
Example: 1 referent matches to 2 non-prob. = 1/2 = 0.50 = 𝜔𝑖
σ 𝜔𝑖𝑥𝑖 = σ 𝑦𝑖
Sum weighted matched cases = number of referent cases
𝜔𝑖 =
* Deville JC, Lavallée, P. Indirect Sampling: The Foundations of the Generalized Weight Share Method. Survey
Methodology, 32:2 pp165-176, 2006.
![Page 11: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/11.jpg)
Abt SRBI | pg 11
The Fac-sample is theoretically
“one of any number of possible samples
that can be drawn from the population of
interest”
Fac-sample is next weighted to target
population benchmarks
An approximated SE of the estimate and
confidence interval for the population value
can now be calculated!
Our “Fac-sample” is made!
Matched referent
cases
Matched, adjusted
cases
𝜔𝑖
𝜔𝑖
𝜔𝑖
𝜔𝑖
![Page 12: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/12.jpg)
Abt SRBI | pg 12
Some limitations
Must have identified a suitable probability referent sample
Results hinge on Internet usage propensity
Propensity score matching is restricted to available variables
Not all referent cases are matched
Possible mode effects between probability referent sample and the
non-probability sample
![Page 13: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/13.jpg)
Abt SRBI | pg 13
Conclusions
Non-probability cases can be made a facsimile of a probability
sample using a propensity matching procedure
A confidence interval around the “Fac-sample” can be calculated
Inherent bias likely still exists in the non-probability sample
More work needs to be done with this ex post facto approximation
![Page 14: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/14.jpg)
Abt SRBI | pg 14
Internet-use Propensity for Matching Probability and
Non-Probability Samples: the “Fac-sample”
Charles DiSogra
![Page 15: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/15.jpg)
Apples to Oranges or
Gala vs. Golden Delicious?
Comparing Data Quality of
Nonprobability Internet Samples to
Low Response Rate Probability
Samples
David Dutwin, Ph.D., SSRS
Trent D. Buskirk, Ph.D., MSG AAPOR, 2016
![Page 16: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/16.jpg)
What’s the Problem, Exactly?
Non-probabilistic data sources can have self-selection bias
over and above what probability panels might have:
As such, considerable variance in the universe of web
panels:
* Regardless of Hispanicity
8%
32%
23% 22%
10%
0%
5%
10%
15%
20%
25%
30%
35%
Panel 1 Panel 2 Panel 3 Panel 4 Panel 5
Percent of Web Panelists 18-24 from 5 Leading Convenience Panels
64%69%
80%69%
62%
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
Panel 1 Panel 2 Panel 3 Panel 4 Panel 5
Percent of Web Panelists White* from 5 Leading Convenience Panels
![Page 17: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/17.jpg)
Who are the Web Panelists Anyway?
67% 64%
0%
20%
40%
60%
80%
Panelist USA
White Non-Hispanic
13% 13%
0%
10%
20%
30%
Panelist USA
Age 18-29
60%52%
0%
20%
40%
60%
80%
Panelist USA
Female
33%30%
0%
10%
20%
30%
40%
Panelist USA
Democrat
33% 34%
0%
10%
20%
30%
40%
Panelist USA
2 Person HH
50% 53%
0%
20%
40%
60%
80%
Panelist USA
Married
Well, they look a lot like Americans…
![Page 18: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/18.jpg)
Who are the Web Panelists Anyway?
Except that they don’t…
10%
19%
0%
10%
20%
30%
Panelist USA
Retired
60%50%
0%
20%
40%
60%
80%
Panelist USA
H.C Thru a Health Plan
78%
40%
0%
20%
40%
60%
80%
Panelist USA
On Internet Multiple Times/Day
87%
64%
0%
20%
40%
60%
80%
100%
Panelist USA
Own a Smartphone
24%
36%
0%
20%
40%
Panelist USA
Age 65+
37%29%
0%
10%
20%
30%
40%
50%
Panelist USA
College Degree
![Page 19: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/19.jpg)
Non-Probability seductively
cheaper
Non-probability vary in
execution, recruitment and
quality Pew Research Report, 2016
Methods based on modeling,
weighting and matching
continue to emerge to improve
quality of estimates from non-
prob samples. Terhanian et al., 2016
Dever et al., 2015
DiSogra et al., 2015
Rivers and Bailey, 2009
Dutwin and Buskirk, 2016
Probability Samples cost more
When compared head to head,
estimates from probability
samples tend to be more
accurate than those from non-
prob samples Callegaro et al., 2014;
Yeager et al., 2011
Krosnick and Chang, 2009
Walker et al., 2009
Response rates continue to
decline
Cost vs. Quality?
![Page 20: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/20.jpg)
Our main Research Questions
How does the quality of non-probability samples compare to that of low-
response rate probability samples?
Can we improve the quality of estimates from non-probability samples using
alternative adjustment methods like calibration, propensity weighting or
sample matching?
![Page 21: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/21.jpg)
Data Sources
Dual-Frame RDD Telephone Sample 1 n = 107,347; ~9% Response Rate.
Non-probability Web Panel 1 n = 82,478; Response Rate unknown.
Dual-Frame RDD Telephone Sample 2 n = 29,153; ~12% Response Rate.
Non-probability Web Panel 2 n = 61,782; Response Rate unknown.
All samples selected and fielded between
October 2012 thru 2014
1st
Sample
Set
2nd
Sample
Set
![Page 22: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/22.jpg)
Methods for Adjusting Non-Probability samples
Base Weight = 1
other than Dual-
Frame which gets
single frame
estimator (Best
and Buskirk,
2012).
Raking to
Education, Region,
Gender, Age,
Race/Ethnicity
No Trimming
Raking/
Calibration
Conducted only in NP-
Panel #2 which
included webographics
Logistic Regression
Model: Education,
Region, Gender, Age,
Race/Ethnicity, Metro
Status, # of Adults,
and Webographics
Tested both raw
scores and a weighting
class (5) variant
Propensity
Adjustments
Sample
MatchingBased on pairing
non-probability
cases with
members of a
probability sample
Sample matches
based on similarity
across core set of
common variables
(Rivers and Bailey,
2009)
Applied to both NP
panel samples
![Page 23: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/23.jpg)
Creating Matched Samples
The matching algorithm uses a
collection of categorical variables from
each case in the Prob. sample and
computes a similarity index defined as
the simple matching coefficient (SMC)
to each case in the NP sample.
The matched case is randomly
selected from among those identified
as the most similar.
See: http://bit.ly/1Fb2Jhs for specific
details of computing the SMC for
categorical variables.
212
![Page 24: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/24.jpg)
Generating the Matched Sample
An 3.5% SRS of the Adults contained in the 2013/2014 CPS Public
Release Data file were matched to sampled units in the combined
NP Panel 1/Panel 2, respectively sample based on 8 common
demographic variables including:
Region (North, South, East, West)
Male
Age Group (18-29, 30-49, 50-64 and 65+)
Race Group (White (NH), Black (NH), Hispanic, Other (NH))
Education Level (<HS, HS, Some College, BS, BS+)
Own/Rent (Own, Rent/Board)
Marital Status (Married, Single, Partnered, Divorced/ Widow/ Separated)
Employment Status (Currently Employed or not)
![Page 25: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/25.jpg)
Primary Metrics
The mean absolute
bias (MAB) –
computed as the
arithmetic mean of
absolute value of the
difference between
the table estimate and
the corresponding
benchmark estimate the mean is taken
over the total number
of estimates within
the variable set
Absolute Bias
The standard
deviation of the
absolute biases
computed from
each variable set
was also
computed. Provides a
sense of the
variability in the
level of biases.
The overall
average MAB –
computed as
the mean of the
12 MAB
statistics
computed
across the 12
variable sets for
each sample
St. Deviation
of Biases
Overall
Average MAB
![Page 26: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/26.jpg)
Key Outcomes We Consider
Common set of survey variables across the sources of data
include household and person-level demographics
External benchmarks for media related information contained
in the main survey source are not commonly available
Given this scenario, we will focus our evaluation and
computation of bias metrics on distributions of one
demographic variable within levels of a second
What is the distribution of Education within each level of Race
What is the distribution of Race within each level of Education
The reference/benchmark values are computed using the 1
Year PUMS Data from the 2012 American Community Survey.
![Page 27: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/27.jpg)
Evaluated Outcome Variable SetsSpecific Demographic Variable Sets of interest include:
Education (5 levels) within Race (4 levels) and and Race within Education
Education (5 levels) within Age-group (4 levels)
and Age-group within Education
Education (5 levels) within Region (4 levels) and Region within Education
Age-group (4 levels) within Race (4 levels) and Race within Age-group
Age-group (4 levels) within Region (4 levels) and Region within Age-group
Race (4 levels) within Region (4 levels) and Region within Race
![Page 28: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/28.jpg)
Computing the Primary Metrics
Race Midwest South West Northeast
White
Black
Other
Hispanic
4 absolute bias measures
4 absolute bias measures
4 absolute bias measures
4 absolute bias measures
The average of these 16 bias
measures represents the
Mean Absolute Bias (MAB) of
Region within Race.
Row Percentages
Consider the demographic cross tabulation of Race and Region
producing a 4-by-4 table. Taking the absolute value of the
difference between the row percentages and the corresponding
benchmarks from CPS produces a total of 16 absolute bias
measures. (Distribution of Region within Race)
Repeating the calculations for each of
the column percentages (Distribution
of Race within Region) yields the MAB
for Race within Region.
![Page 29: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/29.jpg)
MAB Statistics for Demographic Cross-Tabulations
1st
Sample
Set
![Page 30: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/30.jpg)
Unweighted Overall Average Biases
1st
Sample
Set
![Page 31: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/31.jpg)
MAB Statistics for Demographic Cross-Tabulations
2nd
Sample
Set
![Page 32: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/32.jpg)
Unweighted Overall Average Biases
2nd
Sample
Set
![Page 33: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/33.jpg)
Unequal Weighting Effects
Unequal Weighting Effects Telephone 2: 1.21
NP Panel 2 Rake: 2.83
NP Panel 2 Propensity: 6.35
NP Panel 2 Propensity and Rake: 5.43
Unequal Weighting EffectsTelephone 1: 1.36
ABS: 1.74
NP Panel 1: 2.71
NP Panel Matched and Raked: 1.18
1st
Sample
Set
2nd
Sample
Set
![Page 34: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/34.jpg)
Variability in Absolute Bias Measures
1st
Sample
Set
![Page 35: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/35.jpg)
Variability in Absolute Bias Measures
2nd
Sample
Set
![Page 36: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/36.jpg)
Discussion
Methods for improving the quality of nonprobability panels continue to
be made including extensive use of modelling/optimization methods for
selecting candidate variables for weighting (Terhanian et al., 2016)
While we saw that matched samples (based on a simple matching
coefficient) tended to move the absolute bias measures downward,
compared to unweighted and unmatched nonprobability samples, they
still produced estimates with inherently more bias with more variability
than probability samples.
More work is needed to better understand how to optimize the matching
process including: How to incorporate a mixture of categorical and continuous variables
Optimal combination of matching and raking variables
Incorporation of sampling weights into the matched process
Optimal relative size of non-probability and probability samples used for
matching.
![Page 37: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/37.jpg)
What does a 9% response rate get you that a
web panel cannot?
Unweighted, substantially less bias
Weighted, significantly, but not substantially, less bias
A much lower design effect from weighting
Generally less variability in the absolute biases
That said, matched samples attain the lowest design
effect of any weighting
And that said, matched samples “close” to weighted
telephone in terms of lower bias and lower variability in
the absolute biases….but still substantially inferior
And…there is still the issue of cost.
![Page 38: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/38.jpg)
Thank you!
References Available Upon Request
Please contact us with questions
[email protected] | 314-695-1378 | @trentbuskirk
[email protected] | 484-840-4406 | @ddutwin
![Page 39: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/39.jpg)
39icfi.com |
Calculating Standard Errors for
Non-probability Samples when Matching to
Probability Samples
Adam Lee
Randal ZuWallack
ICF InternationalAAPOR
Austin, TX
May 15, 2016
![Page 40: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/40.jpg)
40icfi.com |
Motivation
• Matching probability samples to nonprobability samples is emerging as a popular
methodology
– Reduce costs
– Rare populations
– Split surveys
– Quick turnaround
– Web data collection
• Expand depth
General Health
Pr(n)
In depth
Alcohol
In depth
Diet
In depth
Mental health
In depth
Tobacco
Match on general
behaviors
Pr(n) inherits NPS data from their match
![Page 41: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/41.jpg)
41icfi.com |
AAPOR 2015: NPS matching
1. Selected or Self-Selected? Part 1: A Comparison of Methods for Reducing the Impact
of Self-Selection Biases from Non-Probability Surveys
– Dutwin and Buskirk
2. Selected or Self-Selected? Part 2: Ex ploring Non-Probability and Probability
Samples from Response Propensities to Participant Profiles to Outcome Distributions
– Buskirk and Dutwin
3. Matching an Internet Panel Sample of Health Care Personnel to a Probability Sample
– DiSogra, Greby, Srinath, Andrew Burkey, Black, Sokolowski, Yue, Ball, Donahue
4. Matching an Internet Panel Sample of Pregnant Women to a Probability Sample
– Burkey, DiSogra, Greby, Srinath, Black, Sokolowski, Ding, Ball, Donahue
5. Weighting and Sample Matching Effects for an Online Sample
– Brick, Cohen, Cho, Scott Keeter, McGeeney, Mathiowetz,
6. Can Surveys Posted on Government Websites Give Fair Representations of Public
Opinion?
– Kobayashi
7. Combining a Probability Based Telephone Sample with an Opt-in Web Panel
– ZuWallack, Dayton, Freedner-Maguire, Karriker-Jaffe, Greenfield
![Page 42: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/42.jpg)
42icfi.com |
Survey A
X, Y
Survey B
X, Z
XA = XB
Matched data
X, Y, Z
Survey A
X, Y
Survey B
X, Z
XA = XB
Matched data
X, Y, Z
Statistical matching
– In a nonprobability to probability matching application, the focus may be
less on joint distributions and more on weighting.
– The probability sample, which represents the population, provides the
distribution to calibrate the nonprobability sample. Each person selected
in the probability sample is assigned a statistical match from the
nonprobability sample and inherits the nonprobability data from that
match.
![Page 43: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/43.jpg)
43icfi.com |
Present Research
• Matching Pr(n) with NPS
– Is this a probability sample?
– Can we calculate sampling variance?
• How?
𝜎𝑁𝑃𝑆2
𝑛Pr
– Are there other forms of variance that must be included?
![Page 44: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/44.jpg)
44icfi.com |
SIMULATIONPROBABILITY TO PROBABILITY MATCHING
![Page 45: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/45.jpg)
45icfi.com |
Data
• Using National Health Interview Survey (NHIS) data, we drew two SRS
– “Receiver” Sample: Which would serve as our “Probability” sample, would have retain the
demographic variables, but did not have any analysis variables
– “Donor” Sample: Which would serve as our “Non-Probability Panel” sample. Would have both
demographics variables and analysis variables.
• We would use the common demographics variables (Age & Race) and input the analysis
variable on the receiver sample using the donor sample. This would create a “Matched”
dataset.
• We varied the size of the donor sample to change the variance but held the receiver sample
size constant.
![Page 46: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/46.jpg)
46icfi.com |
Methodology
• Iteratively draw to 100 samples at each donor sample size level
(from 800 to 5,800)
– Each Receiver Sample
• Sample Size Fixed (n = 1,000)
• Includes Demographic Variables (Sex & Race) for Matching
• “Missing” Key Variable of Interest (i.e., Alcohol Drinking Status)
– Each Donor Sample
• Sample Size Iteratively Increased (m = 800 - 5,800)
• Includes Demographic Variables (Sex & Race) for Matching
• Includes Key Variable of Interest (i.e., Alcohol Drinking Status)
• Match the Donor sample to the Receiver sample based on
demographic variables (Sex by Race) using a Random Hot Deck
Matching procedure
– Create a “Matched” dataset that has Receiver demographics and Donor
variable of interest.
![Page 47: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/47.jpg)
47icfi.com |
StatMatch Package in R
• “Integration of two data sources referred to the same target population which share a number
of common variables (aka data fusion). Some functions can also be used to impute missing
values in data sets through hot deck imputation methods. Methods to perform statistical
matching when dealing with data from complex sample surveys are available too.”
• Random Hot Deck - Finds a donor record for each record in the recipient data set. The
donor is chosen at random in the subset of available donors.
– Identify the Receiving and Donor Datasets
– Requires the identification of “donation classes” (e.g., Sex by Race).
– Variables must be shared by both datasets
![Page 48: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/48.jpg)
48icfi.com |
Statistical Matching Approach
Receiver1
SexR RaceR
M White
F Asian
NHIS Dataset
Rec1
Rec2
Rec100
…
Don1
Don2
Don100
…
Match1
Match2
Match100
…
Donor1
SexD RaceD ALCSTATD
M White 0
M White 1
F Asian 0
F Asian 0
Matched1
SexR RaceR ALCSTATD
M White 1
F Asian 0
![Page 49: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/49.jpg)
49icfi.com |
Results
• Vsam = Variance of the 100 original sample estimates (n = 1000)
• Vdonor = Variance of the 100 donor samples (m = 800 to 5800)
• Vmatched = Variance of 100 matched samples
![Page 50: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/50.jpg)
50icfi.com |
Variances by Donor Size
0.00000
0.00005
0.00010
0.00015
0.00020
0.00025
0.00030
0.00035
0.00040
0.00045
0.00050
80
0
90
0
10
00
11
00
12
00
13
00
14
00
15
00
16
00
17
00
18
00
19
00
20
00
21
00
22
00
23
00
24
00
25
00
26
00
27
00
28
00
29
00
30
00
31
00
32
00
33
00
34
00
35
00
36
00
37
00
38
00
39
00
40
00
41
00
42
00
43
00
44
00
45
00
46
00
47
00
48
00
49
00
50
00
51
00
52
00
53
00
54
00
55
00
56
00
57
00
58
00
Vmatched Vsam Vdonor
Vsam = 𝜎2
𝑛
Vdonor = 𝜎2
𝑚
![Page 51: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/51.jpg)
51icfi.com |
Relationship between Vmatch and Vsam+Vdon
0.00000
0.00005
0.00010
0.00015
0.00020
0.00025
0.00030
0.00035
0.00040
0.00045
0.00050
80
0
90
0
10
00
11
00
12
00
13
00
14
00
15
00
16
00
17
00
18
00
19
00
20
00
21
00
22
00
23
00
24
00
25
00
26
00
27
00
28
00
29
00
30
00
31
00
32
00
33
00
34
00
35
00
36
00
37
00
38
00
39
00
40
00
41
00
42
00
43
00
44
00
45
00
46
00
47
00
48
00
49
00
50
00
51
00
52
00
53
00
54
00
55
00
56
00
57
00
58
00
Vmatched Vsam+Vdonor
![Page 52: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/52.jpg)
52icfi.com |
SIMULATIONPROBABILITY TO NON-PROBABILITY MATCHING
![Page 53: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/53.jpg)
53icfi.com |
Experiment
• National Alcohol Survey
– Dual-frame RDD, CATI (3874 landline; 2749 cell)
– NAS Web experiment (n=841)
• 100 RDD samples matched to 100 Web samples
– Sample size: 400
– Donor size: 100-800
– All SRS
• StatMatch based on age and gender
• Vmatched = Variance of 100 matched samples
![Page 54: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/54.jpg)
54icfi.com |
Panel/RDD means and variances
RDD Web paneln Mean SD n Mean SD
Percentage of current drinkers 6623 0.60 0.49 841 0.79 0.41
Current drinkers: proportion who drink wine 3973 0.74 0.44 663 0.85 0.36
Current drinkers: proportion who drink beer 3973 0.61 0.49 663 0.70 0.46
Current drinkers: typical number of drinks when drinking on a quiet evening at home (0-8) 3973 1.27 1.29 663 1.71 1.57
Vsam+Vdonor = (1
𝑛Pr+
1
𝑚NPS) 𝜎𝑁𝑃𝑆
2
![Page 55: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/55.jpg)
55icfi.com |
Results
0.0000
0.0005
0.0010
0.0015
0.0020
0.0025
100 200 300 400 500 600 700 800
Donor Size
Proportion current drinkers
Vmatched
Vsam+Vdonor
0.0000
0.0005
0.0010
0.0015
0.0020
0.0025
0.0030
100 200 300 400 500 600
Donor Size
Current Drinker: Proportion beer drinkers
Vmatched
Vsam+Vdonor
0.0000
0.0050
0.0100
0.0150
0.0200
0.0250
0.0300
0.0350
0.0400
100 200 300 400 500 600
Donor Size
Current Drinker: typical number of drinks when drinking on a quiet evening at home (0-8)
Vmatched
Vsam+Vdonor
0.0000
0.0005
0.0010
0.0015
0.0020
0.0025
100 200 300 400 500 600
Donor Size
Current Drinker: Proportion wine drinkers
Vmatched
Vsam+Vdonor
![Page 56: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/56.jpg)
56icfi.com |
SUMMARY
![Page 57: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/57.jpg)
57icfi.com |
Var =𝜎𝑁𝑃𝑆
2
𝑛Pr?
Summary
• Variance must account for matching and the donor pool
– Large donor pool: Var ≈𝜎𝑁𝑃𝑆
2
𝑛Pr
• Other variance increases/decreases
– Matches used more than once
– Good matching model
Var≈ (1
𝑛Pr+
1
𝑚NPS) 𝜎𝑁𝑃𝑆
2
![Page 58: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/58.jpg)
58icfi.com |
Summary
• Still NPS sample
– Probability matches provide weights
• Variability is based on the NPS
– Need a random sample from panel to estimate 𝜎𝑁𝑃𝑆2
![Page 59: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/59.jpg)
59icfi.com |
Thank you
• For more information, please contact:
![Page 60: Internet-use Propensity for Matching Probability and Non ...Non-Probability seductively cheaper Non-probability vary in execution, recruitment and quality Pew Research Report, 2016](https://reader033.vdocument.in/reader033/viewer/2022050216/5f6211cf7cd50531670f8a21/html5/thumbnails/60.jpg)
Abt SRBI | pg 60
Non-Probability Samples at AAPOR 2016
- Selected papers
Charles DiSogra