Response Rates Impact Data Quality, but Not How You Might Think

Post on 22-Jan-2018


www.rti.org

RTI International is a registered trademark and a trade name of Research Triangle Institute.

Based on 2 papers:

Eckman, S. and Koch, A. “The Relationship between Response Rates, Sampling Method and Data Quality: Evidence from the European Social Survey.” Under review.

Eckman, S., Himelein, K. and Dever, J. “Innovative Sample Designs Using GIS Technology.” Forthcoming in Advances in Comparative Survey Methods: Multicultural, Multinational and Multiregional Context.

Stephanie Eckman, RTI Fellow

Motivation

• Relationship between RR & data quality
• Common view: high response rates signal good-quality data
• Evidence: response rates are uncorrelated with data quality
  – High-RR survey no more accurate than low-RR survey (Keeter et al, 2000)
  – Merkle & Edelman (2002)
  – Groves & Peytcheva (2008)

RR vs. NR Bias

Figure: Merkle & Edelman 2002

RRs Do Not Correlate with Nonresponse Bias

Figure: Groves & Peytcheva 2008

Motivation (continued)

• But maybe high response rates are a sign that the data are crap?

Data Quality

• Total Survey Error Framework
  – Undercoverage
  – Nonresponse
  – Measurement error
  – Editing error
  – Processing error
  – etc.
• Misrepresentation error
  – Undercoverage + nonresponse
• Tradeoff between undercoverage & NR
  – Eckman & Kreuter 2017

Image: http://makeagif.com/dkjuuc

European Social Survey

• 7 waves
• 30+ countries
• Central Committee sets standards
  – Core questionnaire
  – Minimum effective sample size
  – Paradata collection
  – Documentation
  – Face-to-face attempts
  – RR standard: 70%
• Our data: 136 country-rounds in first 6 waves

Sampling Methods in Analysis

Sampling Method      Includes             Field Staff Involvement in Selecting    n
                                          Household       Person
Individual Register                       None            None                    70
Household Register   Household Register   None            Interviewer             41
                     Address Register     None            Interviewer
Household Walk       Listing              Lister          Interviewer             25
                     Random Walk          Lister          Interviewer

RRs by Sample Type


2 Measures of Data Quality

• External measure:
  – How different is ESS from Labor Force Survey (LFS)?
  – On 6 categorical variables: age, gender, HH size, marital status, etc.
  – Index of dissimilarity measures how different 2 surveys are: $D_{c,r,v} = 0.5 \sum_{k} \left| p^{ESS}_{c,r,k} - p^{LFS}_{c,r,k} \right|$
  – Average over 6 variables
  – Assumes LFS is higher quality
• Internal measure:
  – 50% of all respondents from gender-heterogeneous couples should be women
  – $z_{c,r} = \dfrac{\%female_{c,r} - 50}{\sqrt{50 \cdot 50 / n}}$
  – $|z_{c,r}| > 1.96$ indicates significant deviation from 50%
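The two measures above can be computed in a few lines. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the input numbers are made up for illustration, and the square root in the denominator is assumed from the 1.96 threshold (the usual standard error of a percentage).

```python
import math


def dissimilarity_index(p_survey, p_benchmark):
    """Index of dissimilarity between two categorical distributions.

    Both arguments map category -> proportion (each summing to 1).
    Returns 0 for identical distributions and 1 for fully disjoint ones.
    """
    if p_survey.keys() != p_benchmark.keys():
        raise ValueError("both distributions need the same categories")
    return 0.5 * sum(abs(p_survey[k] - p_benchmark[k]) for k in p_survey)


def internal_gender_z(pct_female, n):
    """z-type statistic for the deviation of % female from 50 among n respondents."""
    return (pct_female - 50.0) / math.sqrt(50.0 * 50.0 / n)


# Made-up proportions for one variable in one country-round (illustration only).
ess = {"single": 0.30, "married": 0.55, "other": 0.15}
lfs = {"single": 0.35, "married": 0.50, "other": 0.15}

d = dissimilarity_index(ess, lfs)
z = internal_gender_z(pct_female=54.0, n=400)

print(f"dissimilarity = {d:.3f}")
print(f"z = {z:.2f}, significant deviation: {abs(z) > 1.96}")
```

For the external measure, the index would be computed once per variable and then averaged over the 6 variables.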

2 DVs, 2 IVs

• Dependent variables: misrepresentation error
  – External measure
  – Internal measure
• Independent variables
  – RR
  – Sampling method
• Joint effect of RR and sampling method on data quality
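One way to write the joint effect of RR and sampling method as a regression is sketched below. This is an assumed specification for illustration only: the symbol names, the use of individual register as the reference category, and the simple linear form are not taken from the papers, whose models may differ.

```latex
\[
Q_i = \beta_0 + \beta_1 \mathrm{RR}_i
    + \beta_2 \mathrm{HHReg}_i + \beta_3 \mathrm{Walk}_i
    + \beta_4 (\mathrm{RR}_i \times \mathrm{HHReg}_i)
    + \beta_5 (\mathrm{RR}_i \times \mathrm{Walk}_i)
    + \varepsilon_i
\]
```

Here $Q_i$ is one of the two misrepresentation measures for country-round $i$, and $\mathrm{HHReg}_i$ and $\mathrm{Walk}_i$ are indicators for the household register and household walk methods. Under this sketch, $\beta_4$ and $\beta_5$ show whether the slope on RR differs by sampling method.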

2 Measures vs RR, by Sample Type

Regression Models

Table: Estimated Regression Coefficients

Implications

• High RRs might signal that you have problems with your data
  – When interviewers select samples
  – Interviewers seem to manipulate the selection process to keep RRs high
• Note that the ESS does random walk better than other surveys
  – Listing should be done by someone other than the interviewer
• Other problems with random walk
  – Walker effects
  – No probabilities of selection

Possible Solutions

• What are some alternatives to random walk?
  – Satellite photos
  – Reverse geocoding
  – Qibla method
  – Geosampling
  – Listing with drones

GIS Resources

• Turn-by-turn directions on phone
• Satellite images
  – Daytime images
  – Small-sat revolution
  – Nighttime lights
• Other remote sensing data
• How can we exploit these resources for sampling?
  – And avoid the problems of random walks

Satellite Photo Mogadishu


Reverse Geocoding

• Geocoding: address → coordinate
• Reverse geocoding: coordinate → address
  – Select random points in segment
  – Identify closest address
  – Many online tools
  – Used in Italy ISSP 2009, 2011
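The two selection steps (random points in the segment, then the closest address) can be sketched offline. This is a hedged illustration only: it uses a hypothetical in-memory address list in place of an online reverse-geocoding tool, treats the segment as a simple bounding box, and all function names and coordinates are made up.

```python
import math
import random


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def random_point(bbox, rng):
    """Random (lat, lon) in a bounding box (south, west, north, east).

    Uniform in degrees, which is close to uniform in area for a small segment.
    """
    south, west, north, east = bbox
    return rng.uniform(south, north), rng.uniform(west, east)


def nearest_address(point, addresses):
    """Return the (id, lat, lon) record closest to point."""
    return min(addresses, key=lambda a: haversine_m(point[0], point[1], a[1], a[2]))


def select_addresses(bbox, addresses, n_points, seed=0):
    """Draw n_points random points and return the id of the closest address to each."""
    rng = random.Random(seed)
    return [nearest_address(random_point(bbox, rng), addresses)[0] for _ in range(n_points)]


# Hypothetical segment and address list (illustration only).
segment = (0.0, 0.0, 0.02, 0.02)
address_list = [("A", 0.001, 0.001), ("B", 0.010, 0.010), ("C", 0.019, 0.004)]
print(select_addresses(segment, address_list, n_points=5, seed=1))
```

Under this scheme an address is selected with probability proportional to the area that lies closer to it than to any other address, so isolated addresses are more likely to be drawn than addresses in dense clusters, and the same address can be drawn more than once.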

Example of Reverse Geocoding


Qibla Method

• Qibla is Arabic for “in the direction of Mecca”
• Given random starting coordinate
  – Interviewer walks in the direction of Mecca
  – Selects first HH encountered
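The walking direction from a given starting coordinate can be computed with the standard initial great-circle bearing formula. The sketch below is an illustration, not the fieldwork procedure: the function name is hypothetical, the Kaaba coordinates are approximate, and the Mogadishu starting point is an approximate city-centre coordinate chosen only as an example.

```python
import math

# Approximate coordinates of the Kaaba in Mecca (assumption for illustration).
KAABA_LAT, KAABA_LON = 21.4225, 39.8262


def qibla_bearing(lat, lon):
    """Initial great-circle bearing from (lat, lon) to Mecca.

    Returned in degrees clockwise from north, in the range [0, 360).
    """
    phi1 = math.radians(lat)
    phi2 = math.radians(KAABA_LAT)
    dlmb = math.radians(KAABA_LON - lon)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


# Approximate centre of Mogadishu as an example starting coordinate.
start_lat, start_lon = 2.0469, 45.3182
bearing = qibla_bearing(start_lat, start_lon)
print(f"walk on a bearing of about {bearing:.0f} degrees from north")
```

From Mogadishu the result points north-northwest, as expected, since Mecca lies to the north and slightly west of the city.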

Example of Qibla Method


Geosampling

• Select first stage units
  – Administrative units
  – Or 1km squares
• Select second stage units
  – Smaller squares
• Visit and interview all households in smaller unit
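The grid version of the two stages can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are in projected metres, both stages use simple random sampling without replacement (real designs may select with probability proportional to size), the 100 m second-stage cell size is hypothetical, and all function names are made up.

```python
import random


def grid_cells(bbox, cell_size):
    """Split a bounding box (x_min, y_min, x_max, y_max) into square cells.

    Cells on the upper and right edges are clipped to the bounding box.
    """
    x_min, y_min, x_max, y_max = bbox
    cells = []
    y = y_min
    while y < y_max:
        x = x_min
        while x < x_max:
            cells.append((x, y, min(x + cell_size, x_max), min(y + cell_size, y_max)))
            x += cell_size
        y += cell_size
    return cells


def two_stage_sample(area, n_first, n_second, first_size=1000, second_size=100, seed=0):
    """Select n_first large squares, then n_second smaller squares within each.

    Returns a dict mapping each first-stage square to its selected
    second-stage squares; every household in those would be visited.
    """
    rng = random.Random(seed)
    first_stage = rng.sample(grid_cells(area, first_size), n_first)
    return {psu: rng.sample(grid_cells(psu, second_size), n_second) for psu in first_stage}


# Hypothetical 3 km x 2 km study area (illustration only).
sample = two_stage_sample((0, 0, 3000, 2000), n_first=2, n_second=3)
for psu, small_squares in sample.items():
    print(psu, "->", small_squares)
```

With equal-sized squares and simple random sampling, each small square has the same known selection probability, here (2/6) × (3/100).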

Geosampling: First & Second Stage

• Eliminates separate listing step
• Still vulnerable to interviewer manipulation
• Possible QC by interviewer GPS tracks? (Himelein et al, 2014)

Geosampling: Second & Third Stage

Use of UAVs for Listing

• RTI has tested listing from drone images (Amer et al 2016)
  – Galapagos & Guatemala

Listing with Drones

• Still testing use of drones
• Legal issues
• Use local staff to code from images

Conclusions

• Ideal method:
  – Removes influence of interviewer
  – Results in equal probability sample of HUs
  – With known probabilities
• No alternative is perfect
  – High involvement of interviewers
  – High data requirements
• Drones may prove useful

Stephanie Eckman, PhD

Fellow, Survey Research Division

RTI International

seckman@rti.org

stepheckman.com
