#ARFAxS Calibrating Bias in Online Samples for High Quality Surveys at Scale Steven Millman Chief Scientist MRI-Simmons

Post on 24-Jul-2020

Page 1: 7-Mon-Harbor-235-Calibrating Bias in Online Samples-v2 Audience Measurem… · •Online survey panelists are individuals that have made a conscious decision to monetize their use

#ARFAxS

Calibrating Bias in Online Samples for High Quality Surveys at Scale

Steven Millman

Chief Scientist

MRI-Simmons

Page 2

#ARFAxS

Calibrating for Bias in Online Samples

Steven Millman – Chief Scientist, MRI-Simmons

Contributing Author: Hu Yang – Lead Data Scientist, MRI-Simmons

Page 3

Online panelists are not representative (in other news, the sun is hot)

• Online survey panelists are not generally drawn randomly from the population of interest

• Online survey panelists are individuals that have made a conscious decision to monetize their use of the Internet

• This research sought to identify the ways in which online panelists differ, and how those differences can be corrected

• A key finding of this research is that non-probability online panelists differ from probability samples in significant ways that cannot be corrected with simple demographic weighting

Page 4

Sample bias tends to be narrow, not broad

“Two important findings are that nonresponse [or selection] bias is generally unrelated to the overall response rate and that… bias tends to be item-specific. Some or even many items may exhibit no bias while others have substantial bias.” *

- John L. Czajka & Amy Beyler

* "Declining Response Rates in Federal Surveys: Trends and Implications (Background Paper)," June, 2016. Mathematica Policy Research Reports prepared for the Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health & Human Services. “[or selection]” was added by the author of this presentation and not Czajka & Beyler.

Page 5

The National Consumer Study

• The National Consumer Study (NCS) is a high-quality random probability sample of 25,000 American adults each year

• Sample is collected by mail from a list of residential addresses

• The NCS contains over 60,000 data elements, including:

► Demographics
► Product purchase & usage
► Brand preferences
► Lifestyles, attitudes, and opinions
► Media usage & preferences
► Shopping behavior
► Custom variables

Page 6

Research Design

• Approximately 60% of the NCS was converted to an online version of the study

• Online sample (address verified) was stratified by age, gender, & top 14 DMAs

• The online study was fielded to 13,905 online panelists and compared to 20,953 respondents from the 2018 Summer NCS study (online adults only)

• New universe estimates were generated for the online US population, adding time spent online as a weight

• Compared weighted average responses between mail probability and online non-probability samples

• Identified significant and substantive deviations between properly weighted probability and online panel samples
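The weighting step in the design above can be illustrated with cell-based post-stratification, where each respondent's weight is the universe share of their cell divided by that cell's share of the sample. This is only a minimal sketch under stated assumptions, not the procedure MRI-Simmons used: the cell definitions (age band by time-online band), the universe shares, the answers, and the function names are all hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_cells, universe_shares):
    """Return one weight per respondent: universe share / sample share of the cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [universe_shares[cell] / (counts[cell] / n) for cell in sample_cells]

def weighted_share(responses, weights):
    """Weighted proportion of respondents answering 1 (yes)."""
    return sum(r * w for r, w in zip(responses, weights)) / sum(weights)

# Hypothetical toy sample: cell = (age band, time-online band).
cells = [("18-34", "heavy"), ("18-34", "heavy"), ("18-34", "light"), ("35+", "light")]
# Hypothetical universe estimates for online adults (shares sum to 1).
universe = {("18-34", "heavy"): 0.25, ("18-34", "light"): 0.25, ("35+", "light"): 0.50}
# Hypothetical 0/1 answers to one survey item.
answers = [1, 1, 0, 0]

weights = poststratification_weights(cells, universe)
unweighted = weighted_share(answers, [1.0] * len(answers))
weighted = weighted_share(answers, weights)
print(weights)               # [0.5, 0.5, 1.0, 2.0]
print(unweighted, weighted)  # 0.5 0.25
```

In this toy case heavy Internet users are over-represented in the sample, so weighting them down halves the estimate for the item.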

Page 7

Identifying the bias in non-probability online panels

• Comparisons were made using raw data, demographically weighted data, and data weighted on both demographics and time spent online

• We subsequently developed a naïve process to identify additional weights from among all questions in the digital version of the survey to find the best possible additional calibrators

• Remaining substantive and significant differences were likely to be driven by selection bias. Other potential biases could result from:

◦ Modal differences
◦ Questionnaire differences
◦ Timing of the survey (October-December vs. full year)
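The comparison across weighting schemes can be sketched as counting, for each scheme, how many variables differ between the two samples by more than a given number of percentage points. The variable names and percentages below are hypothetical, and the sketch covers only the size of the gap; the slides also require the gap to be statistically significant.

```python
def count_large_deviations(estimates, threshold):
    """Count variables whose |online - probability| gap exceeds threshold (in points)."""
    return sum(
        1
        for online_pct, prob_pct in estimates.values()
        if abs(online_pct - prob_pct) > threshold
    )

# Hypothetical estimates per scheme: {variable: (online %, probability %)}.
schemes = {
    "unweighted": {
        "var_a": (60.0, 45.0), "var_b": (30.0, 24.0), "var_c": (50.0, 49.0),
    },
    "demo_weighted": {
        "var_a": (58.0, 45.0), "var_b": (28.0, 24.0), "var_c": (50.0, 49.0),
    },
    "demo_plus_time_online": {
        "var_a": (54.0, 45.0), "var_b": (27.0, 24.0), "var_c": (50.0, 49.0),
    },
}

summary = {
    name: {
        "over_5": count_large_deviations(estimates, 5.0),
        "over_10": count_large_deviations(estimates, 10.0),
    }
    for name, estimates in schemes.items()
}
for name, counts in summary.items():
    print(name, counts)
```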

Page 8

Results of weighting strategies

Variables with significant differences between probability and non-probability samples

Variables (~40,000 total) | Unweighted | Demo Weighted Only | Weighted by Demos & Internet Use | Weighted & Calibrated
Variables with more than five percentage point deviations | 1,854 (4.52%) | 1,769 (4.31%) | 1,579 (3.85%) | 1,391 (3.39%)
Variables with more than ten percentage point deviations | 354 (0.09%) | 312 (0.08%) | 215 (0.05%) | 198 (0.04%)

• Most variables were relatively unbiased compared to the probability sample

• Weighting on demographics had very little impact (see Appendix 1)

• Even after weighting for demos, Internet use, and naïve calibrators, substantial biases remained
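A quick arithmetic check on the figures in this results table: the counts and percentages in the five-point row imply a denominator of about 41,000 variables, which matches "~40,000 total". Applying that denominator to the ten-point counts gives shares of roughly 0.5% to 0.9%, about ten times the printed 0.04% to 0.09%, so the printed percentages in that row may contain a decimal slip. The sketch assumes both rows share one denominator, which the slide does not state.

```python
# (count, printed percent) pairs copied from the results table.
five_point = [(1854, 4.52), (1769, 4.31), (1579, 3.85), (1391, 3.39)]
ten_point = [(354, 0.09), (312, 0.08), (215, 0.05), (198, 0.04)]

# Denominator implied by each five-point cell, then averaged.
implied_totals = [count / (pct / 100) for count, pct in five_point]
total = sum(implied_totals) / len(implied_totals)

# Ten-point shares recomputed under the assumption of the same denominator.
recomputed = [round(100 * count / total, 2) for count, _ in ten_point]

print(round(total))  # roughly 41,000
print(recomputed)    # [0.86, 0.76, 0.52, 0.48]
```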

Page 9

The most strongly biased variables fell into a small set of question categories

• Online shopping behavior

• Communication

• Information seeking

• Video streaming

• Use of technology

Page 10

Online shopping behavior (% online - % probability, all significant at p<0.00001)

Self-reported behavior of online panelist | Point Diff
Paypal.com, last 30 days | +28.2
Amazon.com, last 30 days | +18.2
Find/Print Coupons from Websites, last 30 days | +17.6
Made a purchase online | +15.4
Online banking, last 30 days | +15.4
Often I can be swayed by coupons to try new food products | +13.5
Gathered information for shopping online, last 30 days | +12.5
Ebay.com, last 30 days | +12.5
Because of a coupon, I’d be drawn to a store I normally don’t shop at | +11.6
Groupon.com, last 30 days | +10.8
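The slides report significance at p<0.00001 without naming the test. One common choice for comparing two shares is a pooled two-proportion z-test, sketched below under simplifying assumptions: simple random sampling and the unweighted sample sizes from the research design (13,905 online, 20,953 probability). The base rates are hypothetical, chosen only so that the gap equals 10.8 points, the smallest difference listed for online shopping. Weighting would reduce the effective sample sizes and widen the standard error.

```python
from math import erfc, sqrt

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

N_ONLINE, N_PROBABILITY = 13_905, 20_953  # sample sizes from the research design

# Hypothetical shares: 30.8% online vs. 20.0% probability (a 10.8 point gap).
z, p_value = two_proportion_z_test(0.308, N_ONLINE, 0.200, N_PROBABILITY)
print(f"z = {z:.1f}, p < 0.00001: {p_value < 0.00001}")
```

With samples this large, even a one-point gap can reach conventional significance, which is why the study also requires deviations to be substantive.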

Page 11

Communication (% online - % probability, all significant at p<0.00001)

Self-reported behavior of online panelist | Point Diff
Email, highest use | +23.7
Used email on mobile/handheld device, last 30 days | +22.3
Visited social networking website, last 30 days | +15.7
Visited social networking website on mobile/handheld, last 30 days | +15.2
Internet has changed the way I spend my free time | +15.1
Visited social networking website, highest use | +12.3
Twitter.com, last 30 days | +11.6
Facebook, Instagram, and Pinterest, last 30 days | ~+5.0

Page 12

Information seeking (% online - % probability, all significant at p<0.00001)

Self-reported behavior of online panelist | Point Diff
Check the weather on mobile/handheld, last 30 days | +18.4
Use Internet in stores | +13.8
Use search engines | +12.5
Bing | +16.0
Yahoo | +15.1
Google | +11.6
Look for recipes online, highest use | +11.6

Page 13

Video streaming (% online - % probability, all significant at p<0.00001)

Self-reported behavior of online panelist | Point Diff
Amazon Prime Instant Video, last 30 days | +20.0
Netflix, last 30 days | +15.6
Hulu (limited commercials), last 30 days | +11.1
Hulu (no commercials), last 30 days | +10.1
Download or stream TV programs, last 30 days | +12.6
Download or stream Movies, last 30 days | +10.3
Watch Video Content online, last 30 days | +9.7
My computer is a primary source of fun and entertainment | +12.9
The Internet has become a primary source of entertainment for me | +10.3
Attended the movies in a theater in the last 6 months? | -12.2

Page 14

Use of technology (% online - % probability, all significant at p<0.00001)

Self-reported behavior of online panelist | Point Diff
Use the Internet on a desktop computer at home | +20.9
Use the Internet on a laptop computer at home | +15.4
Use the Internet on a gaming system | +15.1
Use the Internet on a tablet | +14.7
Use the Internet on an iPod/MP3 player | +10.6
Own or play video games | +11.6
Use Chrome most often as Internet browser | +14.4
Use Internet Explorer most often as Internet browser | -10.6
Have a family plan for cellular phone | -17.5

Page 15

Other psychographics where the online sample is >10 points higher

• I stick with clothing styles that have stood the test of time

• Many similarly priced clothing brands look alike

• I don't like the idea of being in debt

• There is nothing wrong with indulging in eating fattening foods from time to time

• I try to include plenty of fiber in my diet these days

• I usually look for the freshest ingredients when I cook

• Non-vegetarian

Page 16

Selected brand use with over +/-10 point differences compared to probability

• Interest in MLB, NFL, MLS (much less)

• Moisturizers/Creams/Lotions (much less often)

• Bought an automobile last 12 months (much less)

• Thirst Quenchers and Sports/activity drinks (much less)

• Sneakers Athletic Shoes (much less)

• Use eye shadow (much more)

• Often eat frozen dinner (much more)

• Hershey's Milk Chocolate (much more)

Page 17

Conclusion

• Non-probability online panelists can be used to provide a reasonably representative snapshot of populations of interest, but are wildly inaccurate for an important subset of topics

• In particular, attitudes toward and use of the Internet and technology are severely biased in ways that cannot be corrected with demographic weights

• In order to use online survey panelists to investigate these topics, a calibration set derived either from a properly representative random sample of respondents or from census-level data would be required

• Rule of thumb: Try to avoid asking online survey panelists about online activities and beliefs
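One way a calibration set could be applied is raking (iterative proportional fitting), which adjusts weights until the online sample matches target margins for both demographics and a behavioral calibrator. The slides do not say which calibration method was used, so raking is swapped in here purely as a common technique. The variables, categories, target shares, and function names are hypothetical; in practice the calibrator margin would come from the probability sample or census-level data.

```python
def rake(rows, targets, iterations=50):
    """Iterative proportional fitting over one-dimensional margins.

    rows: one dict per respondent mapping variable -> category.
    targets: {variable: {category: target share}}, shares summing to 1.
    Returns one weight per respondent; weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, shares in targets.items():
            total = sum(weights)
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each category so its weighted share hits the target.
            factors = {cat: shares[cat] * total / current[cat] for cat in current}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
    return weights

def weighted_share(rows, weights, var, category):
    """Weighted share of respondents whose `var` equals `category`."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == category)
    return hit / sum(weights)

# Hypothetical online sample of eight respondents.
rows = (
    [{"age": "18-34", "uses_paypal": "yes"}] * 3
    + [{"age": "18-34", "uses_paypal": "no"}] * 1
    + [{"age": "35+", "uses_paypal": "yes"}] * 2
    + [{"age": "35+", "uses_paypal": "no"}] * 2
)
# Hypothetical targets: a demographic margin plus a calibrator margin.
targets = {
    "age": {"18-34": 0.4, "35+": 0.6},
    "uses_paypal": {"yes": 0.35, "no": 0.65},
}

weights = rake(rows, targets)
print(round(weighted_share(rows, weights, "age", "18-34"), 4))
print(round(weighted_share(rows, weights, "uses_paypal", "yes"), 4))
```

Matching the calibrator margin only helps other survey items to the extent that they are correlated with the calibrator, which is consistent with the finding that the bias is item-specific.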

Page 18

Appendix 1: Sample Weight Variables

• Gender

• Age

• Personal Income (Includes not-employed)

• Marital Status

• County Size

• Number of Adults in HH (Non-Hispanic)

• Number of HISPANIC Adults in HH (Hispanics)

• Race

• Education

• Homeowner type

• Presence of Children

• Area (DMA or Region)

• Born in US (Hispanic only)

• Heritage by Region (Hispanic only)

• Language Spoken at Home (Hispanic only)

• Hours Spent Online, Work+Home (Online sample weights only)
