Introduction to Web Survey Usability Design and Testing
DESCRIPTION
Amy Anderson Riemer and I taught this short course.
TRANSCRIPT
Introduction to Web Survey Usability Design and Testing
DC-AAPOR Workshop
Amy Anderson Riemer
Jennifer Romano Bergstrom
The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.
2
Schedule
9:00 – 9:15 Introduction & Objectives
9:15 – 11:45 Web Survey Design: Desktop & Mobile
11:45 – 12:45 Lunch
12:45 – 2:30 Assessing Your Survey
2:30 – 2:45 Break
2:45 – 3:30 Mixed Modes Data Quality
3:30 – 4:00 Wrap Up
3
Objectives
Web Survey Design: Desktop & Mobile
• Paging vs. Scrolling
• Navigation
• Scrolling lists vs. double-banked response options
• Edits & Input fields
• Checkboxes & Radio buttons
• Instructions & Help
• Graphics
• Emphasizing Text & White Space
• Authentication
• Progress Indicators
• Consistency
Assessing Your Survey
• Paradata
• Usability
Quality of Mixed Modes
• Mixed Mode Surveys
• Response Rates
• Mode Choice
Web Survey Design
The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.
5
Activity #1
1. Today’s date
2. How long did it take you to get to BLS today?
3. What do you think about the BLS entrance?
6
Why is Design Important?
• No interviewer present to correct/advise
• Visual presentation affects responses (Couper’s activity)
• While the Internet provides many ways to enhance surveys, design tools may be misused
7
Why is Design Important?
• Respondents extract meaning from how question and response options are displayed
• Design may distract from or interfere with responses
• Design may affect data quality
8
Why is Design Important?
http://www.cc.gatech.edu/gvu/user_surveys/
9
Why is Design Important?
• Many surveys are long (> 30 min)
• Long surveys have higher nonresponse rates
• Length affects quality
Adams & Darwin, 1982; Dillman et al., 1993; Heberlein & Baumgartner, 1978
10
Why is Design Important?
• Respondents are more tech savvy today and use multiple technologies
• It is not just about reducing respondent burden and nonresponse
• We must increase engagement
• High-quality design = trust in the designer
Adams & Darwin, 1982; Dillman et al., 1993; Heberlein & Baumgartner, 1978
11
http://www.pewinternet.org/Static-Pages/Trend-Data-(Adults)/Device-Ownership.aspx
12
http://www.pewinternet.org/Static-Pages/Trend-Data-(Adults)/Device-Ownership.aspx
13
http://www.nielsen.com/content/dam/corporate/us/en/reports-downloads/2012-Reports/Nielsen-Multi-Screen-Media-Report-May-2012.pdf
14
http://www.nielsen.com/content/dam/corporate/us/en/reports-downloads/2012-Reports/Nielsen-Multi-Screen-Media-Report-May-2012.pdf
15
Nielsen: The Cross-Platform Report, Quarter 2, 2012 – US
UX Design Failure
• Poor planning
• “It’s all about me.” (Redish: filing cabinets)
• Human cognitive limitations
– Memory & Perception
– (fun activity time)
UX Design Failure
• Poor planning
• “It’s all about me.” (Redish: filing cabinets)
• Human cognitive limitations
– Memory & Perception: primacy, recency, chunking, patterns
– (fun activity time)
23
Web Survey Design
• Paging vs. Scrolling
• Navigation
• Scrolling vs. Double-Banked
• Edits and Input Fields
• Checkboxes and Radio Buttons
• Instructions and Help
• Graphics
• Emphasizing Text
• White Space
• Authentication
• Progress Indicators
• Consistency
24
Paging vs. Scrolling
Paging
• Multiple questions per page
• Complex skip patterns
• Not restricted to one item per screen
• Data from each page saved
– Can be suspended/resumed
• Order of responding can be controlled
• Requires more mouse clicks
Scrolling
• All on one static page
• No data is saved until submitted at end
– Can lose all data
• Respondent can review/change responses
• Questions can be answered out of order
• Similar look-and-feel as paper
25
Paging vs. Scrolling
• Little advantage (breakoffs, nonresponse, time, straightlining) of one over the other
• Mixed approach may be best
• Choice should be driven by content and target audience
– Scrolling for short surveys with few skip patterns; respondent needs to see previous responses
– Paging for long surveys with intricate skip patterns; questions should be answered in order
Couper, 2001; Gonyea, 2007; Peytchev, 2006; Vehovar, 2000
26
Web Survey Design
27
Navigation
• In a paging survey, after entering a response
– Proceed to next page
– Return to previous page (sometimes)
– Quit or stop
– Launch separate page with Help, definitions, etc.
28
Navigation: NP
• Next should be on the left
– Reduces the amount of time to move cursor to primary navigation button
– Frequency of use
Couper, 2008; Dillman et al., 2009; Faulkner, 1998; Koyani et al., 2004; Wroblewski, 2008
29
Navigation NP Example
Peytchev & Peytcheva, 2011
30
Navigation: PN
• Previous should be on the left
– Web application order
– Everyday devices
– Logical reading order
31
Navigation PN Example
32
Navigation PN Example
33
Navigation PN Example
34
Navigation PN Example
35
Navigation Usability Study/Experiment
Romano & Chen, 2011
36
Method
• Lab-based usability study
• TA read introduction and left letter on desk
• Separate rooms
• R read letter and logged in to survey
• Think Aloud
• Eye Tracking
• Satisfaction Questionnaire
• Debriefing
Romano & Chen, 2011
37
Results: Satisfaction I
* p < 0.0001
Romano & Chen, 2011
38
Results: Satisfaction II
Overall reaction to the survey: terrible – wonderful. p < 0.05.
Information displayed on the screens: inadequate – adequate. p = 0.07.
Arrangement of information on the screens: illogical – logical. p = 0.19.
Forward navigation: impossible – easy. p = 0.13.
[Four bar charts: mean satisfaction ratings (y-axis roughly 6–8.5) comparing the N_P and PN conditions, one chart per item listed above.]
Romano & Chen, 2011
Eye Tracking
39
• Participants looked at Previous and Next in PN conditions
• Many participants looked at Previous in the N_P conditions
– Couper et al. (2011): Previous gets used more when it is on the right.
40
N_P vs. PN: Respondent Debriefing
• N_P version
– Counterintuitive
– Don’t like the “buttons being flipped.”
– Next on the left is “really irritating.”
– Order is “opposite of what most people would design.”
• PN version
– “Pretty standard, like what you typically see.”
– The location is “logical.”
Romano & Chen, 2011
41
Navigation Alternative
• Previous below Next
– Buttons can be closer
– But what about older adults?
– What about on mobile?
Couper et al., 2011; Wroblewski, 2008
42
Navigation Alternative: Large primary navigation button; smaller secondary button
44
Navigation Alternative: No back/previous option
45
Confusing Navigation
46
Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-
Banked• Edits and Input Fields• Checkboxes and
Radio Buttons• Instructions and Help
• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency
47
Long List of Response Options
• One column: Scrolling– Visually appear to belong to one group– When there are two columns, 2nd one may not be
seen (Smyth et al., 1997)• Two columns: Double banked
– No scrolling– See all options at once– Appears shorter
48
1 Column vs. 2 Column Study
Romano & Chen, 2011
49
Seconds to First Fixation
* p < 0.01
[Bar chart: seconds to first fixation (0–25 s) on the first and second half of the response list, 2-column vs. 1-column layouts.]
Romano & Chen, 2011
50
Total Number of Fixations
[Bar chart: total number of fixations (0–40) on the first and second half of the response list, 2-column vs. 1-column layouts.]
Romano & Chen, 2011
51
Time to Complete Item
[Bar chart: mean, minimum, and maximum seconds to complete the item (0–120 s), 1-column vs. 2-column versions.]
Romano & Chen, 2011
52
1 Col. vs. 2 Col.: Debriefing
• 25 had a preference
– 6 preferred one column (they had received the one-column version)
– 19 preferred two columns (7 had received the one-column version)
• Prefer not to scroll
• Want to see and compare everything at once
• It is easier to “look through,” to scan, to read
• Re one column: “How long is this list going to be?”
Romano & Chen, 2011
53
Long Lists
• Consider breaking list into smaller questions
• Consider series of yes/no questions
• Use logical order or randomize
• If using double-banked, do not separate columns widely
54
Web Survey Design
55
Input Fields Activity
56
Input Fields
• Smaller text boxes = more restricted
• Larger text boxes = less restricted
– Encourage longer responses
• Visual/Verbal Miscommunication
– Visual may indicate “Write a story”
– Verbal may indicate “Write a number”
• What do you want to allow?
57
Types of Open-Ended Responses
• Narrative– E.g., Describe…
• Short verbal responses– E.g., What was your occupation?
• Single word/phrase responses– E.g., Country of residence
• Frequency/Numeric response– E.g., How many times…
• Formatted number/verbal– E.g., Telephone number
58
Open-Ended Responses: Narrative
• Avoid vertical scrolling when possible
• Always avoid horizontal scrolling
59
Open-Ended Responses: Narrative
• Avoid vertical scrolling when possible
• Always avoid horizontal scrolling
Wells et al., 2012 (~700 Rs): 32.8 characters vs. 38.4 characters
60
Open-Ended Responses: Numeric
• Is there a better way?
61
Open-Ended Responses: Numeric
• Is there a better way?
62
Open-Ended Responses: Numeric
• Use of templates reduces ill-formed responses (a client-side sketch follows)
– E.g., $_________.00
Couper et al., 2009; Fuchs, 2007
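To make the template idea concrete, here is a minimal client-side sketch; the field id and message wording are hypothetical, not from the presenters:

```ts
// Minimal sketch of a currency template: the "$" and ".00" are fixed text
// around the input, and the field itself accepts digits only.
const income = document.getElementById("income") as HTMLInputElement;

income.addEventListener("input", () => {
  // Strip anything that is not a digit as the respondent types.
  income.value = income.value.replace(/\D/g, "");
});

income.addEventListener("blur", () => {
  // Prompt for a usable value instead of silently accepting a blank.
  income.setCustomValidity(
    income.value === "" ? "Please enter a whole-dollar amount." : ""
  );
});
```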
63
Open-Ended Responses: Date
• Not a good use: intended response will always be the same format
• Same for state, zip code, etc.
• Note (see the sketch below)
– “Month” = text
– “mm/yyyy” = #s
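One illustration of the note above: if the label promises “mm/yyyy”, validate exactly that shape. The regex below is an assumption about the intended format, not the presenters’ code:

```ts
// "mm/yyyy" = numbers: months 01–12, then a four-digit year.
const MM_YYYY = /^(0[1-9]|1[0-2])\/\d{4}$/;

console.log(MM_YYYY.test("03/2013"));    // true  – matches the label
console.log(MM_YYYY.test("March 2013")); // false – a "Month" label invites text
```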
64
Web Survey Design
65
Check Boxes and Radio Buttons
• Perceived Affordances
• Design according to existing conventions and expectations
• What are the conventions?
66
Check Boxes: Select all that apply
67
Check Boxes in drop-down menus
68
Radio Buttons: Select only one
69
Radio Buttons: Select only one
70
Radio Buttons: In grids
71
Radio Buttons on mobile
• Would something else be better?
72
Reducing Options
• What is necessary?
73
Web Survey Design
74
Placement of Instructions
• Place them near the item
• “Don’t make me think”
• Are they necessary?
75
Placement of Instructions
• Place them near the item
• “Don’t make me think”
• Are they necessary?
76
Placement of Instructions
• Place them near the item
• “Don’t make me think”
• Are they necessary?
77
Instructions
• Key info in first 2 sentences
• People skim
– Rule of 2s: Key info in first two paragraphs, sentences, words
78
Instructions
79
Instructions
80
Placement of Clarifying Instructions
• Help respondents have the same interpretation
• Definitions, instructions, examples
Conrad & Schober, 2000; Conrad et al., 2006; Conrad et al., 2007; Martin, 2002; Schober & Conrad, 1997; Tourangeau et al., 2010
81
Placement of Clarifying Instructions
Redline, 2013
82
Placement of Clarifying Instructions
• Percentage of valid responses was higher with clarification
• Longer response time when before item
• No effects of changing the font style
• Before item is better than after
• Asking a series of questions is best
Redline, 2013
83
Placement of Help
• People are less likely to use help when they have to click than when it is near the item
• “Don’t make me think”
84
Placement of Error Message
• Should be near the item
• Should be positive and helpful, suggesting HOW to help
• Bad error message:
85
Placement of Error Message
• Should be near the item
• Should be positive and helpful, suggesting HOW to help (a sketch follows)
• Bad error message:
86
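A minimal sketch of placing a constructive error message next to the field rather than at the top of the page; the ids and wording are hypothetical:

```ts
// Insert (or update) an error note directly after the offending field.
function showInlineError(fieldId: string, message: string): void {
  const field = document.getElementById(fieldId);
  if (!field) return;
  let note = document.getElementById(`${fieldId}-error`);
  if (!note) {
    note = document.createElement("div");
    note.id = `${fieldId}-error`;
    note.setAttribute("role", "alert"); // announced by screen readers
    field.insertAdjacentElement("afterend", note);
  }
  // Say HOW to fix the problem, not just that something is wrong.
  note.textContent = message;
}

showInlineError("age", "Please enter your age in whole years, e.g., 34.");
```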
Error Message Across Devices
87
Error Message Across Devices
88
Web Survey Design
89
Graphics
• Improve motivation, engagement, satisfaction with “fun”
• Decrease nonresponse & measurement error
• Improve data quality
• Gamification
Henning, 2012; Manfreda et al., 2002
90
Graphics
• Use when they supply meaning
– Survey about advertisements
• Use when user experience is improved
– For children or video-game players
– For low literacy
Libman, 2012
91
Graphics
92
Graphics
http://glittle.org/smiley-slider/
http://www.ollie.net.nz/casestudies/smiley_slider/
93
Graphics Experiment 1.1
• Appearance
– Decreasing boldness (bold to faded)
– Increasing boldness (faded to bold)
– Adding face symbols to response options
• ~2400 respondents
• Rated satisfaction re health-related things
• 5-pt scale: very satisfied to very dissatisfied
Medway & Tourangeau, 2011
94
Graphics Experiment 1.2
• Bold side selected more
• Less satisfaction when face symbols present
Medway & Tourangeau, 2011
Fully labeled scale:
Very satisfied | Somewhat satisfied | Neutral | Somewhat dissatisfied | Very dissatisfied
Your physician: O O O O O
Endpoint-labeled scale:
Very satisfied | | | | Very dissatisfied
Your physician: O O O O O
95
Graphics Experiment 2.1
• Appearance
– Radio buttons
– Face symbols
• ~1000 respondents
• Rated satisfaction with a journal
• 6-pt scale: very dissatisfied to very satisfied
Emde & Fuchs, 2011
96
Graphics Experiment 2.2
• Faces were equivalent to radio buttons
• Respondents were more attentive when faces were present
– Time to respond
Emde & Fuchs, 2011
97
Slider Usability Study
• Participants thought 1 was selected and did not move the slider; 0 was actually recorded if they did not respond.
Strohl, Romano Bergstrom & Krulikowski, 2012
98
Graphics Experiment 3.1
• Modified the visual design of survey items
– Increase novelty and interest on select items
– Other items were standard
• ~100 respondents in experimental condition
• ~1200 in control
• Questions about military perceptions and media usage
• Variety of question types
Gibson, Luchman & Romano Bergstrom, 2013
99
Graphics Experiment 3.2
• No differences
Gibson, Luchman & Romano Bergstrom, 2013
100
Graphics Experiment 3.3
• Slight differences:
– Those with enhanced version skipped more often
– Those in standard responded more negatively
Gibson, Luchman & Romano Bergstrom, 2013
101
Graphics Experiment 3.4
Gibson, Luchman & Romano Bergstrom, 2013
• Slight differences:
– Those with enhanced version skipped more often
102
Graphics Experiment 3.5
• No major differences
Gibson, Luchman & Romano Bergstrom, 2013
103
Graphics Considerations
• Mixed results
• “Ad blindness”
• Internet speed and download time
• Unintended meaning
104
Graphics Considerations
105
Graphics Considerations
106
Graphics Considerations
107
Web Survey Design
108
Emphasizing Text
• Font
– Never underline plain text
– Never use red for plain text
– Use bold and italics sparingly
109
Emphasizing Text
110
Emphasizing Text
111
Emphasizing Text
• Hypertext
– Use meaningful words and phrases
– Be specific
– Avoid “more” and “click here.”
112
Web Survey Design
113
White Space
• White space on a page
• Differentiates sections
• Don’t overdo it
114
White Space
115
Web Survey Design
116
Authentication
• Ensures respondent is the selected person
• Prevents entry by those not selected
• Prevents multiple entries by selected respondent
117
Authentication
• Passive
– ID and password embedded in URL
• Active
– E-mail entry
– ID and password entry
• Avoid ambiguous passwords (Couper et al., 2001); a sketch follows
– E.g., contains 1, l, 0, o
• Security concerns can be an issue
• Don’t make it more difficult than it needs to be
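As one illustration of the ambiguous-character advice, here is a sketch of generating login codes that never contain 1/l or 0/o; the character set, length, and domain are illustrative assumptions:

```ts
// Alphabet deliberately omits 0, 1, i, l, and o, which respondents confuse.
const UNAMBIGUOUS = "23456789abcdefghjkmnpqrstuvwxyz";

function makeLoginCode(length = 8): string {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += UNAMBIGUOUS[Math.floor(Math.random() * UNAMBIGUOUS.length)];
  }
  return code;
}

// Passive authentication embeds the code in the invitation URL, so the
// respondent never has to type it (hypothetical domain):
console.log(`https://survey.example.gov/start?id=${makeLoginCode()}`);
```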
118
Authentication
119
Web Survey Design
120
Progress Indicators
• Reduce breakoffs
• Reduce burden by displaying length of survey
• Enhance motivation and visual feedback
• Not needed in scrolling design
• Little evidence of benefit
Couper et al., 2001; Crawford et al., 2001; Conrad et al., 2003, 2005; Sakshaug & Crawford, 2009
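A sketch of a progress indicator that “provides meaning” by pairing the bar with an explicit count; the element ids are hypothetical and assumed to exist in the page:

```ts
// Update both the bar width and a text label, e.g., "Question 7 of 20 (35%)".
function updateProgress(current: number, total: number): void {
  const pct = Math.round((current / total) * 100);
  const bar = document.getElementById("progress-bar") as HTMLElement;
  const label = document.getElementById("progress-label") as HTMLElement;
  bar.style.width = `${pct}%`;
  label.textContent = `Question ${current} of ${total} (${pct}%)`;
}
```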
121
Progress Indicators: At the bottom
122
Progress Indicators: At the top
123
Progress Indicators: Mobile
124
Progress Indicators
• They should provide meaning
Strohl, Romano Bergstrom & Krulikowski, 2012
125
Web Survey Design
126
Consistency
• Predictable
– User can anticipate what the system will do
• Dependable
– System fulfills user’s expectations
• Habit-forming
– System encourages behavior
• Transferable
– Habits in one context can transfer to another
• Natural
– Consistent with user’s knowledge
127
Inconsistency
128
Inconsistency
129
Inconsistency
Strohl, Romano Bergstrom & Krulikowski, 2012
Questions and Discussion
Assessing Your Survey
The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.
Assessing Your Survey
Paradata
• Background
• Uses of Paradata by mode
• Paradata issues
Usability
• Usability vs. User Experience
• Why, When, What?
• Methods
– Focus Groups, In-Depth Interviews
– Ethnographic Observations, Diary Studies
– Usability & Cognitive Testing
• Lab, Remote, In-the-Field
• Obstacles
Paradata
Types of Data
• Survey Data – collected information from R’s
• Metadata – data that describes the survey
– Codebook
– Description of the project/survey
• Paradata – data about the process of answering the survey at the R level
• Auxiliary/Administrative Data – not collected directly, but acquired from external sources
Paradata
• Term coined by Mick Couper
– Originally described data that were by-products of computer-assisted interviewing
– Expanded to include data from other self-administered modes
• Main uses:
– Adaptive / Responsive design
– Nonresponse adjustment
– Measurement error identification
Total Survey Error Framework
Groves et al. 2004; Groves & Lyberg 2010
TSE Framework & Paradata
Kreuter, 2012
Adaptive / Responsive Design
• Create process indicators
• Real-time monitoring (charts & “dashboards”)
• Adjust resources during data collection to achieve higher response rate and/or cost savings
• Goal:
– Achieve high response rates in a cost-effective way
– Introduce methods to recruit uncooperative – and possibly different – sample members (reducing nonresponse bias)
Nonresponse Adjustment
• Decreasing response rates have encouraged researchers to look at other sources of information to learn about nonrespondents
– Doorstep interactions
– Interviewer observations
– Contact history data
Contact History Instrument (CHI)
• CHI developed by the U.S. Census Bureau (Bates, 2003)
• Interviewers take time after each attempt (refusal or non-contact) to answer questions in the CHI
• Use CHI information to create models (i.e., heat maps) to identify optimal contact time
• Typically a quick set of questions to answer
• European Social Survey uses a standard contact form (Stoop et al., 2003)
Contact History Instrument (CHI)
U.S. Census Bureau CHI
Paradata
• Background information about Paradata
• Uses of Paradata by mode
• Paradata issues
Uses of Paradata by Mode
• CAPI
• CATI
• Web
• Mail
• Post-hoc
Uses of Paradata - CAPI
• Information collected can include:
– Interviewer time spent calling sampled households
– Time driving to sample areas
– Time conversing with household members
– Interview time
– GPS coordinates (tablets/mobile devices)
• Information can be used to:
– Inform cost-quality decisions (Kreuter, 2009)
– Develop cost per contact
– Predict the likelihood of response by using interviewer observations of the response unit (Groves & Couper, 1998)
– Monitor interviewers and identify any falsification
Uses of Paradata - CATI
• Information collected can include:
– Call transaction history (record of each attempt)
– Contact rates
– Sequence of contact attempts & contact rates
• Information can be used to:
– Optimize call back times
– Interviewer monitoring
– Inform a responsive design
Uses of Paradata - Web
• Server-side vs. client-side
• Information collected can include:
– Device information (i.e., browser type, operating system, screen resolution, detection of JavaScript or Flash)
– Questionnaire navigation information
Callegaro, 2012
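A sketch of the kind of client-side device information a web survey can capture at startup, using only standard browser APIs; how it is stored and transmitted is left open:

```ts
// The mere fact this runs confirms JavaScript; the rest describes the device.
const deviceInfo = {
  userAgent: navigator.userAgent,               // browser + OS hints
  screen: `${screen.width}x${screen.height}`,   // physical resolution
  viewport: `${window.innerWidth}x${window.innerHeight}`,
  touchCapable: "ontouchstart" in window,       // rough mobile indicator
  language: navigator.language,
};
console.log(deviceInfo); // in practice, submitted with the first page
```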
Web Paradata - Server-side
• Page requests or “visits” to a web page from the web server
• Identify device information and monitor survey completion
Web Paradata - Server-side cont.
• Typology of response behaviors in web surveys
1. Complete responders
2. Unit non-responders
3. Answering drop-outs
4. Lurkers
5. Lurking drop-outs
6. Item non-responders
7. Item non-responding drop-outs
Bosnjak, 2001
Web Paradata – Client-Side
• Collected on the R’s computer
• Logs each “meaningful” action
• Heerwegh (2003) developed code / guidance for client-side paradata collected using JavaScript (a sketch follows)
– Clicking on a radio button
– Clicking and selecting a response option in a drop-down box
– Clicking a check box (checking / unchecking)
– Writing text in an input field
– Clicking a hyperlink
– Submitting the page
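A minimal sketch in the spirit of Heerwegh’s approach (not his actual code): timestamp each meaningful action relative to page load and ship the log with the answers.

```ts
type ParadataEvent = { t: number; kind: string; target: string };

const events: ParadataEvent[] = [];
const loadedAt = Date.now();

function record(kind: string, target: string): void {
  events.push({ t: Date.now() - loadedAt, kind, target }); // ms since load
}

// Radio buttons, check boxes, drop-downs, and text fields all fire "change".
document.addEventListener("change", (e) => {
  const el = e.target as HTMLInputElement | HTMLSelectElement;
  record(el instanceof HTMLSelectElement ? "select" : el.type, el.name);
});

// Hyperlink clicks (e.g., help links).
document.addEventListener("click", (e) => {
  const link = (e.target as HTMLElement).closest("a");
  if (link) record("hyperlink", link.href);
});

// On submit, attach the log as a hidden field so it travels with the answers.
const form = document.querySelector("form");
if (form) {
  form.addEventListener("submit", () => {
    record("submit", "page");
    const hidden = document.createElement("input");
    hidden.type = "hidden";
    hidden.name = "paradata";
    hidden.value = JSON.stringify(events);
    form.appendChild(hidden);
  });
}
```

Because each timestamp is relative to page load, the same log also yields the response latencies described next.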
Web Paradata – Client-Side cont.
• Stern (2008) used Heerwegh’s paradata techniques to identify:
– Whether R’s changed answers, and in what direction
– The order in which questions are answered when more than one is displayed on the screen
– Response latencies – the time that elapsed between when the screen loaded on the R’s computer and when they submitted an answer
• Heerwegh (2003) found that the longer the response time, the greater the probability of changing answers and an incorrect response
Browser Information / Operating System Information
• Programmers use this information to ensure they are developing the optimal design
• Desktop, laptop, smartphone, tablet, or other device
• Sood (2011) found a correlation between browser type and survey breakoff & number of missing items
– Completion rates for older browsers were lower
– Browser type served as a proxy for age of device and possible connection speed
– Older browsers were more likely to display the survey incorrectly; a possible explanation for higher drop-out rates
JavaScript & Flash
• Helps to understand what the R can see and do in a survey
• JavaScript adds functionality such as question validations, auto-calculations, interactive help (a detection sketch follows)
– 2% or less of computer users have JavaScript disabled (Zakas, 2010)
• Flash is used for question types such as drag & drop or slide-bar questions
– Without Flash installed, R’s may not see the question
Flash Question Example
Questionnaire Navigation Paradata
• Mouse clicks/coordinates
– Captured with JavaScript
– Excessive movements can indicate:
• An issue with the question
• Potential for lower quality
• Changing answers
– Can indicate potential confusion with a question
– Paradata can capture answers that were erased
– Changes more frequent for opinion questions than factual questions
Stieger & Reips, 2010
Questionnaire Navigation Paradata cont.
• Order of answering
– When multiple questions are displayed on a screen
– Can indicate how respondents read the questions
• Movement through the questionnaire (forward and back)
– Unusual patterns can indicate confusion and a possible issue with the questionnaire (i.e., poor question order)
Questionnaire Navigation Paradata cont.
• Number of prompts/error messages/data validation messages
• Quality Index (Haraldsen, 2005)
• Goal is to decrease number of activated errors by improving the visual design and clarity of the questions
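In that spirit, here is a sketch of tallying activated error messages per screen so the worst-designed screens surface first; this is a simplification for illustration, not Haraldsen’s actual index:

```ts
const errorCounts = new Map<string, number>();

// Call this wherever a validation/error message is triggered.
function recordError(screenId: string): void {
  errorCounts.set(screenId, (errorCounts.get(screenId) ?? 0) + 1);
}

// Screens sorted by number of activated errors – candidates for redesign.
function worstScreens(n: number): [string, number][] {
  return [...errorCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```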
Questionnaire Navigation Paradata cont.
• Clicks on non-question links
– Help, FAQs, etc.
– Indication of when and where Rs use help or other information built into the survey and displayed as a link
• Last question answered before dropping out (a classification sketch follows)
– Helps to determine if the data collected can be classified as complete, partial, or breakoff
– Used for response rate computation
– Peytchev (2009) analyzed breakoff by question type
• Open ended increased break-off chances by 2.5x; long questions by 3x; slider bars by 5x; introductory screens by 2.6x
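A sketch of classifying a case from its response record – which item was answered last, and is the case complete, partial, or a breakoff? The 50% cutoff is illustrative only; real response-rate rules differ by survey.

```ts
function classifyCase(
  answers: (string | null)[], // one slot per questionnaire item
  totalItems: number
): { lastAnswered: number; status: string } {
  let lastAnswered = -1;
  answers.forEach((a, i) => {
    if (a !== null && a !== "") lastAnswered = i; // track furthest answered item
  });
  const reached = (lastAnswered + 1) / totalItems;
  const status =
    reached === 1 ? "complete" : reached >= 0.5 ? "partial" : "breakoff";
  return { lastAnswered, status };
}

console.log(classifyCase(["3", "never", null, null], 4)); // -> partial
```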
Questionnaire Navigation Paradata cont.
• Time per screen / time latency
– Attitude strength
– Response uncertainty
– Response error
• Examples
– Heerwegh (2003)
• R’s with weaker attitudes take more time answering survey questions than R’s with stronger attitudes
– Yan and Tourangeau (2008)
• Higher-educated R’s respond faster than lower-educated R’s
• Younger R’s respond faster than older R’s
Uses of Paradata – Call Centers
• Self-administered (mail or electronic) surveys
• Call transaction history software
– Incoming calls
• Date and time: useful for hiring, staffing, and workflow decisions
• Purpose of the call
– Content issue: useful for identifying problematic questions or support information
– Technical issue: useful for identifying usability issues or system problems
• Call outcome: type of assistance provided
Paradata
Paradata Issues
• Reliability of data collected
• Costs
• Privacy and Ethical Issues
Reliability of data collected
• Interviewers can erroneously record housing unit characteristics, misjudge features about respondents & fail to record a contact attempt
• Web surveys can fail to load properly, and client-side paradata fails to be captured
• Recordings of interviewers can be unusable (e.g., background noise, loose microphones)
Casas-Cordero, 2010; Sinibaldi, 2010; West, 2010
Paradata costs
• Data storage – very large files
• Instrument performance
• Development within systems
• Analysis
Privacy and Ethical Issues
• IP addresses along with e-mail address or other information can be used to identify a respondent
• This information needs to be protected
Paradata Activity
• Should the respondent be informed that the organization is capturing paradata?
• If so, how should that be communicated?
Privacy and Ethical Issues cont.
• Singer & Couper asked members of the Dutch Longitudinal Internet Studies for the Social Sciences (LISS) panel at the end of the survey if they could collect paradata – 38.4% agreed
• Asked before the survey – 63.4% agreed
• Evidence that asking permission to use paradata might make R’s less willing to participate in a survey
Couper & Singer, 2011
Privacy and Ethical Issues cont.
• Reasons for failing to inform R’s about paradata or get their consent
– Concept of paradata is unfamiliar and difficult for R’s to grasp
– R’s associate it with the activities of advertisers, hackers, or phishers
– Asking for consent gives it more salience
– Difficult to convey benefits of paradata for the R
Questions and Discussion
Usability Assessment
Usability Assessment
• Usability vs. User Experience
• Why, When, What?
• Methods
– Focus Groups, In-Depth Interviews
– Ethnographic Observations, Diary Studies
– Usability and Cognitive Testing
• Lab, Remote, In-the-Field
• Obstacles
Background Knowledge
• What does usability mean to you?
• Have you been involved in usability research?
• How is “user experience” different from “usability”?
Usability Assessment
Usability vs. User Experience
• Usability: “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.” (ISO 9241-11; see also Usability.gov)
• User experience includes emotions, needs, and perceptions.
Understanding Users
Whitney’s 5 E’s of Usability | Peter’s User Experience Honeycomb
The 5 E’s to Understanding Users (W. Quesenbery): http://www.wqusability.com/articles/getting-started.html
User Experience Design (P. Morville): http://semanticstudios.com/publications/semantics/000029.php
User Experience
Measuring the UX
• How does it work for the end user?
• What does the user expect?
• How does it make the user feel?
• What are the user’s story and habits?
• What are the user’s needs?
What people do on the Web
Krug, S. Don’t Make Me Think
Usability Assessment
Why is Testing Important?
• Put it in the hands of the users.
• Things may seem straightforward to you but maybe not to your users.
Why is Testing Important?
• Put it in the hands of the users.
• Things may seem straightforward to you but maybe not to your users.
Why is Testing Important?
Why is Testing Important?
• Put it in the hands of the users.
• Things may seem straightforward to you but maybe not to your users.
• You might have overlooked something big!
When to test
[Diagram: test with users at every stage – concept, prototype, and final product.]
What can be tested?
• Existing surveys
• Low-fidelity prototypes
– Paper mockups or mockups on computer
– Basic idea is there but not functionality or graphical look
• High-fidelity prototypes
– As close as possible to final interface in look and feel
Usability Assessment
Methods to Understand Users
[Diagram pairing each method with its purpose:]
• Linguistic Analysis – assess interactional motivations and goals
• Usability Testing – ensure users can use products efficiently and with satisfaction
• Cognitive Testing – ensure content is understood as intended
• User Experience Research – assess emotions, perceptions, and reactions
• Surveys – randomly sample the population of interest
• Ethnographic Observation – understand interactions in the natural environment
• Focus Groups and In-Depth Interviews – discuss users’ perceptions and reactions
Focus Groups
• Structured script
• Moderator discusses the survey with actual or typical users
– Actual usage of survey
– Workflow beyond survey
– Expectations and opinions
– Desire for new features and functionality
• Benefit of participants stimulating conversations, but risk of “group think”
In-Depth Interviews
• Structured or unstructured
• Talk one-on-one with users, in person or remotely
– Actual usage of the survey
– Workflow beyond survey
– Expectations and opinions
– Desire for new features and functionality
Usability Assessment
Ethnographic Observations
• Observe users in home, office, or any place that is “real-world.”
• Observer is embedded in the user’s culture.
• Allows conversation & activity to evolve naturally, with minimum interference.
• Observe settings and artifacts (other real-world objects).
• Focused on context and meaning making.
Diaries/Journals
• Users are given a journal or a web site to complete on a regular basis (often daily).
• They record how/when they used the survey, what they did, and what their perceptions were.
– User-defined data
– Feedback/responses develop and change over time
– Insight into how technology is used “on-the-go.”
• There is often a daily set of structured questions and/or free-form comments.
Usability Assessment
Usability Testing
• Participants respond to survey items
• Assess interface flow and design
– Understanding
– Confusion
– Expectations
• Ensure intricate skip patterns work as intended
• Can test final product or early prototypes
Cognitive Testing
• Participants respond to survey items
• Assess text
– Confusion
– Understanding
– Thought process
• Ensure questions are understood as intended and resulting data is valid
• Proper formatting is not necessary.
Usability vs. Cognitive Testing
Usability Testing Metrics
• Accuracy
– In completing item/survey
– Number/severity of errors
• Efficiency
– Time to complete item/survey
– Path to complete item/survey
• Satisfaction
– Item-based
– Survey-based
– Verbalizations
Cognitive Testing Metrics
• Accuracy
– Of interpretations
• Verbalizations
Moderating Techniques
Concurrent Think Aloud (CTA)
– Pros: Understand participants’ thoughts as they occur and as they attempt to work through issues they encounter; elicits real-time feedback and emotional responses
– Cons: Can interfere with usability metrics, such as accuracy and time on task
Retrospective Think Aloud (RTA)
– Pros: Does not interfere with usability metrics
– Cons: Overall session length increases; difficulty in remembering thoughts from up to an hour before = poor data
Concurrent Probing (CP)
– Pros: Understand participants’ thoughts as they attempt to work through a task
– Cons: Interferes with the natural thought process and progression that participants would make on their own, if uninterrupted
Retrospective Probing (RP)
– Pros: Does not interfere with usability metrics
– Cons: Difficulty in remembering = poor data
Romano Bergstrom, Moderating Usability Tests: http://www.usability.gov/articles/2013/04/moderating-usability-tests.html
Choosing a Moderating Technique
• Can the participant work completely alone?
• Will you need time-on-task and accuracy data?
• Are the tasks multi-layered and/or do they require concentration?
• Will you be conducting eye tracking?
Tweaking vs. Redesign
Tweaking
• Less work
• Small changes occur quickly.
• Small changes are likely to happen.
Redesign
• Lots of work after much has already been invested
• May break something else
• A lot of people
• A lot of meetings
Usability Assessment
Lab vs. Remote vs. In the Field
Laboratory
• Controlled environment
• All participants have the same experience
• Record and communicate from control room
• Observers watch from control room and provide additional probes (via moderator) in real time
• Incorporate physiological measures (e.g., eye tracking, EDA)
• No travel costs
Remote
• Participants in their natural environments (e.g., home, work)
• Use video chat (moderated sessions) or online programs (unmoderated)
• Conduct many sessions quickly
• Recruit participants in many locations (e.g., states, countries)
In the Field
• Participants tend to be more comfortable in their natural environments
• Recruit hard-to-reach populations (e.g., children, doctors)
• Moderator travels to various locations
• Bring equipment (e.g., eye tracker)
• Natural observations
Lab-Based Usability Testing
• Observation area for clients
• We maneuver the cameras, record, and communicate through microphones and speakers from the control room so we do not interfere
• Live streaming close-up shot of the participant’s screen
• Participant in the testing room
• Large screens to display material during focus groups
Fors Marsh Group UX Lab
Eye Tracking
• Desktop
• Mobile
• Paper
Fors Marsh Group UX Lab
Remote Moderated Testing
Participant working on the survey from her home in another state
Moderator working from the office
Observer taking notes, remains unseen by the participant
Fors Marsh Group UX Lab
Field Studies
Participant is in her natural environment, completing tasks on a site she normally uses for work
Researcher goes to participant’s workplace to conduct session. She observes and takes notes
Participant uses books from her natural environment to complete tasks on the website
Usability Assessment
Obstacles to Testing
• “There is no time.”
– Start early in development process.
– One morning a month with 3 users (Krug)
– 12 people in 3 days (Anderson Riemer)
– 12 people in 2 days (Lebson & Romano Bergstrom)
• “I can’t find representative users.”
– Everyone is important.
– Travel
– Remote testing
• “We don’t have a lab.”
– You can test anywhere.
Final Thoughts
• Test across devices.
– “User experience is an ecosystem.”
• Test across demographics.
– Older adults perform differently than young adults.
• Start early.
Kornacki, 2013, The Long Tail of UX
Questions & Discussion
Quality of Mixed Modes
The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.
213
Quality of Mixed Modes
• Mixed Mode Surveys
• Response Rates
• Mode Choice
214
Mixed Mode Surveys
• Definition: Any combination of survey data collection methods/modes
• Mixed vs. Multi vs. Multiple Modes
• Survey organization goal:
– Identify optimal data collection procedure (for the research question)
– Reduce Total Survey Error
– Stay within time/budget constraints
215
Mixed Mode Designs
• Sequential
– Different modes for different phases of interaction (initial contact, data collection, follow-up)
– Different modes used in sequence during data collection (i.e., a panel survey which begins in one mode and moves to another)
• Concurrent – different modes implemented at the same time
de Leeuw & Hox, 2008
216
Why Do Mixed Mode?
• Cost savings
• Improve timeliness
• Reduce Total Survey Error
– Coverage error
– Nonresponse error
– Measurement error
217
Mixed Modes – Cost Savings
• Mixed mode designs give an opportunity to compensate for the weaknesses of each individual mode in a cost-effective way (de Leeuw, 2005)
• Dillman’s 2009 Internet, Mail, and Mixed-Mode Surveys book:
– Organizations often start with the lower-cost mode and move to a more expensive one
• In the past: start with paper, then do CATI or in-person nonresponse follow-up (NRFU)
• Current: start with Internet, then paper NRFU
218
Mixed Modes – Cost Savings cont.
• Examples:
• U.S. Current Population Survey (CPS) – panel survey
– Initially does in-person interview and collects a telephone number
– Subsequent calls made via CATI to reduce cost
• U.S. American Community Survey
– Phase 1: mail
– Phase 2: CATI NRFU
– Phase 3: face-to-face with a subsample of remaining nonrespondents
219
Mixed Mode - Timeliness
• Collect responses more quickly
• Examples:
– Current Employment Statistics (CES) offers 5 modes (Fax, Web, Touch-tone Data Entry, Electronic Data Interchange, & CATI) to facilitate timely monthly reporting
220
Why Do Mixed Mode?
• Cost savings
• Improve timeliness
• Reduce Total Survey Error
– Coverage error
– Nonresponse error
– Measurement error
Total Survey Error Framework
Groves et al. 2004; Groves & Lyberg 2010
222
Mixed Mode - Coverage Error
• Definition: proportion of the target population that is not covered by the survey frame and the difference in the survey statistic between those covered and not covered
• Telephone penetration
– Landlines vs. mobile phones
• Web penetration
Groves, 1989
223
Coverage – Telephone
• 88% of U.S. adults have a cell phone
• Young adults, those with lower education, and lower household income more likely to use mobile devices as main source of Internet access
Smith, 2011; Zickuhr & Smith, 2012
224
Coverage - Internet
• Coverage is limited
– No systematic directory of addresses
• 1 in 5 in U.S. do not use the Internet
Zickuhr & Smith, 2012
225
226
World Internet Statistics
227
Coverage –Web cont.
• Indications that Internet adoption rates have leveled off
• Demographics least likely to have Internet
– Older
– Less education
– Lower household income
• Main reason for not going online: not relevant
Pew, 2012
228
European Union – Characteristics of Internet Users
229
Coverage - Web cont.
• R’s reporting via Internet can be different from those reporting via other modes
– Internet vs. mail (Diment & Garret-Jones, 2007; Zhang, 2000)
• R’s cannot be contacted through the Internet because e-mail addresses lack structure for generating random samples (Dillman, 2009)
230
Mixed Mode – Nonresponse Error
• Definition: inability to obtain complete measurements on the survey sample (Groves, 1998)
– Unit nonresponse – entire sampling unit fails to respond
– Item nonresponse – R’s fail to respond to all questions
• Concern is that respondents and nonrespondents may differ on the variable of interest
231
Mixed Mode – Nonresponse cont.
• Overall response rates have been declining
• Mixed mode is a strategy used to increase overall response rates while keeping costs low
• Some R’s have a mode preference (Miller, 2009)
232
Mixed Mode – Nonresponse cont.
• Some evidence of a reduction in overall response rates when multiple modes offered concurrently in population/household surveys
– Examples: Delivery Sequence File Study (Dillman, 2009); Arbitron Radio Diaries (Gentry, 2008); American Community Survey (Griffin et al., 2001); Survey of Doctorate Recipients (Grigorian & Hoffer, 2008)
• Could assign R’s to modes based on known preferences
233
Mixed Mode – Measurement Error
• Definition: “observational errors” arising from the interviewer, instrument, mode of communication, or respondent (Groves, 1998)
• Providing mixed modes can help reduce the measurement error associated with collecting sensitive information
– Example: Interviewer begins face-to-face interview (CAPI), then lets the R continue on the computer with headphones (ACASI) to answer sensitive questions
234
Mode Comparison Research
• Meta-analysis of articles by de Leeuw:
– Harder to get mail responses
– Overall nonresponse rates & item nonresponse rates are higher in self-administered questionnaires, BUT answered items are of high quality
– Small difference in quality between face-to-face and telephone (CATI) surveys
– Face-to-face surveys had slightly lower item nonresponse rates
de Leeuw, 1992
235
Mode Comparison Research cont.
• Question order and response order effects less likely in self-administered than telephone
– R’s more likely to choose last option heard in CATI (recency effect)
– R’s more likely to choose the first option seen in self-administered (primacy effect)
– Mixed results on item-nonresponse rates in Web
de Leeuw, 1992; 2008
236
Mode Comparison Research cont.
• Some indication that Internet surveys are more like mail than telephone surveys
– Visual presentation vs. auditory
• Conflicting evidence on item nonresponse (some show higher item nonresponse on Internet vs. mail while others show no difference)
• Some evidence of better quality data
– Fewer post-data collection edits needed for electronic vs. mail responses
Sweet & Ramos, 1995; Griffin et al., 2001
237
Disadvantages of Mixed Mode
• Mode Effects
– Concerns for measurement error due to the mode
• R’s providing different answers to the same questions displayed in different modes
– Different contact/cooperation rates because of different strategies used to contact R’s
238
Disadvantages of Mixed Mode
• Decrease in overall response rates
– Why: effects of offering a mix of mail and web options
– What: meta-analysis of 16 studies that compared mixed mode surveys with mail and web options
– Results: empirical evidence that offering mail and Web concurrently resulted in a significant reduction in response rates
Medway & Fulton, 2012
239
Response Rates in Mixed Mode Surveys
• Why is this happening?
– Potential Hypothesis #1: R’s dissuaded from responding because they have to make a choice
• Offering multiple modes increases burden (Dhar, 1997)
• While judging pros/cons of each mode, neither appears attractive (Schwartz, 2004)
– Potential Hypothesis #2: R’s choose Web, but never actually do it
• If R’s receive invitation in mail, there is a break in their response process (Griffin et al., 2001)
– Potential Hypothesis #3: R’s that choose Web may get frustrated with the instrument and abandon the whole process (Couper, 2000)
240
Overall Goals
• Find the optimal mix given the research questions and population of interest
• Other factors to consider:
– Reducing Total Survey Error (TSE)
– Budget
– Time
– Ethics and/or privacy issues
Biemer & Lyberg, 2003
241
Quality of Mixed Modes
242
Technique for Increasing Response Rates to Web in Multi-Mode Surveys
• “Pushing” R’s to the web
– Sending R’s an invitation to report via Web
– No paper questionnaire in the initial mailing
– Invitation contains information for obtaining the alternative version (typically paper)
– Paper versions are mailed out during follow-up to capture responses from those that do not have web access or do not want to respond via web
– “Responding to Mode in Hand” Principle
243
“Pushing” Examples
• Example 1: Lewiston-Clarkson Quality of Life Survey
• Example 2: 2007 Stockholm County Council Public Health Survey
• Example 3: American Community Survey
• Example 4: 2011 Economic Census Re-file Survey
244
Pushing Example 1 – Lewiston-Clarkson Quality of Life Survey
• Goals: increase web response rates in a paper/web mixed-mode survey and identify mode preferences
• Method:
– November 2007 – January 2008
– Random sample of 1,800 residential addresses
– Four treatment groups
– To assess mode preference, this question was at the end of the survey:
• “If you could choose how to answer surveys like this, which one of the following ways of answering would you prefer?”
• Answer options: web or mail or telephone
Miller, O’Neill, Dillman, 2009
245
Pushing Example 1 – cont.
• Group A: Mail preference with web option
– Materials suggested mail was preferred but web was acceptable
• Group B: Mail Preference
– Web option not mentioned until first follow-up
• Group C: Web Preference
– Mail option not mentioned until first follow-up
• Group D: Equal Preference
246
Pushing Example 1 – cont.
• Results
247
Pushing Example 1 – cont.
“If you could choose how to answer surveys like this, which one of the following ways of answering would you prefer?”
248
Pushing Example 1 – cont.
249
Pushing Example 1 – cont.
Group C = Web Preference Group
250
Pushing Example 1 – cont.
• Who can be pushed to the Web?
251
Pushing Example 2 – 2007 Stockholm County Council Public Health Survey
• Goal: increase web response rates in a paper/web mixed-mode survey
• Method:
– 50,000 (62% response rate)
– 4 treatments that varied in “web intensity”
– Plus a “standard” option – paper and web login data
Holmberg, Lorenc, Werner, 2008
252
Pushing Example 2 – Cont.
• Overall response rates
S = Standard; A1 = very paper “intense”; A2 = paper “intense”; A3 = web “intense”; A4 = very web “intense”
253
Pushing Example 2 – Cont.
• Web responses
S = Standard; A1 = very paper “intense”; A2 = paper “intense”; A3 = web “intense”; A4 = very web “intense”
254
Pushing Example 3 – American Community Survey
• Goals:
– Increase web response rates in a paper/web mixed-mode survey
– Identify ideal timing for non-response follow-up
– Evaluate advertisement of web choice
Tancreto et al., 2012
255
Pushing Example 3 – Cont.
• Method
– Push: 3 versus 2 weeks until paper questionnaire
– Choice: Prominent and Subtle
– Mail only (control)
– Tested among segments of US population
• Targeted
• Not Targeted
256
Response Rates by Mode in Targeted Areas
[Stacked bar chart: response rates (%) by mode (Internet vs. Mail) for each condition – Ctrl (Mail only), Prominent Choice, Subtle Choice, Push (3 weeks), Push (2 weeks); y-axis 0–45%.]
257
Response Rates by Mode in Not Targeted Areas
[Stacked bar chart: response rates (%) by mode (Internet vs. Mail) for the same five conditions in not-targeted areas; y-axis 0–50%.]
258
Example 4: Economic Census Refile
• Goal: to increase Internet response rates in a paper/Internet establishment survey during non-response follow-up
• Method: 29,000 delinquent respondents were split between two NRFU mailings
– Letter-only mailing mentioning Internet option
– Letter and paper form mailing
Marquette, 2012
259
Example 4: Cont.
260
Quality of Mixed Modes
261
Why Do Respondents Choose Their Mode?
• Concern about “mode paralysis”
– When two options are offered, R’s must choose between tradeoffs
– This choice makes each option less appealing
– By offering a choice between Web and mail, response is possibly discouraged
Miller and Dillman, 2011
262
Mode Choice
• American Community Survey – Attitudes and Behavior Study
• Goals:
– Measure why respondents chose the Internet or paper mode during the American Community Survey Internet Test
– Understand why there was nonresponse and whether it was linked to the multi-mode offer
Nichols, 2012
263
Mode Choice – cont.
• CATI questionnaire was developed in consultation with survey methodologists
• Areas of interest included:
– Salience of the mailing materials and messages
– Knowledge of the mode choice
– Consideration of reporting by Internet
– Mode preference
264
Mode Choice – cont.
• 100 completed interviews per notification strategy (push
265
Mode Choice – cont.
• Results
– Choice/Push Internet respondents opted for perceived benefits – easy, convenient, fast
– Push R’s noted that not having the paper form motivated them to use the Internet to report
– Push R’s that reported via mail did so because they did not have Internet access or had computer problems
– The placement of the message about the Internet option was reasonable to R’s
– R’s often recalled the letter that accompanied the mailing package mentioning the mode choice
266
Mode Choice – cont.
• Results cont.
– Several nonrespondents cited not knowing that a paper option was available as a reason for not reporting
– Very few nonrespondents attempted to access the online form
– Salience of the mailing package and being busy were main reasons for nonresponse
– ABS study did NOT find “mode paralysis”
Questions and Discussion
Amy Anderson Riemer, US Census Bureau
Jennifer Romano Bergstrom, Fors Marsh Group