Introduction to Web Survey Usability Design and Testing


DESCRIPTION

Amy Anderson Riemer and I taught this short course at the DC-AAPOR workshop.

TRANSCRIPT

Introduction to Web Survey Usability Design and Testing

DC-AAPOR Workshop

Amy Anderson Riemer and Jennifer Romano Bergstrom

The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.

2

Schedule
9:00 – 9:15 Introduction & Objectives
9:15 – 11:45 Web Survey Design: Desktop & Mobile
11:45 – 12:45 Lunch
12:45 – 2:30 Assessing Your Survey
2:30 – 2:45 Break
2:45 – 3:30 Mixed Modes Data Quality
3:30 – 4:00 Wrap Up

3

Objectives

Web Survey Design: Desktop & Mobile
• Paging vs. Scrolling
• Navigation
• Scrolling lists vs. double-banked response options
• Edits & Input fields
• Checkboxes & Radio buttons
• Instructions & Help
• Graphics
• Emphasizing Text & White Space
• Authentication
• Progress Indicators
• Consistency

Assessing Your Survey
• Paradata
• Usability

Quality of Mixed Modes
• Mixed Mode Surveys
• Response Rates
• Mode Choice

Web Survey Design

The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.

5

Activity #1

1. Today's date
2. How long did it take you to get to BLS today?
3. What do you think about the BLS entrance?

6

Why is Design Important?

• No interviewer present to correct/advise
• Visual presentation affects responses
  – (Couper's activity)
• While the Internet provides many ways to enhance surveys, design tools may be misused

7

Why is Design Important?

• Respondents extract meaning from how question and response options are displayed

• Design may distract from or interfere with responses

• Design may affect data quality

8

Why is Design Important?

http://www.cc.gatech.edu/gvu/user_surveys/

9

Why is Design Important?

• Many surveys are long (> 30 min)
• Long surveys have higher nonresponse rates
• Length affects quality

Adams & Darwin, 1982; Dillman et al., 1993; Heberlein & Baumgartner, 1978

10

Why is Design Important?

• Respondents are more tech savvy today and use multiple technologies
• It is not just about reducing respondent burden and nonresponse
• We must increase engagement
• High-quality design = trust in the designer

Adams & Darwin, 1982; Dillman et al., 1993; Heberlein & Baumgartner, 1978

11
http://www.pewinternet.org/Static-Pages/Trend-Data-(Adults)/Device-Ownership.aspx

12
http://www.pewinternet.org/Static-Pages/Trend-Data-(Adults)/Device-Ownership.aspx

13

http://www.nielsen.com/content/dam/corporate/us/en/reports-downloads/2012-Reports/Nielsen-Multi-Screen-Media-Report-May-2012.pdf

14

http://www.nielsen.com/content/dam/corporate/us/en/reports-downloads/2012-Reports/Nielsen-Multi-Screen-Media-Report-May-2012.pdf

15
Nielsen: The Cross-Platform Report, Quarter 2, 2012 – US

UX Design Failure

• Poor planning• “It’s all about me.” (Redish: filing cabinets)• Human cognitive limitations

– Memory & Perception– (fun activity time)

UX Design Failure

• Poor planning• “It’s all about me.” (Redish: filing cabinets)• Human cognitive limitations

– Memory & Perception– (fun activity time)

- Primacy- Recency

- Chunking- Patterns

23

Web Survey Design
• Paging vs. Scrolling
• Navigation
• Scrolling vs. Double-Banked
• Edits and Input Fields
• Checkboxes and Radio Buttons
• Instructions and Help
• Graphics
• Emphasizing Text
• White Space
• Authentication
• Progress Indicators
• Consistency

24

Paging vs. Scrolling

Paging
• Multiple questions per page
• Complex skip patterns
• Not restricted to one item per screen
• Data from each page saved
  – Can be suspended/resumed
• Order of responding can be controlled
• Requires more mouse clicks

Scrolling
• All on one static page
• No data is saved until submitted at the end
  – Can lose all data
• Respondent can review/change responses
• Questions can be answered out of order
• Similar look-and-feel as paper

25

Paging vs. Scrolling

• Little advantage (breakoffs, nonresponse, time, straightlining) of one over the other
• Mixed approach may be best
• Choice should be driven by content and target audience
  – Scrolling for short surveys with few skip patterns, or when the respondent needs to see previous responses
  – Paging for long surveys with intricate skip patterns, or when questions should be answered in order

Couper, 2001; Gonyea, 2007; Peytchev, 2006; Vehovar, 2000

26

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

27

Navigation

• In a paging survey, after entering a response– Proceed to next page– Return to previous page (sometimes)– Quit or stop– Launch separate page with Help, definitions, etc.

28

Navigation: NP

• Next should be on the left
  – Reduces the amount of time to move the cursor to the primary navigation button
  – Frequency of use

Couper, 2008; Dillman et al., 2009; Faulkner, 1998; Koyani et al., 2004; Wroblewski, 2008

29

Navigation NP Example

Peytchev & Peytcheva, 2011

30

Navigation: PN

• Previous should be on the left– Web application order– Everyday devices– Logical reading order

31

Navigation PN Example

32

Navigation PN Example

33

Navigation PN Example

34

Navigation PN Example

35

Navigation Usability Study/Experiment

Romano & Chen, 2011

36

Method

• Lab-based usability study• TA read introduction and left letter on desk• Separate rooms• R read letter and logged in to survey• Think Aloud • Eye Tracking• Satisfaction Questionnaire• Debriefing

Romano & Chen, 2011

37

Results: Satisfaction I

* p < 0.0001

Romano & Chen, 2011

38

Results: Satisfaction II

Overall reaction to the survey: terrible – wonderful. p < 0.05.

Information displayed on the screens: inadequate – adequate. p = 0.07.

Arrangement of information on the screens: illogical – logical. p = 0.19.

Forward navigation: impossible – easy. p = 0.13.

[Bar charts: mean satisfaction ratings, roughly 6 to 8.5 on the scale, for the N_P vs. PN navigation conditions]

Romano & Chen, 2011

Eye Tracking

39

• Participants looked at Previous and Next in PN conditions• Many participants looked at Previous in the N_P conditions

– Couper et al. (2011): Previous gets used more when it is on the right.

40

N_P vs. PN: Respondent Debriefing

• N_P version
  – Counterintuitive
  – Don't like the "buttons being flipped."
  – Next on the left is "really irritating."
  – Order is "opposite of what most people would design."
• PN version
  – "Pretty standard, like what you typically see."
  – The location is "logical."

Romano & Chen, 2011

41

Navigation Alternative• Previous below Next

– Buttons can be closer– But what about older adults?– What about on mobile?

Couper et al., 2011; Wroblewski, 2008

42

Navigation Alternative• Previous below Next

– Buttons can be closer– But what about older adults?– What about on mobile?

Couper et al., 2011; Wroblewski, 2008

43

Navigation Alternative: Large primary navigation button; secondary smaller

44

Navigation Alternative: No back/previous option

45

Confusing Navigation

46

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

47

Long List of Response Options

• One column: Scrolling
  – Options visually appear to belong to one group
  – When there are two columns, the 2nd one may not be seen (Smyth et al., 1997)
• Two columns: Double banked
  – No scrolling
  – See all options at once
  – Appears shorter

1 Column vs. 2 Column Study

Romano & Chen, 2011

49

Seconds to First Fixation

* p < 0.01

[Bar chart: seconds to first fixation (0–25 s) for the first and second half of the list, 2-column vs. 1-column conditions]

Romano & Chen, 2011

50

Total Number of Fixations

[Bar chart: total number of fixations (0–40) for the first and second half of the list, 2-column vs. 1-column conditions]

Romano & Chen, 2011

51

Time to Complete Item

[Bar chart: mean, minimum, and maximum seconds to complete the item (0–120 s), 1-column vs. 2-column conditions]

Romano & Chen, 2011

52

1 Col. vs. 2 Col.: Debriefing

• 25 had a preference
  – 6 preferred one column
    • They had received the one-column version
  – 19 preferred two columns
    • 7 had received the one-column version
    • Prefer not to scroll
    • Want to see and compare everything at once
    • It is easier to "look through," to scan, to read
    • Re one column: "How long is this list going to be?"

Romano & Chen, 2011

53

Long Lists

• Consider breaking list into smaller questions• Consider series of yes/no questions• Use logical order or randomize• If using double-banked, do not separate

columns widely

54

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

55

Input Fields Activity

56

Input Fields

• Smaller text boxes = more restricted
• Larger text boxes = less restricted
  – Encourage longer responses
• Visual/Verbal Miscommunication
  – Visual may indicate "Write a story"
  – Verbal may indicate "Write a number"
• What do you want to allow?

57

Types of Open-Ended Responses

• Narrative– E.g., Describe…

• Short verbal responses– E.g., What was your occupation?

• Single word/phrase responses– E.g., Country of residence

• Frequency/Numeric response– E.g., How many times…

• Formatted number/verbal– E.g., Telephone number

58

Open-Ended Responses: Narrative

• Avoid vertical scrolling when possible• Always avoid horizontal scrolling

59

Open-Ended Responses: Narrative

• Avoid vertical scrolling when possible• Always avoid horizontal scrolling

Wells et al., 2012

32.8 characters 38.4 characters

~700 Rs

60

Open-Ended Responses: Numeric

• Is there a better way?

61

Open-Ended Responses: Numeric

• Is there a better way?

62

Open-Ended Responses: Numeric

• Use of templates reduces ill-formed responses– E.g., $_________.00

Couper et al., 2009; Fuchs, 2007
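To make the template idea concrete, here is a minimal TypeScript sketch (not from the course materials) of a templated currency field in the "$ ____ .00" style; the labels and field behavior are illustrative assumptions.

```typescript
// Hypothetical sketch: a templated currency field ("$ [____] .00") so respondents
// only type whole dollars. Labels and sizes are illustrative, not from the course.
function buildCurrencyField(container: HTMLElement, label: string): HTMLInputElement {
  const wrapper = document.createElement("label");
  wrapper.textContent = label + " ";

  const dollarSign = document.createElement("span");
  dollarSign.textContent = "$ ";

  const input = document.createElement("input");
  input.type = "text";
  input.inputMode = "numeric"; // numeric keypad on most mobile devices
  input.maxLength = 7;         // keeps the box visually "restricted"
  input.size = 7;

  const cents = document.createElement("span");
  cents.textContent = " .00";  // the template communicates "whole dollars only"

  // Strip anything that is not a digit as the respondent types, so ill-formed
  // answers ("about 50k", "$1,200") never reach the server.
  input.addEventListener("input", () => {
    input.value = input.value.replace(/\D/g, "");
  });

  wrapper.append(dollarSign, input, cents);
  container.appendChild(wrapper);
  return input;
}

// Usage: buildCurrencyField(document.body, "Total amount paid last month:");
```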

63

Open-Ended Responses: Date

• Not a good use: intended response will always be the same format

• Same for state, zip code, etc. • Note

– “Month” = text– “mm/yyyy” = #s

64

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

65

Check Boxes and Radio Buttons

• Perceived Affordances• Design according to existing conventions and

expectations• What are the conventions?

66

Check Boxes: Select all that apply

67

Check Boxes in drop-down menus

68

Radio Buttons: Select only one

69

Radio Buttons: Select only one

70

Radio Buttons: In grids

71

Radio Buttons on mobile

• Would something else be better?

72

Reducing Options

• What is necessary?

73

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

74

Placement of Instructions

• Place them near the item

• “Don’t make me think”

• Are they necessary?

75

Placement of Instructions

• Place them near the item

• “Don’t make me think”

• Are they necessary?

76

Placement of Instructions

• Place them near the item

• “Don’t make me think”

• Are they necessary?

77

Instructions

• Key info in first 2 sentences• People skim

– Rule of 2s: Key info in first two paragraphs, sentences, words

78

Instructions

79

Instructions

80

Placement of Clarifying Instructions

• Help respondents have the same interpretation

• Definitions, instructions, examples

Conrad & Schober, 2000; Conrad et al., 2006; Conrad et al., 2007; Martin, 2002; Schober & Conrad, 1997; Tourangeau et al., 2010

81

Placement of Clarifying Instructions

Redline, 2013

82

Placement of Clarifying Instructions

• Percentage of valid responses was higher with clarification
• Longer response time when clarification appeared before the item
• No effects of changing the font style
• Before the item is better than after
• Asking a series of questions is best

Redline, 2013

83

Placement of Help

• People are less likely to use help when they have to click than when it is near item

• “Don’t make me think”

84

Placement of Error Message

• Should be near the item
• Should be positive and helpful, suggesting HOW to help
• Bad error message:

85

Placement of Error Message

• Should be near the item
• Should be positive and helpful, suggesting HOW to help (see the sketch below)
• Bad error message:
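As an illustration of placing a helpful message next to the item, here is a hedged TypeScript sketch; the element ids, wording, and validation rule are illustrative assumptions, not the examples shown in the course.

```typescript
// Hypothetical sketch of an inline, item-level error message: the message appears
// directly under the offending field and says HOW to fix the answer.
function showInlineError(field: HTMLInputElement, message: string): void {
  clearInlineError(field);
  const note = document.createElement("div");
  note.className = "field-error";
  note.id = field.id + "-error";
  note.textContent = message;                       // e.g., "Please enter your age as a number, such as 42."
  field.setAttribute("aria-describedby", note.id);  // announced by screen readers
  field.insertAdjacentElement("afterend", note);    // rendered next to the item, not at the top of the page
}

function clearInlineError(field: HTMLInputElement): void {
  document.getElementById(field.id + "-error")?.remove();
  field.removeAttribute("aria-describedby");
}

// Usage on submit (illustrative field and rule):
// const age = document.getElementById("age") as HTMLInputElement;
// if (!/^\d{1,3}$/.test(age.value)) {
//   showInlineError(age, "Please enter your age as a number between 0 and 120.");
// }
```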

86

Error Message Across Devices

87

Error Message Across Devices

88

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

89

Graphics

• Improve motivation, engagement, satisfaction with “fun”

• Decrease nonresponse & measurement error• Improve data quality• Gamification

Henning, 2012; Manfreda et al., 2002

90

Graphics

• Use when they supply meaning– Survey about advertisements

• Use when user experience is improved– For children or video-game players– For low literacy

Libman, 2012

91

Graphics

92

Graphics

http://glittle.org/smiley-slider/

http://www.ollie.net.nz/casestudies/smiley_slider/

93

Graphics Experiment 1.1

• Appearance
  – Decreasing boldness (bold → faded)
  – Increasing boldness (faded → bold)
  – Adding face symbols to response options
• ~2,400 respondents
• Rated satisfaction with health-related items
• 5-pt scale: very satisfied to very dissatisfied

Medway & Tourangeau, 2011

94

Graphics Experiment 1.2
• Bold side selected more
• Less satisfaction when face symbols present

Medway & Tourangeau, 2011

[Example response scales for "Your physician": a fully labeled 5-point scale (Very satisfied / Somewhat satisfied / Neutral / Somewhat dissatisfied / Very dissatisfied) and an endpoint-only version (Very satisfied ... Very dissatisfied)]

95

Graphics Experiment 2.1

• Appearance
  – Radio buttons
  – Face symbols
• ~1,000 respondents
• Rated satisfaction with a journal
• 6-pt scale: very dissatisfied to very satisfied

Emde & Fuchs, 2011

96

Graphics Experiment 2.2
• Faces were equivalent to radio buttons
• Respondents were more attentive when faces were present
  – Time to respond

Emde & Fuchs, 2011

97

Slider Usability Study

• Participants thought 1 was selected and did not move the slider. 0 was actually selected if they did not respond.

Strohl, Romano Bergstrom & Krulikowski, 2012

98

Graphics Experiment 3.1

• Modified the visual design of survey items
  – Increased novelty and interest on select items
  – Other items were standard
• ~100 respondents in the experimental condition
• ~1,200 in control
• Questions about military perceptions and media usage
• Variety of question types

Gibson, Luchman & Romano Bergstrom, 2013

99

Graphics Experiment 3.2

• No differences

Gibson, Luchman & Romano Bergstrom, 2013

100

Graphics Experiment 3.3
• Slight differences:
  – Those with the enhanced version skipped more often
  – Those in the standard version responded more negatively

Gibson, Luchman & Romano Bergstrom, 2013

101

Graphics Experiment 3.4

Gibson, Luchman & Romano Bergstrom, 2013

• Slight differences: – Those with enhanced version skipped more often

102

Graphics Experiment 3.5

• No major differences

Gibson, Luchman & Romano Bergstrom, 2013

103

Graphics Considerations
• Mixed results
• "Ad blindness"
• Internet speed and download time
• Unintended meaning

104

Graphics Considerations

105

Graphics Considerations

106

Graphics Considerations

107

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

108

Emphasizing Text

• Font– Never underline plain text– Never use red for plain text– Use bold and italics sparingly

109

Emphasizing Text

110

Emphasizing Text

111

Emphasizing Text

• Hypertext– Use meaningful

words and phrases– Be specific– Avoid “more” and

“click here.”

112

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

113

White Space

• White space on a page• Differentiates sections• Don’t overdo it

114

White Space

115

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

116

Authentication

• Ensures the respondent is the selected person
• Prevents entry by those not selected
• Prevents multiple entries by the selected respondent

117

Authentication

• Passive
  – ID and password embedded in URL (see the sketch below)
• Active
  – E-mail entry
  – ID and password entry
• Avoid ambiguous passwords (Couper et al., 2001)
  – E.g., contains 1, l, 0, o
• Security concerns can be an issue
• Don't make it more difficult than it needs to be
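A minimal sketch of the passive approach and of generating access codes without ambiguous characters, assuming hypothetical URL parameter names (id, pw); this is illustrative, not the Census Bureau's implementation.

```typescript
// Hypothetical sketch of "passive" authentication: the invitation URL carries the
// respondent ID and password, so the login screen can be skipped.
function readEmbeddedCredentials(url: string): { id: string; password: string } | null {
  const params = new URL(url).searchParams;
  const id = params.get("id");
  const password = params.get("pw");
  return id && password ? { id, password } : null;
}

// Companion sketch: generate access codes that avoid ambiguous characters
// (1/l, 0/o) for respondents who must type the code themselves.
const UNAMBIGUOUS = "abcdefghjkmnpqrstuvwxyz23456789"; // no 0, o, 1, l

function generateAccessCode(length = 8): string {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += UNAMBIGUOUS[Math.floor(Math.random() * UNAMBIGUOUS.length)];
  }
  return code;
}

// Usage:
// readEmbeddedCredentials("https://survey.example.gov/start?id=12345&pw=ab3k7mpq")
//   -> { id: "12345", password: "ab3k7mpq" }
```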

118

Authentication

119

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

120

Progress Indicators

• Reduce breakoffs• Reduce burden by displaying length of survey• Enhance motivation and visual feedback• Not needed in scrolling design• Little evidence of benefit

Couper et al., 2001; Crawford et al., 2001; Conrad et al., 2003, 2005; Sakshaug & Crawford, 2009
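For a paging design, a progress indicator can be computed from the pages already completed and the pages remaining on this respondent's path (skip patterns change the denominator). A minimal TypeScript sketch with illustrative element ids:

```typescript
// Hypothetical sketch of a page-based progress indicator for a paging survey.
function progressPercent(pagesCompleted: number, pagesRemainingOnPath: number): number {
  const total = pagesCompleted + pagesRemainingOnPath;
  if (total === 0) return 0;
  return Math.round((pagesCompleted / total) * 100);
}

function renderProgress(container: HTMLElement, percent: number): void {
  container.textContent = `${percent}% complete`; // a text label gives the bar meaning
  container.setAttribute("role", "progressbar");
  container.setAttribute("aria-valuenow", String(percent));
  container.setAttribute("aria-valuemin", "0");
  container.setAttribute("aria-valuemax", "100");
}

// Usage: renderProgress(document.getElementById("progress")!, progressPercent(12, 8)); // "60% complete"
```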

121

Progress Indicators: At the bottom

122

Progress Indicators: At the top

123

Progress Indicators: Mobile

124

Progress Indicators

• They should provide meaning

Strohl, Romano Bergstrom & Krulikowski, 2012

125

Web Survey Design• Paging vs. Scrolling• Navigation• Scrolling vs. Double-

Banked• Edits and Input Fields• Checkboxes and

Radio Buttons• Instructions and Help

• Graphics• Emphasizing Text• White Space• Authentication• Progress Indicators• Consistency

126

Consistency
• Predictable
  – User can anticipate what the system will do
• Dependable
  – System fulfills the user's expectations
• Habit-forming
  – System encourages behavior
• Transferable
  – Habits in one context can transfer to another
• Natural
  – Consistent with the user's knowledge

127

Inconsistency

128

Inconsistency

129

Inconsistency

Strohl, Romano Bergstrom & Krulikowski, 2012

Questions and Discussion

Assessing Your Survey

The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.

Assessing Your Survey

Paradata
• Background
• Uses of Paradata by mode
• Paradata issues

Usability
• Usability vs. User Experience
• Why, When, What?
• Methods
  – Focus Groups, In-Depth Interviews
  – Ethnographic Observations, Diary Studies
  – Usability & Cognitive Testing
• Lab, Remote, In-the-Field
• Obstacles

Paradata

Types of Data

• Survey Data – information collected from R's
• Metadata – data that describe the survey
  – Codebook
  – Description of the project/survey
• Paradata – data about the process of answering the survey, at the R level
• Auxiliary/Administrative Data – not collected directly, but acquired from external sources

Paradata

• Term coined by Mick Couper
  – Originally described data that were by-products of computer-assisted interviewing
  – Expanded to include data from other self-administered modes
• Main uses:
  – Adaptive / Responsive design
  – Nonresponse adjustment
  – Measurement error identification

Total Survey Error Framework

Groves et al. 2004; Groves & Lyberg 2010

TSE Framework & Paradata

Kreuter, 2012

Adaptive / Responsive Design

• Create process indicators (see the sketch below)
• Real-time monitoring (charts & "dashboards")
• Adjust resources during data collection to achieve a higher response rate and/or cost savings
• Goals:
  – Achieve high response rates in a cost-effective way
  – Introduce methods to recruit uncooperative – and possibly different – sample members (reducing nonresponse bias)
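A minimal sketch of one such process indicator – response rates by subgroup, recomputed as the case file updates – assuming hypothetical case and field names:

```typescript
// Hypothetical sketch of a simple process indicator for a responsive design dashboard:
// response rates by subgroup, so resources can be redirected to underperforming groups
// during data collection. Field names are illustrative.
interface SampleCase {
  subgroup: string; // e.g., a geographic stratum or age group
  status: "complete" | "refusal" | "noncontact" | "pending";
}

function responseRatesBySubgroup(cases: SampleCase[]): Map<string, number> {
  const totals = new Map<string, { complete: number; all: number }>();
  for (const c of cases) {
    const t = totals.get(c.subgroup) ?? { complete: 0, all: 0 };
    t.all += 1;
    if (c.status === "complete") t.complete += 1;
    totals.set(c.subgroup, t);
  }
  const rates = new Map<string, number>();
  for (const [subgroup, t] of totals) {
    rates.set(subgroup, t.all === 0 ? 0 : t.complete / t.all);
  }
  return rates;
}

// Usage: feed in the current case file each night and plot the rates over time;
// subgroups whose rates lag the target become candidates for extra contact attempts.
```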

Nonresponse Adjustment

• Decreasing response rates have encouraged researchers to look at other sources of information to learn about nonrespondents– Doorstep interactions– Interviewer observations– Contact history data

Contact History Instrument (CHI)

• CHI developed by the U.S. Census Bureau (Bates, 2003)

• Interviewers take time after each attempt (refusal or non-contact) to answer questions in the CHI

• Use CHI information to create models (i.e., heat maps) to identify optimal contact time

• Typically a quick set of questions to answer

• European Social Survey uses a standard contact form (Stoop et al., 2003)
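A minimal sketch of the heat-map idea behind contact history data, assuming a simplified attempt record rather than the actual CHI layout:

```typescript
// Hypothetical sketch: tally contact attempts by day of week and hour, and compare
// how often attempts in each cell ended in contact. Field names are illustrative.
interface ContactAttempt {
  timestamp: Date;
  outcome: "contact" | "noncontact" | "refusal";
}

// Returns, for each (dayOfWeek, hour) cell, the share of attempts that reached someone.
function contactRateByDayHour(attempts: ContactAttempt[]): number[][] {
  const made = Array.from({ length: 7 }, () => new Array<number>(24).fill(0));
  const hit = Array.from({ length: 7 }, () => new Array<number>(24).fill(0));
  for (const a of attempts) {
    const day = a.timestamp.getDay();
    const hour = a.timestamp.getHours();
    made[day][hour] += 1;
    if (a.outcome === "contact") hit[day][hour] += 1;
  }
  return made.map((row, d) => row.map((n, h) => (n === 0 ? 0 : hit[d][h] / n)));
}

// Usage: cells with high contact rates suggest the best times to schedule callbacks.
```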

Contact History Instrument (CHI)

U.S. Census Bureau CHI

Paradata

• Background information about Paradata• Uses of Paradata by mode• Paradata issues

Uses of Paradata by Mode

• CAPI• CATI• Web• Mail• Post-hoc

Uses of Paradata - CAPI

• Information collected can include:
  – Interviewer time spent calling sampled households
  – Time driving to sample areas
  – Time conversing with household members
  – Interview time
  – GPS coordinates (tablets/mobile devices)
• Information can be used to:
  – Inform cost-quality decisions (Kreuter, 2009)
  – Develop cost per contact
  – Predict the likelihood of response by using interviewer observations of the response unit (Groves & Couper, 1998)
  – Monitor interviewers and identify any falsification

Uses of Paradata - CATI

• Information collected can include:
  – Call transaction history (record of each attempt)
  – Contact rates
  – Sequence of contact attempts & contact rates
• Information can be used to:
  – Optimize callback times
  – Interviewer monitoring
  – Inform a responsive design

Uses of Paradata - Web

• Server-side vs. client-side
• Information collected can include:
  – Device information (i.e., browser type, operating system, screen resolution, detection of JavaScript or Flash)
  – Questionnaire navigation information

Callegaro, 2012

Web Paradata - Server-side

• Page requests or “visits” to a web page from the web server

• Identify device information and monitor survey completion

Web Paradata - Server-side cont.

• Typology of response behaviors in web surveys1. Complete responders2. Unit non-responders3. Answering drop-outs4. Lurkers5. Lurking drop-outs6. Item non-responders7. Item non-responding drop-outs

Bosnjak, 2001

Web Paradata – Client-Side

• Collected on the R's computer
• Logs each "meaningful" action
• Heerwegh (2003) developed code / guidance for client-side paradata collected using JavaScript (a minimal sketch follows this list):
  – Clicking on a radio button
  – Clicking and selecting a response option in a drop-down box
  – Clicking a check box (checking / unchecking)
  – Writing text in an input field
  – Clicking a hyperlink
  – Submitting the page
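A minimal modern sketch of this kind of client-side logging (using addEventListener rather than Heerwegh's original snippets; element and field names are illustrative assumptions):

```typescript
// Timestamp each "meaningful" action and send the log back with the page.
interface ParadataEvent {
  elapsedMs: number;   // time since the page loaded
  action: string;      // e.g., "radio", "checkbox", "text", "dropdown", "link", "submit"
  target: string;      // name or id of the element acted on
  value?: string;
}

const pageLoadedAt = performance.now();
const paradataLog: ParadataEvent[] = [];

function logEvent(action: string, target: string, value?: string): void {
  paradataLog.push({ elapsedMs: Math.round(performance.now() - pageLoadedAt), action, target, value });
}

function instrumentPage(form: HTMLFormElement): void {
  form.addEventListener("change", (e) => {
    const el = e.target as HTMLElement;
    if (el instanceof HTMLInputElement && (el.type === "radio" || el.type === "checkbox")) {
      logEvent(el.type, el.name, el.type === "checkbox" ? String(el.checked) : el.value);
    } else if (el instanceof HTMLSelectElement) {
      logEvent("dropdown", el.name, el.value);
    } else if (el instanceof HTMLInputElement) {
      logEvent("text", el.name, el.value);
    }
  });
  form.addEventListener("click", (e) => {
    const link = (e.target as HTMLElement).closest("a");
    if (link) logEvent("link", link.href);
  });
  form.addEventListener("submit", () => {
    logEvent("submit", form.id);
    // Serialize the log into a hidden field (assumed to exist) so it travels with the answers.
    (form.elements.namedItem("paradata") as HTMLInputElement).value = JSON.stringify(paradataLog);
  });
}
```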

Web Paradata – Client-Side cont.

• Stern (2008) used Heerwegh's paradata techniques to identify:
  – Whether R's changed answers, and in what direction
  – The order in which questions are answered when more than one is displayed on the screen
  – Response latencies – the time that elapsed between when the screen loaded on the R's computer and when they submitted an answer (see the sketch below)
• Heerwegh (2003) found that the longer the response time, the greater the probability of changing answers and of an incorrect response
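A hedged sketch of deriving two such measures – per-item response latency and whether an answer was changed – from a client-side log like the one above; the record layout is an illustrative assumption:

```typescript
// Hypothetical derived measures from logged answers: latency to first answer and
// whether the answer was revised before submission.
interface LoggedAnswer {
  item: string;        // question name
  elapsedMs: number;   // time since the screen loaded
  value: string;
}

interface ItemSummary {
  item: string;
  latencyMs: number;   // elapsed time at the first answer
  changed: boolean;    // was the answer revised before submission?
  finalValue: string;
}

function summarizeItems(log: LoggedAnswer[]): ItemSummary[] {
  const byItem = new Map<string, LoggedAnswer[]>();
  for (const entry of log) {
    const list = byItem.get(entry.item) ?? [];
    list.push(entry);
    byItem.set(entry.item, list);
  }
  const summaries: ItemSummary[] = [];
  for (const [item, entries] of byItem) {
    entries.sort((a, b) => a.elapsedMs - b.elapsedMs);
    const last = entries[entries.length - 1];
    summaries.push({
      item,
      latencyMs: entries[0].elapsedMs,
      changed: entries.some((e) => e.value !== last.value),
      finalValue: last.value,
    });
  }
  return summaries;
}
```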

Browser Information / Operating System Information

• Programmers use this information to ensure they are developing the optimal design

• Desktop, laptop, smartphone, tablet, or other device
• Sood (2011) found a correlation between browser type and survey breakoff & the number of missing items
  – Completion rates for older browsers were lower
  – Browser type was used as a proxy for age of device and possible connection speed
  – Older browsers were more likely to display the survey incorrectly; a possible explanation for higher drop-out rates
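A minimal sketch of capturing device information on the client, using standard browser properties; the field names are illustrative:

```typescript
// Hypothetical device paradata captured on the client and sent with the first page.
interface DeviceParadata {
  userAgent: string;
  screenWidth: number;
  screenHeight: number;
  viewportWidth: number;
  viewportHeight: number;
  touchCapable: boolean;
  // JavaScript being able to run at all is itself paradata: respondents with
  // JavaScript disabled never execute this code, so the field stays empty server-side.
}

function collectDeviceParadata(): DeviceParadata {
  return {
    userAgent: navigator.userAgent,
    screenWidth: screen.width,
    screenHeight: screen.height,
    viewportWidth: window.innerWidth,
    viewportHeight: window.innerHeight,
    touchCapable: "ontouchstart" in window || navigator.maxTouchPoints > 0,
  };
}

// Usage: JSON.stringify(collectDeviceParadata()) into a hidden form field on page 1.
```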

JavaScript & Flash

• Helps to understand what the R can see and do in a survey
• JavaScript adds functionality such as question validations, auto-calculations, interactive help
  – 2% or less of computer users have JavaScript disabled (Zakas, 2010)
• Flash is used for question types such as drag & drop or slide-bar questions
  – Without Flash installed, R's may not see the question

Flash Question Example

Questionnaire Navigation Paradata

• Mouse clicks/coordinates
  – Captured with JavaScript
  – Excessive movements can indicate:
    • An issue with the question
    • Potential for lower quality
• Changing answers
  – Can indicate potential confusion with a question
  – Paradata can capture answers that were erased
  – Changes are more frequent for opinion questions than factual questions

Stieger & Reips, 2010

Questionnaire Navigation Paradata cont.

• Order of answering
  – When multiple questions are displayed on a screen
  – Can indicate how respondents read the questions
• Movement through the questionnaire (forward and back)
  – Unusual patterns can indicate confusion and a possible issue with the questionnaire (i.e., poor question order)

Questionnaire Navigation Paradata cont.

• Number of prompts/error messages/data validation messages

• Quality Index (Haraldsen, 2005)

• Goal is to decrease number of activated errors by improving the visual design and clarity of the questions

Questionnaire Navigation Paradata cont.

• Clicks on non-question links
  – Help, FAQs, etc.
  – Indication of when and where R's use help or other information built into the survey and displayed as a link
• Last question answered before dropping out
  – Helps to determine whether the data collected can be classified as complete, partial, or breakoff (see the sketch below)
  – Used for response rate computation
  – Peytchev (2009) analyzed breakoff by question type
    • Open-ended items increased breakoff chances by 2.5x; long questions by 3x; slider bars by 5x; introductory screens by 2.6x
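A hedged sketch of turning the last question answered into a case disposition for response rate computation; the 50%/80% thresholds are illustrative assumptions, not standard rules:

```typescript
// Hypothetical classification of a case from the last question answered:
// complete, partial (enough answered to be usable), or breakoff.
type CaseDisposition = "complete" | "partial" | "breakoff";

function classifyCase(lastAnsweredIndex: number, totalQuestionsOnPath: number): CaseDisposition {
  const proportion = totalQuestionsOnPath === 0 ? 0 : (lastAnsweredIndex + 1) / totalQuestionsOnPath;
  if (proportion >= 0.8) return "complete"; // illustrative threshold
  if (proportion >= 0.5) return "partial";  // illustrative threshold
  return "breakoff";
}

// Usage for a response-rate computation that counts completes and partials:
// const dispositions = cases.map((c) => classifyCase(c.lastAnswered, c.pathLength));
// const responded = dispositions.filter((d) => d !== "breakoff").length;
// const responseRate = responded / cases.length;
```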

Questionnaire Navigation Paradata cont.

• Time per screen / time latency
  – Attitude strength
  – Response uncertainty
  – Response error
• Examples
  – Heerwegh (2003)
    • R's with weaker attitudes take more time answering survey questions than R's with stronger attitudes
  – Yan and Tourangeau (2008)
    • Higher-educated R's respond faster than lower-educated R's
    • Younger R's respond faster than older R's

Uses of Paradata – Call Centers

• Self-administered (mail or electronic) surveys
• Call transaction history software
  – Incoming calls
    • Date and time: useful for hiring, staffing, and workflow decisions
    • Purpose of the call
      – Content issue: useful for identifying problematic questions or support information
      – Technical issue: useful for identifying usability issues or system problems
    • Call outcome: type of assistance provided

Paradata

• Background information about Paradata• Uses of Paradata by mode• Paradata issues

Paradata Issues

• Reliability of data collected• Costs• Privacy and Ethical Issues

Reliability of data collected

• Interviewers can erroneously record housing unit characteristics, misjudge features about respondents & fail to record a contact attempt

• Web surveys can fail to load properly, and client-side paradata fails to be captured

• Recordings of interviewers can be unusable (e.g., background noise, loose microphones)

Casas-Cordero, 2010; Sinibaldi, 2010; West, 2010

Paradata costs

• Data storage – very large files• Instrument performance• Development within systems• Analysis

Privacy and Ethical Issues

• IP addresses along with e-mail address or other information can be used to identify a respondent

• This information needs to be protected

Paradata Activity

• Should the respondent be informed that the organization is capturing paradata?

• If so, how should that be communicated?

Privacy and Ethical Issues cont.

• Singer & Couper asked members of the Dutch Longitudinal Internet Studies for the Social Sciences (LISS) panel at the end of the survey if they could collect paradata – 38.4% agreed

• Asked before the survey – 63.4% agreed• Evidence that asking permission to use

paradata might make R’s less willing to participate in a survey

Couper & Singer, 2011

Privacy and Ethical Issues cont.

• Reasons for failing to inform R's about paradata or get their consent:
  – The concept of paradata is unfamiliar and difficult for R's to grasp
  – R's associate it with the activities of advertisers, hackers, or phishers
  – Asking for consent gives it more salience
  – It is difficult to convey the benefits of paradata to the R

Questions and Discussion

Usability Assessment

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Background Knowledge

• What does usability mean to you?• Have you been involved in usability research?• How is “user experience” different from

“usability?”

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Usability vs. User Experience

• Usability: "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use." (ISO 9241-11)
• Usability.gov
• User experience includes emotions, needs, and perceptions.

Understanding Users

Whitney's 5 E's of Usability · Peter's User Experience Honeycomb

The 5 Es to Understanding Users (W. Quesenbery): http://www.wqusability.com/articles/getting-started.html
User Experience Design (P. Morville): http://semanticstudios.com/publications/semantics/000029.php

User Experience

Measuring the UX

• How does it work for the end user?
• What does the user expect?
• How does it make the user feel?
• What are the user's story and habits?
• What are the user's needs?

What people do on the Web

Krug, S. Don’t Make Me Think

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Why is Testing Important?

• Put it in the hands of the users.• Things may seem straightforward to you but

maybe not to your users.

Why is Testing Important?

• Put it in the hands of the users.• Things may seem straightforward to you but

maybe not to your users.

Why is Testing Important?

Why is Testing Important?

• Put it in the hands of the users.
• Things may seem straightforward to you but maybe not to your users.
• You might have overlooked something big!

When to test

[Diagram: Concept → Prototype → Final Product, with "Test with users" at every stage and repeated testing throughout]

What can be tested?

• Existing surveys
• Low-fidelity prototypes
  – Paper mockups or mockups on computer
  – Basic idea is there but not the functionality or graphical look
• High-fidelity prototypes
  – As close as possible to the final interface in look and feel

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Methods to Understand Users

Method – Assessment
• Linguistic Analysis – assess interactional motivations and goals
• Usability Testing – ensure users can use products efficiently & with satisfaction
• Cognitive Testing – ensure content is understood as intended
• User Experience Research – assess emotions, perceptions, and reactions
• Surveys – randomly sample the population of interest
• Ethnographic Observation – understand interactions in the natural environment
• Focus Groups and In-Depth Interviews – discuss users' perceptions and reactions

Focus Groups

• Structured script

• Moderator discusses the survey with actual or typical users– Actual usage of survey

– Workflow beyond survey

– Expectations and opinions

– Desire for new features and functionality

• Benefit of participants stimulating conversations, but risk of “group think”

In-Depth Interviews

• Structured or unstructured • Talk one-on-one with users,

in person or remotely– Actual usage of the survey– Workflow beyond survey– Expectations and opinions– Desire for new features and

functionality

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Ethnographic Observations• Observe users in home, office

or any place that is “real-world.”

• Observer is embedded in the user’s culture.

• Allows conversation & activity to evolve naturally, with minimum interference.

• Observe settings and artifacts (other real-world objects).

• Focused on context and meaning making.

Diaries/Journals

• Users are given a journal or a web site to complete on a regular basis (often daily).

• They record how/when they used the survey, what they did, and what their perceptions were.• User-defined data• Feedback/responses develop and change over time• Insight into how technology is used “on-the-go.”

• There is often a daily set of structured questions and/or free-form comments.

Diaries/Journals

• Users are given a journal or a web site to complete on a regular basis (often daily).

• They record how/when they used the survey, what they did, and what their perceptions were.• User-defined data• Feedback/responses develop and change over time• Insight into how technology is used “on-the-go.”

• There is often a daily set of structured questions and/or free-form comments.

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Usability Testing

• Participants respond to survey items
• Assess interface flow and design
  – Understanding
  – Confusion
  – Expectations
• Ensure intricate skip patterns and response paths work as intended
• Can test the final product or early prototypes

Cognitive Testing

• Participants respond to survey items
• Assess text
  – Confusion
  – Understanding
  – Thought process
• Ensure questions are understood as intended and the resulting data are valid
• Proper formatting is not necessary.

Usability vs. Cognitive Testing

Usability Testing Metrics
• Accuracy
  – In completing item/survey
  – Number/severity of errors
• Efficiency
  – Time to complete item/survey
  – Path to complete item/survey
• Satisfaction
  – Item-based
  – Survey-based
  – Verbalizations

Cognitive Testing Metrics
• Accuracy
  – Of interpretations
• Verbalizations

Moderating Techniques

Concurrent Think Aloud (CTA)
  Pros: Understand participants' thoughts as they occur and as they attempt to work through issues they encounter; elicits real-time feedback and emotional responses
  Cons: Can interfere with usability metrics, such as accuracy and time on task

Retrospective Think Aloud (RTA)
  Pros: Does not interfere with usability metrics
  Cons: Overall session length increases; difficulty remembering thoughts from up to an hour before = poor data

Concurrent Probing (CP)
  Pros: Understand participants' thoughts as they attempt to work through a task
  Cons: Interferes with the natural thought process and progression that participants would make on their own, if uninterrupted

Retrospective Probing (RP)
  Pros: Does not interfere with usability metrics
  Cons: Difficulty remembering = poor data

Romano Bergstrom, Moderating Usability Tests: http://www.usability.gov/articles/2013/04/moderating-usability-tests.html

Choosing a Moderating Technique

• Can the participant work completely alone?
• Will you need time-on-task and accuracy data?
• Are the tasks multi-layered and/or do they require concentration?
• Will you be conducting eye tracking?

Tweaking vs. Redesign

Tweaking
• Less work
• Small changes occur quickly.
• Small changes are likely to happen.

Redesign
• Lots of work after much has already been invested
• May break something else
• A lot of people
• A lot of meetings

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Lab vs. Remote vs. In the Field

Laboratory
• Controlled environment
• All participants have the same experience
• Record and communicate from the control room
• Observers watch from the control room and provide additional probes (via moderator) in real time
• Incorporate physiological measures (e.g., eye tracking, EDA)
• No travel costs

Remote
• Participants in their natural environments (e.g., home, work)
• Use video chat (moderated sessions) or online programs (unmoderated)
• Conduct many sessions quickly
• Recruit participants in many locations (e.g., states, countries)

In the Field
• Participants tend to be more comfortable in their natural environments
• Recruit hard-to-reach populations (e.g., children, doctors)
• Moderator travels to various locations
• Bring equipment (e.g., eye tracker)
• Natural observations

Lab-Based Usability Testing

Observation area for clients
We maneuver the cameras, record, and communicate through microphones and speakers from the control room so we do not interfere
Live streaming close-up of the participant's screen
Participant in the testing room

Participant in the testing room

Large screens to display material during focus groups

Fors Marsh Group UX Lab

Eye Tracking

• Desktop• Mobile• Paper

Fors Marsh Group UX Lab

Remote Moderated Testing

Participant working on the survey from her home in another state

Moderator working from the office

Observer taking notes, remains unseen from participant

Fors Marsh Group UX Lab

Field Studies

Participant is in her natural environment, completing tasks on a site she normally uses for work

Researcher goes to participant’s workplace to conduct session. She observes and takes notes

Participant uses books from her natural environment to complete tasks on the website

Usability Assessment• Usability vs. User Experience• Why, When, What?• Methods

• Focus Groups, In-Depth Interviews• Ethnographic Observations, Diary Studies• Usability and Cognitive Testing

• Lab, Remote, In-the-Field• Obstacles

Obstacles to Testing

• "There is no time."
  – Start early in the development process.
  – One morning a month with 3 users (Krug)
  – 12 people in 3 days (Anderson Riemer)
  – 12 people in 2 days (Lebson & Romano Bergstrom)
• "I can't find representative users."
  – Everyone is important.
  – Travel
  – Remote testing
• "We don't have a lab."
  – You can test anywhere.

Final Thoughts

• Test across devices.
  – "User experience is an ecosystem."
• Test across demographics.
  – Older adults perform differently than younger adults.
• Start early.

Kornacki, 2013, The Long Tail of UX

Questions & Discussion

Quality of Mixed Modes

The views expressed on statistical or methodological issues are those of the presenters and not necessarily those of the U.S. Census Bureau.

213

Quality of Mixed Modes

• Mixed Mode Surveys• Response Rates• Mode Choice

214

Mixed Mode Surveys

• Definition: any combination of survey data collection methods/modes
• Mixed vs. Multi vs. Multiple Modes
• Survey organization goal:
  – Identify the optimal data collection procedure (for the research question)
  – Reduce Total Survey Error
  – Stay within time/budget constraints

215

Mixed Mode Designs

• Sequential
  – Different modes for different phases of interaction (initial contact, data collection, follow-up)
  – Different modes used in sequence during data collection (i.e., a panel survey that begins in one mode and moves to another)
• Concurrent – different modes implemented at the same time

de Leeuw & Hox, 2008

216

Why Do Mixed Mode?

• Cost savings• Improve Timeliness• Reduces Total Survey Error

– Coverage error– Nonresponse error– Measurement error

217

Mixed Modes – Cost Savings

• Mixed mode designs give an opportunity to compensate for the weaknesses of each individual mode in a cost-effective way (de Leeuw, 2005)
• Dillman's 2009 Internet, Mail, and Mixed-Mode Surveys book:
  – Organizations often start with a lower-cost mode and move to a more expensive one
    • In the past: start with paper, then do CATI or in-person nonresponse follow-up (NRFU)
    • Currently: start with Internet, then paper NRFU

218

Mixed Modes – Cost Savings cont.

• Examples:
• U.S. Current Population Survey (CPS) – panel survey
  – Initially does an in-person interview and collects a telephone number
  – Subsequent calls made via CATI to reduce cost
• U.S. American Community Survey
  – Phase 1: mail
  – Phase 2: CATI NRFU
  – Phase 3: face-to-face with a subsample of remaining nonrespondents

219

Mixed Mode - Timeliness

• Collect responses more quickly
• Example:
  – Current Employment Statistics (CES) offers 5 modes (fax, Web, touch-tone data entry, electronic data interchange, & CATI) to facilitate timely monthly reporting

220

Why Do Mixed Mode?

• Cost savings• Improve Timeliness• Reduces Total Survey Error

– Coverage error– Nonresponse error– Measurement error

Total Survey Error Framework

Groves et al. 2004; Groves & Lyberg 2010

222

Mixed Mode - Coverage Error

• Definition: proportion of the target population that is not covered by the survey frame and the difference in the survey statistic between those covered and not covered

• Telephone penetration• Landlines vs mobile phones

– Web penetration

Groves, 1989

223

Coverage – Telephone

• 88% of U.S. adults have a cell phone
• Young adults and those with lower education and lower household income are more likely to use mobile devices as their main source of Internet access

Smith, 2011; Zickuhr & Smith, 2012

224

Coverage - Internet

• Coverage is limited– No systematic directory of addresses

• 1 in 5 in U.S. do not use the Internet

Zickuhr & Smith, 2012

225

226

World Internet Statistics

227

Coverage –Web cont.

• Indications that Internet adoption rates have leveled off

• Demographics least likely to have Internet– Older – Less education– Lower household income

• Main reason for not going online: not relevant

Pew, 2012

228

European Union – Characteristics of Internet Users

229

Coverage - Web cont.

• R's reporting via Internet can be different from those reporting via other modes
  – Internet vs. mail (Diment & Garrett-Jones, 2007; Zhang, 2000)
• R's cannot be contacted through the Internet because e-mail addresses lack the structure needed for generating random samples (Dillman, 2009)

230

Mixed Mode – Nonresponse Error

• Definition: inability to obtain complete measurements on the survey sample (Groves, 1998)
  – Unit nonresponse – the entire sampling unit fails to respond
  – Item nonresponse – R's fail to respond to all questions
• Concern is that respondents and nonrespondents may differ on the variable of interest

231

Mixed Mode – Nonresponse cont.

• Overall response rates have been declining• Mixed mode is a strategy used to increase

overall response rates while keeping costs low • Some R’s have a mode preference (Miller,

2009)

232

Mixed Mode – Nonresponse cont.

• Some evidence of a reduction in overall response rates when multiple modes are offered concurrently in population/household surveys
  – Examples: Delivery Sequence File Study (Dillman, 2009); Arbitron Radio Diaries (Gentry, 2008); American Community Survey (Griffin et al., 2001); Survey of Doctorate Recipients (Grigorian & Hoffer, 2008)
• Could assign R's to modes based on known preferences

233

Mixed Mode – Measurement Error

• Definition: “observational errors” arising from the interviewer, instrument, mode of communication, or respondent (Groves, 1998)

• Providing mixed modes can help reduce the measurement error associated with collecting sensitive information– Example: Interviewer begins face-to-face interview

(CAPI) then lets R continue on the computer with headphones (ACASI) to answer sensitive questions

234

Mode Comparison Research

• Meta-analysis of mode comparison articles:
  – Harder to get mail responses
  – Overall nonresponse rates & item nonresponse rates are higher in self-administered questionnaires, BUT answered items are of high quality
  – Small difference in quality between face-to-face and telephone (CATI) surveys
  – Face-to-face surveys had slightly lower item nonresponse rates

de Leeuw, 1992

235

Mode Comparison Research cont.

• Question order and response order effects less likely in self-administered than telephone – R’s more likely to choose last option heard in CATI

(recency effect)– R’s more likely to choose the first option seen in

self-administered (primacy effect)– Mixed results on item-nonresponse rates in Web

de Leeuw, 1992; 2008

236

Mode Comparison Research cont.

• Some indication that Internet surveys are more like mail than telephone surveys
  – Visual vs. auditory presentation
• Conflicting evidence on item nonresponse (some studies show higher item nonresponse on Internet vs. mail while others show no difference)
• Some evidence of better quality data
  – Fewer post-data-collection edits needed for electronic vs. mail responses

Sweet & Ramos, 1995; Griffin et al., 2001

237

Disadvantages of Mixed Mode

• Mode Effects– Concerns for measurement error due to the mode

• R’s providing different answers to the same questions displayed in different modes

– Different contact/cooperation rates because of different strategies used to contact R’s

238

Disadvantages of Mixed Mode

• Decrease in overall response rates
  – Why: effects of offering a mix of mail and web
  – What: meta-analysis of 16 studies that compared mixed mode surveys with mail and web options
  – Results: empirical evidence that offering mail and Web concurrently resulted in a significant reduction in response rates

Medway & Fulton, 2012

239

Response Rates in Mixed Mode Surveys

• Why is this happening?
  – Potential Hypothesis #1: R's are dissuaded from responding because they have to make a choice
    • Offering multiple modes increases burden (Dhar, 1997)
    • While judging the pros/cons of each mode, neither appears attractive (Schwartz, 2004)
  – Potential Hypothesis #2: R's choose Web, but never actually do it
    • If R's receive the invitation in the mail, there is a break in their response process (Griffin et al., 2001)
  – Potential Hypothesis #3: R's who choose Web may get frustrated with the instrument and abandon the whole process (Couper, 2000)

240

Overall Goals

• Find the optimal mix given the research questions and population of interest

• Other factors to consider: – Reducing Total Survey Error (TSE)– Budget– Time– Ethics and/or privacy issues

Biemer & Lyberg, 2003

241

Quality of Mixed Modes

• Mixed Mode Surveys• Response Rates• Mode Choice

242

Technique for Increasing Response Rates to Web in Multi-Mode Surveys

• "Pushing" R's to the web
  – Sending R's an invitation to report via Web
  – No paper questionnaire in the initial mailing
  – Invitation contains information for obtaining the alternative version (typically paper)
  – Paper versions are mailed out during follow-up to capture responses from those who do not have web access or do not want to respond via web
  – "Responding to Mode in Hand" principle

243

“Pushing” Examples

• Example 1: Lewiston-Clarkston Quality of Life Survey
• Example 2: 2007 Stockholm County Council Public Health Survey
• Example 3: American Community Survey
• Example 4: 2011 Economic Census Refile Survey

244

Pushing Example 1 – Lewiston-Clarkston Quality of Life Survey

• Goals: increase web response rates in a paper/web mixed-mode survey and identify mode preferences
• Method:
  – November 2007 – January 2008
  – Random sample of 1,800 residential addresses
  – Four treatment groups
  – To assess mode preference, this question was at the end of the survey:
    • "If you could choose how to answer surveys like this, which one of the following ways of answering would you prefer?"
    • Answer options: web, mail, or telephone

Miller, O'Neill, Dillman, 2009

245

Pushing Example 1 – cont.

• Group A: Mail preference with web option
  – Materials suggested mail was preferred but web was acceptable
• Group B: Mail preference
  – Web option not mentioned until first follow-up
• Group C: Web preference
  – Mail option not mentioned until first follow-up
• Group D: Equal preference

246

Pushing Example 1 – cont.

• Results

247

Pushing Example 1 – cont.

“If you could choose how to answer surveys like this, which one of the following ways of answering would you prefer?”

248

Pushing Example 1 – cont.

249

Pushing Example 1 – cont.

Group C = Web Preference Group

250

Pushing Example 1 – cont.

• Who can be pushed to the Web?

251

Pushing Example 2 – 2007 Stockholm County Council Public Health Survey

• Goal: increase web response rates in a paper/web mixed-mode survey

• Method:– 50,000 (62% response rate)– 4 treatments that varied in “web intensity”– Plus a “standard” option – paper and web login

data

Holmberg, Lorenc, Werner, 2008

252

Pushing Example 2 – Cont.

• Overall response rates

S = Standard; A1 = very paper "intense"; A2 = paper "intense"; A3 = web "intense"; A4 = very web "intense"

253

Pushing Example 2 – Cont.

• Web responses

S = Standard; A1 = very paper "intense"; A2 = paper "intense"; A3 = web "intense"; A4 = very web "intense"

254

Pushing Example 3 – American Community Survey

• Goals: – Increase web response rates in a paper/web

mixed-mode survey– Identify ideal timing for non-response follow-up– Evaluate advertisement of web choice

Tancreto et. al., 2012

255

Pushing Example 3 – Cont.

• Method– Push: 3 versus 2 weeks until paper questionnaire– Choice: Prominent and Subtle– Mail only (control)

– Tested among segments of US population

• Targeted• Not Targeted

256

Response Rates by Mode in Targeted Areas

[Bar chart: Internet and mail response rates by treatment – Control (mail only), Prominent Choice, Subtle Choice, Push (3 weeks), Push (2 weeks) – in targeted areas]

257

Response Rates by Mode in Not Targeted Areas

[Bar chart: Internet and mail response rates by treatment – Control (mail only), Prominent Choice, Subtle Choice, Push (3 weeks), Push (2 weeks) – in not targeted areas]

258

Example 4: Economic Census Refile

• Goal: to increase Internet response rates in a paper/Internet establishment survey during non-response follow-up

• Method: 29,000 delinquent respondents were split between two NRFU mailings– Letter-only mailing mentioning Internet option– Letter and paper form mailing

Marquette, 2012

259

Example 4: Cont.

260

Quality of Mixed Modes

• Mixed Mode Surveys• Response Rates• Mode Choice

261

Why Do Respondents Choose Their Mode?

• Concern about "mode paralysis"
  – When two options are offered, R's must choose between tradeoffs
  – This choice makes each option less appealing
  – Offering a choice between Web and mail may therefore discourage response

Miller and Dillman, 2011

262

Mode Choice

• American Community Survey – Attitudes and Behavior Study

• Goals:
  – Measure why respondents chose the Internet or paper mode during the American Community Survey Internet Test
  – Determine why there was nonresponse and whether it was linked to the multi-mode offer

Nichols, 2012

263

Mode Choice – cont.

• CATI questionnaire was developed in consultation with survey methodologists

• Areas of interest included:– Salience of the mailing materials and messages– Knowledge of the mode choice– Consideration of reporting by Internet– Mode preference

264

Mode Choice – cont.

• 100 completed interviews per notification strategy (push

265

Mode Choice – cont.

• Results
  – Choice/Push Internet respondents opted for perceived benefits – easy, convenient, fast
  – Push R's noted that not having the paper form motivated them to use the Internet to report
  – Push R's who reported via mail did so because they did not have Internet access or had computer problems
  – The placement of the message about the Internet option was reasonable to R's
  – R's often recalled that the letter accompanying the mailing package mentioned the mode choice

266

Mode Choice – cont.

• Results cont.– Several nonrespondents cited not knowing that a

paper option was available as a reason for not reporting

– Very few nonrespondents attempted to access the online form

– Salience of the mailing package and being busy were main reasons for nonresponse

– ABS study did NOT find “mode paralysis”

Questions and Discussion

Amy Anderson Riemer, US Census Bureau

amy.e.anderson.riemer@census.gov

Jennifer Romano Bergstrom, Fors Marsh Group

jbergstrom@forsmarshgroup.com
