Chapter 14
Usability testing and field studies
Usability testing
Goal: to test whether the product being developed is usable by the intended user population to achieve the tasks for which it was designed
Key characteristics:
- Controlled environment
- Users’ performance measures on pre-planned tasks
Key data collection methods: user testing & user satisfaction questionnaire
Usability testing
User testing
- Measure human performance on specific tasks, e.g. reaction time such as pressing a key when a light first appears
- Examples of tasks:
  - Reading different typefaces (e.g. Helvetica and Times)
  - Navigating through different menu types (e.g. context vs. cascade)
  - Information searching
- Logging of keystrokes and mouse movements, and video recordings
Usability testing
Examples of performance measures:
- Time to complete a task
- Time to complete a task after a specified time away from the product
- Number and type of errors per task
- Number of navigations to online help or manuals
- Number of users making a particular error
- Number of users completing a task successfully
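As a rough illustration of how such measures might be computed from logged sessions, here is a minimal Python sketch. The record layout, field names, and all values are invented for the example; a real study would define its own logging format.

```python
from statistics import mean

# Hypothetical session records: one entry per participant for a single task.
# Field names and values are invented for illustration only.
sessions = [
    {"user": "P1", "seconds": 48.0, "errors": ["wrong menu"], "help_visits": 0, "completed": True},
    {"user": "P2", "seconds": 95.5, "errors": ["wrong menu", "typo"], "help_visits": 2, "completed": True},
    {"user": "P3", "seconds": 120.0, "errors": ["typo"], "help_visits": 1, "completed": False},
    {"user": "P4", "seconds": 61.5, "errors": [], "help_visits": 0, "completed": True},
]

def summarize(records):
    """Compute a few of the performance measures listed above."""
    completed = [r for r in records if r["completed"]]
    # Count how many distinct users made each type of error.
    users_per_error = {}
    for r in records:
        for kind in set(r["errors"]):
            users_per_error[kind] = users_per_error.get(kind, 0) + 1
    return {
        "mean_time_completed": mean(r["seconds"] for r in completed),
        "errors_per_user": mean(len(r["errors"]) for r in records),
        "help_visits_total": sum(r["help_visits"] for r in records),
        "users_per_error": users_per_error,
        "completion_rate": len(completed) / len(records),
    }

summary = summarize(sessions)
print(summary)
```

Task time is averaged only over participants who finished, so that abandoned attempts do not distort it; whether that is the right choice depends on the study.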
Usability testing
User satisfaction questionnaire
- To find out how users feel about using the product, by asking them to rate it along a number of scales
Structured or semi-structured interviews may also be conducted with users
5-12 users is an acceptable number; fewer is possible given time and budget constraints
Usability testing
Usability laboratory
- Testing laboratory
  - Recording equipment (hand movements, facial expression, general body language, utterances)
  - Product being tested
- Observation room
- May be arranged to mimic the real-world setting, e.g. office environment
- Keeps users away from normal sources of distraction
Usability testing
Usability lab can be expensive
Alternatives are:
- Mobile usability testing equipment
- Remote usability testing
Usability lab with observers watching a user & assistant
From: www.id-book.com
Portable equipment for use in the field
From: www.id-book.com
Conducting experiments in usability testing
Experiments – testing a specific hypothesis
- “Context menus are easier to select options from compared with cascading menus”
- “Reading text displayed in 12-point Helvetica font is faster than reading text displayed in 12-point Times New Roman”
Hypotheses are often based on a theory or previous research findings
Conducting experiments in usability testing
A hypothesis examines a relationship between two things, called variables
An independent variable is what the investigator ‘manipulates’ (i.e. selects)
A dependent variable is the outcome that is measured; it is expected to depend on the independent variable
Conducting experiments in usability testing
Null hypothesis
- Example: There is no difference between Helvetica and Times font on reading time
Alternative hypothesis
- Example: There is a difference between the two on reading time (two-tailed hypothesis)
- Example: Helvetica is easier to read than Times (one-tailed hypothesis)
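One way to test these hypotheses is sketched below with an exact permutation test, a technique chosen here for illustration because it needs only the Python standard library; the slides do not prescribe a particular test. The reading times are invented, and "easier to read" is interpreted as "shorter reading time", which is an assumption.

```python
from itertools import combinations
from statistics import mean

# Hypothetical reading times in seconds (invented data, five readers per font).
helvetica = [41.0, 38.5, 44.0, 40.0, 39.5]
times_font = [45.0, 47.5, 43.0, 49.0, 46.5]

def permutation_p_values(a, b):
    """Exact permutation test on the difference of means (a minus b).

    Returns (two_tailed_p, one_tailed_p), where the one-tailed test asks
    whether group a is faster (smaller mean) than group b.
    """
    pooled = a + b
    observed = mean(a) - mean(b)
    total = extreme_two = extreme_one = 0
    # Under the null hypothesis the group labels are exchangeable, so every
    # way of splitting the pooled data into two groups is equally likely.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        total += 1
        # Two-tailed: a difference at least as large in either direction.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme_two += 1
        # One-tailed: a difference at least as far in the predicted direction.
        if diff <= observed + 1e-12:
            extreme_one += 1
    return extreme_two / total, extreme_one / total

two_tailed, one_tailed = permutation_p_values(helvetica, times_font)
print(two_tailed, one_tailed)
```

A small p-value means the observed difference would be rare if the null hypothesis were true. The one-tailed value is smaller because it counts extreme outcomes in only the predicted direction.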
Conducting experiments in usability testing
Experimental design – keep other variables constant to prevent them from influencing the findings
- Example: color of text and screen resolution
Sometimes, an experimenter might want to investigate the relationship between two independent variables
- Example: age and educational background
Considerations in experimental design
- Number of independent variables
- Which condition each participant is assigned to
  - Different-participant design (between-subjects)
  - Same-participant design (within-subjects)
  - Matched-pairs design
Experimental designs
- Different participants - a single group of participants is allocated randomly to the experimental conditions.
- Same participants - all participants appear in all conditions.
- Matched participants - participants are matched in pairs, e.g., based on expertise, gender, etc.
Different, same, matched participant design
Design | Advantages | Disadvantages
Different | No order or training effects | Many subjects needed; individual differences a problem
Same | Few individuals needed; no individual differences | Counter-balancing needed because of ordering effects
Matched | Same as different participants, but individual differences reduced | Cannot be sure of perfect matching on all differences
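The allocation step for the first two designs can be sketched in Python as follows. Participant labels, the fixed seed, and the simple rotation scheme used for counterbalancing are all assumptions made for this example; other counterbalancing schemes exist.

```python
import random

def allocate_between(participants, conditions, seed=0):
    """Different-participant design: randomly split one group across conditions."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = participants[:]
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    # Deal shuffled participants round-robin so group sizes stay balanced.
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

def counterbalance_within(participants, conditions):
    """Same-participant design: everyone does every condition, with the
    starting condition rotated so ordering effects are spread evenly."""
    orders = {}
    n = len(conditions)
    for i, person in enumerate(participants):
        shift = i % n
        orders[person] = conditions[shift:] + conditions[:shift]
    return orders

# Hypothetical participants and conditions.
people = ["P1", "P2", "P3", "P4", "P5", "P6"]
fonts = ["Helvetica", "Times"]

between = allocate_between(people, fonts)
within = counterbalance_within(people, fonts)
print(between)
print(within)
```

In the between-subjects case each person sees one font only; in the within-subjects case half the participants start with each font, which is the counter-balancing the table refers to.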
Field studies
Field studies are done in natural settings.
The aim is to understand what users do naturally and how technology impacts them.
Field studies can be used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.
Data collection & analysis
Observation & interviews
- Notes, pictures, recordings
- Video
- Logging
Analysis
- Data is categorized
- Categories can be provided by theory
  - Grounded theory
  - Activity theory
Key points
- User testing is a central part of usability testing.
- Usability testing is done in controlled conditions.
- Usability testing is an adapted form of experimentation.
- Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
- The experimenter controls the independent variable(s) but not the dependent variable(s).
- There are three types of experimental design: different-participants, same-participants, & matched participants.
- Field studies are done in natural environments.
- Typically observation and interviews are used to collect field studies data.
- Categorization and theory-based techniques are used to analyze the data.
Chapter 15
Analytical Evaluation
Outline
- Inspections: heuristic evaluation
- Inspections: walkthroughs
- Predictive models
Inspections: heuristic evaluation
Experts
- Examine the interface of an interactive product
- Role-play typical users
- Suggest problems users would have when interacting with the product
Heuristic evaluation
- Usability inspection technique
- First developed by Jakob Nielsen and colleagues
- Experts are guided by a set of usability principles known as heuristics
- Experts evaluate whether user-interface elements (dialog boxes, menus, etc.) conform to the principles
Nielsen’s Heuristics
- Visibility of system status
  - Information about what is going on
- Match between system and the real world
  - Familiar terms, concepts, and conventions
- User control and freedom
  - Support undo and redo
- Consistency and standards
  - Words should have consistent meanings
Nielsen’s Heuristics
- Error prevention
  - Errors should be prevented from occurring in the first place
- Recognition rather than recall
  - Reduce users’ memory load
- Flexibility and efficiency of use
  - Allow users to tailor frequent actions
- Aesthetic and minimalist design
  - Present only relevant information
Nielsen’s Heuristics
- Help users recognize, diagnose, and recover from errors
  - Comprehensible error messages
- Help and documentation
  - Provide help information that is easily accessible, focuses on the user’s task, lists concrete steps, and is not too large
Heuristics
Evaluators and researchers have typically developed their own heuristics
Most sets of heuristics have between five and ten items
Between 3 and 5 evaluators are recommended
Turn design guidelines into heuristics - websites
Guideline (G): Avoid orphan pages
Heuristic (H): Are there any orphan pages? Where do they go to?
G: Avoid long pages with excessive white space
H: Are there any long pages? Do they have lots of white space?
Turn design guidelines into heuristics
G: Provide navigation support
H: Is there any guidance, e.g. maps, navigation bar, menus, to help users find their way around the site?
G: Avoid non-standard link colors
H: What color is used for links? Is it blue or another color? If it is another color, is it obvious to the user that it is a hyperlink?
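Some of these heuristic questions can be partly checked mechanically before the expert inspection. The sketch below looks for orphan pages, interpreted here as pages from which a visitor cannot navigate back to the home page; that interpretation, the site map, and all page names are assumptions for the example.

```python
from collections import deque

# Hypothetical site map: page -> pages it links to (all names invented).
site = {
    "home": ["products", "about"],
    "products": ["home", "widget"],
    "widget": ["products"],
    "about": ["home"],
    "press-2019": [],  # no outgoing links at all
}

def can_reach(site_map, start, target):
    """Breadth-first search over links from start; True if target is reachable."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            return True
        for nxt in site_map.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def orphan_pages(site_map, home="home"):
    """Pages from which a visitor cannot navigate back to the home page."""
    return sorted(p for p in site_map if not can_reach(site_map, p, home))

orphans = orphan_pages(site)
print(orphans)
```

A check like this only answers "Are there any orphan pages?"; judging whether the remaining navigation is understandable still needs an evaluator.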
Heuristics for web-based online communities
Sociability:
- Why should I join this community? What are the benefits for me?
- Does the description of the group, its name, etc. tell me about the purpose of the community and entice me to join it?
Usability:
- How do I join (or leave) the community? What do I do? Do I have to register?
Heuristics for web-based online communities
Sociability:
- Is the community safe? Are my comments treated with respect? Is my personal information secure?
Usability:
- How do I get, read, and send messages? Is there support for newcomers? Is it clear what I should do? Can I send private messages?
Two important aspects
1) Different types of applications need to be evaluated using different heuristics
2) The method by which they are derived needs to be reliable
Doing heuristic evaluation
1) briefing session
- The experts are told what to do
2) evaluation period
- Each expert spends 1-2 hours independently inspecting the product, using heuristics for guidance
Doing heuristic evaluation
2) evaluation period
- Take at least two passes through the interface
  - First pass gives a feel for the flow of the interaction and the product’s scope
  - Second pass allows the evaluator to focus on specific interface elements and to identify potential usability problems
- If evaluating a functioning product, specific user tasks should be used
- Findings can be recorded by self note-taking, thinking aloud, or a second person taking notes
Doing heuristic evaluation
3) debriefing session
- Discuss findings
- Prioritize problems
- Suggest solutions
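To prepare for the debriefing, the independent findings can be merged and ranked. The following Python sketch shows one possible way; the evaluator notes are invented, and the 1-3 severity scale and the ranking rule are assumptions for this example, not part of the method described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical notes from three evaluators: (heuristic, problem, severity 1-3).
# The severity scale is an assumption for this sketch.
findings = {
    "E1": [("Visibility of system status", "No progress indicator on upload", 3),
           ("Consistency and standards", "Two labels for the same action", 2)],
    "E2": [("Visibility of system status", "No progress indicator on upload", 2),
           ("Help and documentation", "Help page lists no concrete steps", 1)],
    "E3": [("Visibility of system status", "No progress indicator on upload", 3),
           ("Consistency and standards", "Two labels for the same action", 1)],
}

def prioritize(all_findings):
    """Merge independent findings and rank them for the debriefing session."""
    merged = defaultdict(list)
    for evaluator, items in all_findings.items():
        for heuristic, problem, severity in items:
            merged[(heuristic, problem)].append(severity)
    ranked = [
        {"heuristic": h, "problem": p, "found_by": len(s), "mean_severity": mean(s)}
        for (h, p), s in merged.items()
    ]
    # Problems reported by more evaluators, then rated more severe, come first.
    ranked.sort(key=lambda r: (-r["found_by"], -r["mean_severity"]))
    return ranked

ranked = prioritize(findings)
for row in ranked:
    print(row["found_by"], row["problem"])
```

The ranking is only a starting point for discussion; a problem found by a single expert can still be the most important one.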
Advantages and problems
- Few ethical & practical issues to consider because users are not involved.
- Can be difficult & expensive to find experts.
- Best experts have knowledge of application domain & users.
- Biggest problems:
  - Important problems may get missed;
  - Many trivial problems are often identified;
  - Experts have biases.
Inspection: walkthroughs
Walking through a task with the system and noting problematic usability features
Most walkthrough techniques do not involve users
Pluralistic walkthroughs involve a team (users, developers, and usability specialists)
Cognitive walkthroughs
- Focus on ease of learning.
- Designer presents an aspect of the design & usage scenarios.
- Expert is told the assumptions about user population, context of use, task details.
- One or more experts walk through the design prototype with the scenario.
- Experts are guided by 3 questions.
The 3 questions
- Will the correct action be sufficiently evident to the user? (know what to do)
- Will the user notice that the correct action is available? (see how to do it)
- Will the user associate and interpret the response from the action correctly? (understand from feedback whether the action was correct or not)
As the experts work through the scenario, they note problems.
Pluralistic walkthrough
- Variation on the cognitive walkthrough theme.
- Performed by a carefully managed team.
- The panel of experts begins by working separately.
- Then there is managed discussion that leads to agreed decisions.
- The approach lends itself well to participatory design.
Predictive models
Experts use formulas to derive various measures of user performance
Provide estimates of the efficiency of different systems for various kinds of tasks
Well-known predictive modeling technique – GOMS – family of models
Usefulness limited to systems with predictable tasks - e.g., telephone answering systems, mobile (cell) phones, etc.
Based on expert error-free behavior.
GOMS
Model knowledge and cognitive processes involved when interacting with the system
Goals - the state the user wants to achieve e.g., find a website.
Operators - the cognitive processes & physical actions needed to attain the goals
Methods - the procedures for accomplishing the goals
Selection rules - decide which method to select when there is more than one.
GOMS - example
Goal: delete a word in a sentence
Method: Using menu option
1) Recall that word to be deleted has to be highlighted
2) Recall that command is ‘cut’
3) Recall that command ‘cut’ is in edit menu
4) Accomplish goal of selecting and executing the ‘cut’ command
5) Return with goal accomplished
GOMS - example
Method: Using delete key
1) Recall where to position cursor in relation to word to be deleted
2) Recall which key is delete key
3) Press ‘delete’ key to delete each letter
4) Return with goal accomplished
GOMS - example
Operators:
- Click mouse
- Drag cursor over text
- Select menu
- Move cursor to command
- Press keyboard key
GOMS - example
Selection rules:
1. Delete text using mouse and selecting from menu if large amount of text is to be deleted
2. Delete text using delete key if small number of letters are to be deleted
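The two selection rules can be written as a small decision function. In this Python sketch the cut-off of five characters and the step descriptions are hypothetical; the slides do not say where "small" ends and "large" begins.

```python
# Hypothetical cut-off between a "small number of letters" and a "large amount
# of text"; the slides give no number, so this value is only an assumption.
SMALL_DELETE_LIMIT = 5

# Simplified method descriptions for the delete-a-word goal.
METHODS = {
    "delete_key": ["position cursor", "press delete once per letter"],
    "menu_cut": ["highlight text", "open edit menu", "select cut"],
}

def select_method(num_chars):
    """Apply the two selection rules to pick a deletion method."""
    if num_chars <= SMALL_DELETE_LIMIT:
        return "delete_key"
    return "menu_cut"

for n in (3, 40):
    chosen = select_method(n)
    print(n, chosen, METHODS[chosen])
```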
Keystroke level model
Provide actual numerical predictions of user performance
The keystroke model allows predictions to be made about how long it takes an expert user to perform a task.
Response times for keystroke level operators (Card et al., 1983)
Operator | Description | Time (sec)
K | Pressing a single key or button: |
  | Average skilled typist (55 wpm) | 0.22
  | Average non-skilled typist (40 wpm) | 0.28
  | Pressing shift or control key | 0.08
  | Typist unfamiliar with the keyboard | 1.20
P | Pointing with a mouse or other device on a display to select an object. This value is derived from Fitts’ Law which is discussed below. | 0.40
P1 | Clicking the mouse or similar device | 0.20
H | Bring ‘home’ hands on the keyboard or other device | 0.40
M | Mentally prepare/respond | 1.35
R(t) | The response time is counted only if it causes the user to wait. | t
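A keystroke level prediction is the sum of the operator times for the steps in a method. The Python sketch below uses the times from the table above; the two operator sequences for deleting a five-letter word are hypothetical encodings made up for this example, not taken from Card et al.

```python
# Operator times in seconds, copied from the table above.
KLM_TIMES = {
    "K": 0.28,   # key press, average non-skilled typist (40 wpm)
    "P": 0.40,   # point with a mouse or other device
    "P1": 0.20,  # click the mouse or similar device
    "H": 0.40,   # home hands on the keyboard or other device
    "M": 1.35,   # mentally prepare/respond
}

def predict_time(operators, response_times=()):
    """Sum operator times; R(t) waits are passed separately in seconds."""
    return sum(KLM_TIMES[op] for op in operators) + sum(response_times)

# Hypothetical encoding of "delete a word using the menu":
# prepare, hand to mouse, point and click on the word, prepare,
# point and click on the edit menu, point and click on 'cut'.
menu_cut = ["M", "H", "P", "P1", "M", "P", "P1", "P", "P1"]

# Hypothetical encoding of "delete a five-letter word using the delete key":
# prepare, hand to mouse, point and click to place the cursor,
# hand back to keyboard, five key presses.
delete_key = ["M", "H", "P", "P1", "H"] + ["K"] * 5

menu_time = predict_time(menu_cut)
key_time = predict_time(delete_key)
print(round(menu_time, 2), round(key_time, 2))
```

Comparing the two totals is the kind of comparative analysis the model is meant for; the absolute numbers depend heavily on how the task is encoded.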
GOMS: Advantages
Advantages:
- Allow comparative analyses to be performed for different interfaces, prototypes, or specifications relatively easily
- Help make decisions about effectiveness of new products
GOMS: Disadvantages
Disadvantages:
- Not often used for evaluation
- Highly limited scope: only models computer-based tasks (routine data-entry type tasks)
- Only predicts expert performance; does not allow for errors to be modeled
- Only makes predictions about predictable behavior
Fitts’ Law (Fitts, 1954)
Fitts’ Law predicts that the time to point at an object using a device is a function of the distance from the target object & the object’s size.
The further away & the smaller the object, the longer the time to locate it and point to it.
Fitts’ Law is useful for evaluating systems for which the time to locate an object is important, e.g., a cell phone or other handheld device.
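One common formulation of the law is T = k log2(D/S + 1), where D is the distance to the target, S is its size, and k is a device-dependent constant. The Python sketch below uses that form; the default k of 0.1 seconds is a placeholder assumption, not a measured value.

```python
import math

def fitts_time(distance, size, k=0.1):
    """Pointing time T = k * log2(distance / size + 1).

    distance and size must use the same unit (e.g. pixels or mm).
    k is a device-dependent constant; 0.1 s here is only a placeholder.
    """
    if distance < 0 or size <= 0:
        raise ValueError("distance must be >= 0 and size must be > 0")
    return k * math.log2(distance / size + 1.0)

# Hypothetical targets: a far small button versus a near large one.
far_small = fitts_time(distance=300, size=10)
near_large = fitts_time(distance=100, size=40)
print(far_small > near_large)
```

The function reproduces the qualitative claim above: increasing the distance or shrinking the target raises the predicted pointing time.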
Key points
• Expert evaluation: heuristic & walkthroughs.
• Relatively inexpensive because no users.
• Heuristic evaluation relatively easy to learn.
• May miss key problems & identify false ones.
• Predictive models are used to evaluate systems with predictable tasks such as telephones.
• GOMS, Keystroke Level Model, & Fitts’ Law predict expert, error-free performance.