Methodology Overview
Dr. Saul Greenberg, John Kelleher
Role of Evaluation
• Star Life Cycle (Hix and Hartson): evaluation sits at the centre of the process, connecting every stage:
- task analysis / functional analysis
- requirements specification
- conceptual design / formal design
- prototyping
- implementation
Goals of Evaluation
- assess the extent of the system's functionality (does it incorporate 'articulatory directness'?)
- assess the effect of the interface on learnability and users' attitudes
- identify specific problems with the system, typified by a lack of awareness of 'real-world' context
When to Evaluate?
Initial stages
- evaluate choices of initial design ideas and representations
- is the representation appropriate? does it reflect how people think of their task?
Iterative development
- fine-tune the interface, looking for usability bugs
- can people use this system?
Evaluation produces:
- user reaction to design
- validation and a list of problem areas (bugs)
- new design ideas
Why Evaluate? (contd.)
Post-design
- acceptance test: did we deliver on promises?
- verify that the human/computer system meets expected performance criteria: ease of learning, usability, user attitude, performance
- e.g., a first-time user will take 1-3 minutes to learn how to withdraw $50 from the automatic teller
- revisions: what do we need to change?
- effects: what did we change in the way people do their tasks?
Evaluation produces:
- testable usability metrics
- actual reactions
- validation and a list of problem areas (bugs)
- changes in original work practices/requirements
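The ATM criterion above is an example of a testable usability metric: an acceptance test can check it mechanically against measured data. A minimal sketch in Python; the pass threshold, required fraction, and timing data are all made up for illustration:

```python
# Illustrative acceptance-test check for the ATM example above.
# The criterion and the sample times are hypothetical.

CRITERION_SECONDS = 180  # "a first-time user learns the task in 1-3 minutes"

def passes_acceptance(learn_times, criterion=CRITERION_SECONDS, required_fraction=0.9):
    """True if at least `required_fraction` of first-time users
    learned the task within the criterion time."""
    within = sum(1 for t in learn_times if t <= criterion)
    return within / len(learn_times) >= required_fraction

# Measured learning times (seconds) for ten first-time users (made-up data):
times = [95, 140, 170, 60, 210, 120, 88, 150, 175, 130]
print(passes_acceptance(times))  # 9 of 10 are within 180 s -> True
```

The point is not the arithmetic but the discipline: the criterion is stated before testing, so the "go / no-go" decision is not a matter of opinion.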
Why Evaluate? (contd.)
Generalized knowledge
- are there general design principles? are there theories?
- can we validate ideas / visions / hypotheses?
Evaluation produces:
- principles and guidelines
- evidence supporting or rejecting theories
Two Forms of Evaluation
Formative
- a continuous process throughout the design; expect a number of iterations
- requires only a few users each time
- must be structured, but can also be informal
- don't just ask users to try the system!
- average of 3 major 'design-test-redesign' cycles
Summative
- typically at the end of the process
- a "go / no-go" decision: a quality-control issue
- should involve many users
- must be structured relative to usability (and other) requirements
- formal, often involving statistical inference
Evaluating the 1984 OMS (Olympic Message System)
- early tests of printed scenarios and user guides
- early simulations of the telephone keypad
- an Olympian joined the team to provide feedback
- interviews and demos with Olympians outside the US
- overseas interface tests with friends and family
- free coffee and donut tests
- usability tests with 100 participants
- a 'try to destroy it' test
- pre-Olympic field test at an international event
- reliability of the system under heavy traffic
Why Should We Use Different Methods?
Information requirements differ
- pre-design, iterative design, post-design, generalisable knowledge…
Information produced differs
- outputs should match the particular problem/needs
Cost/benefit of using a method
- the cost of a method should match the benefit gained from the result
One method's strength can complement another's weakness
- no one method can address all situations
Constraints may force you to choose quick-and-dirty 'discount usability' methods
How Can We Compare Methods?
Relevance
- does the method provide information relevant to our question / problem?
Naturalistic
- is the method applied in an ecologically valid situation?
- do observations reflect real-world settings: real environment, real tasks, real people, real motivation?
Setting
- field vs. laboratory?
Support
- are there tools for supporting the method and analyzing the data?
How Can We Compare Methods? (contd.)
Quickness
- can I do a good job with this method within my time constraints?
Cost
- is the cost of this method reasonable for my question?
Equipment
- what special equipment / resources are required?
Personnel, training and expertise
- what people are required to run this method? what expertise must they have?
Subject selection
- how many subjects do I need, who do they have to be, and can I get them?
Usability Metrics
- time to complete a task
- time spent in error
- time spent using help or documentation
- ratio of successes to failures
- number of commands used
- number of times the user is "misled" by the interface
- number of times the user expresses frustration or satisfaction
- number of good or bad features recalled by the user
- time to reach a specified level of performance
- frequency and type of errors made by users
- user reaction times to particular tasks
- frequency of command usage, e.g. help
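Several of these metrics fall straight out of instrumented task logs. A minimal sketch, assuming a hypothetical per-task log format (the field names and sample values are made up):

```python
# Sketch: deriving a few of the metrics above from per-task logs.
# The log structure and all numbers are hypothetical.

tasks = [
    {"duration": 42.0, "error_time": 5.5, "help_time": 0.0, "success": True},
    {"duration": 63.0, "error_time": 12.0, "help_time": 8.0, "success": False},
    {"duration": 38.5, "error_time": 0.0, "help_time": 3.0, "success": True},
]

def summarize(tasks):
    """Compute mean completion time, mean time in error,
    mean time in help/documentation, and the success ratio."""
    n = len(tasks)
    return {
        "mean_time_to_complete": sum(t["duration"] for t in tasks) / n,
        "mean_time_in_error": sum(t["error_time"] for t in tasks) / n,
        "mean_time_in_help": sum(t["help_time"] for t in tasks) / n,
        "success_ratio": sum(t["success"] for t in tasks) / n,
    }

print(summarize(tasks))
```

Metrics like "times misled" or "frustration expressed" cannot be logged automatically; they come from observation or think-aloud coding, which is why no single method covers the whole list.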
What Methods Are There?
Lab tests
- experimental methodologies: highly controlled observations and measurements to answer very specific questions, i.e., hypothesis testing
- usability testing: mostly qualitative, less controlled observations of users performing tasks
Interface inspection
- heuristic evaluation
- walkthroughs: experts and others analyze an interface by considering, a step at a time, what a user would have to do while performing their task
What Methods Are There? (contd.)
Field studies
- ethnography: a field worker immerses themselves in a culture to understand what that culture is doing
- contextual inquiry: an interview methodology that gains knowledge of what people do in their real-world context
What Methods Are There? (contd.)
Cognitive modeling
- Fitts' Law: used to predict a user's time to select a target
- Keystroke-Level Model (KLM): a low-level description of what users would have to do to perform a task
- GOMS: a structured, multi-level description of what users would have to do to perform a task
Self-reporting
- questionnaires
- surveys
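Fitts' Law and the KLM are simple enough to sketch directly. The Shannon formulation of Fitts' Law is T = a + b·log2(D/W + 1); the constants a and b below are illustrative (in practice they are fitted empirically per device), and the KLM operator times are the commonly cited approximations, not measured values:

```python
import math

# Sketch of the two predictive models named above.
# The Fitts constants a, b are placeholders; real values are fitted
# from pointing experiments for a given device and population.

def fitts_time(distance, width, a=0.1, b=0.15):
    """Shannon formulation: T = a + b * log2(D/W + 1), in seconds."""
    return a + b * math.log2(distance / width + 1)

# Commonly cited KLM operator times (seconds); treat as approximate.
KLM = {"K": 0.2,   # keystroke
       "P": 1.1,   # point at a target with the mouse
       "H": 0.4,   # home hands between keyboard and mouse
       "M": 1.35}  # mental preparation

def klm_time(ops):
    """Total predicted time for an operator sequence like 'HMPK'."""
    return sum(KLM[op] for op in ops)

# Pointing at a 2 cm target 16 cm away:
print(round(fitts_time(16, 2), 3))       # 0.1 + 0.15*log2(9) = 0.575
# Move hand to mouse, think, point, click: H M P K
print(round(klm_time("HMPK"), 2))        # 0.4 + 1.35 + 1.1 + 0.2 = 3.05
```

Both models predict expert, error-free performance; that is why they complement, rather than replace, usability testing with real users.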
Comparative Study of the Big 4
User Interface Evaluation in the Real World: A Comparison of Four Techniques (Jeffries et al., 1991)
- Heuristic Evaluation (HE)
- Software Guidelines (SG)
- Cognitive Walkthrough (CW)
- Usability Testing (UT)
Problem Identification
Source: User Interface Evaluation in the Real World: A Comparison of Four Techniques (Jeffries, Miller, Wharton, Uyeda), HP, 1991.
Cost-Benefit Analysis
Summary of Results