TRANSCRIPT
Evaluation
• How do we test the interaction design?
• Several dimensions
– Qualitative vs. Quantitative assessments
– Conceptual vs. Physical Design
Why Evaluate
• Five good reasons:
– Problems are fixed before the product is shipped
– The team can concentrate on real (not imaginary) problems
– Engineers can develop code instead of debating personal preferences
– Time to market is sharply reduced
– A solid, tested design will sell better
When to Evaluate
• Formative Evaluations
– Conducted during requirements specification and design
– Consider alternatives
• Summative Evaluations
– Assess the success of a finished product
– Determine whether the product satisfies requirements
What to Evaluate
• A huge variety of user-interaction features can (and should) be evaluated, such as:
– Sequence of links in a web search
– Enjoyment experienced by game users
– System response time
– Signal detection performance in data analysis
• Gould’s principles:
– Focus on users and their tasks
– Observe, measure, and analyze user performance
– Design iteratively
Qualitative Assessment
• Informal
– Simply ask users how they like the system
– Listen to “hallway” conversations about systems
• Formal
– Develop survey instruments that ask specific questions, e.g.
• How long did it take you to become comfortable?
• Which task is the most difficult to accomplish?
– Hold focus group discussions about the system
Quantitative Assessment
• Identify usability criteria (from the requirements) to test
• Design human performance experiments to test them, e.g.
– Measure response time or time to complete a task
– Measure error rate or the incidence of “dead ends”
• These measures can be used during the design process to compare alternative designs
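As a sketch of how such measurements might be collected, the snippet below tallies completion times and error counts across trials. The `TaskLog` helper and the trial data are invented for illustration, not part of any particular toolkit:

```python
from statistics import mean

class TaskLog:
    """Minimal logger for one usability task: completion times and errors."""
    def __init__(self):
        self.times = []   # seconds to complete each trial
        self.errors = 0   # error events across all trials
        self.trials = 0

    def record_trial(self, seconds, error_count):
        self.times.append(seconds)
        self.errors += error_count
        self.trials += 1

    def summary(self):
        return {
            "mean_time_s": mean(self.times),
            "errors_per_trial": self.errors / self.trials,
        }

log = TaskLog()
log.record_trial(12.4, 0)  # hypothetical trial data
log.record_trial(15.1, 2)
log.record_trial(11.0, 1)
print(log.summary())
```

Summaries like these make two candidate designs directly comparable on the same usability criteria.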
An Evaluation Framework
• Evaluation must be an intentional, planned process
– Ad hoc evaluations are of very little value
• The details of the particular framework can vary from team to team
• What is important is that the framework be crafted in advance, and that all team members understand the framework
Evaluation Paradigms
• Evaluation Paradigms are the beliefs and practices (perhaps underpinned by theory) that guide a user study
• We’ll discuss four core evaluation paradigms:
– Quick and Dirty evaluation
– Usability Testing
– Field Studies
– Predictive Evaluation
Quick and Dirty
• Informal feedback from users
• Can be conducted at any stage
• Emphasis is on speed, not quality
• Often consultants are used as surrogate users
Usability Testing
• Measuring typical users’ performance on carefully prepared tasks that are typical for the system
• Metrics can include such things as
– Error rate and time to completion
– Observations/recordings/logs of interaction
– Questionnaires
• Strongly controlled by the evaluator
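To illustrate how metrics fall out of interaction logs, here is a minimal sketch that derives time to completion and error count from a timestamped event stream. The event names and timestamps are invented for illustration:

```python
# Hypothetical interaction log: (seconds since session start, event name)
events = [
    (0.0,  "task_start"),
    (3.2,  "click"),
    (5.8,  "error"),      # e.g. invalid input
    (9.1,  "click"),
    (14.6, "task_complete"),
]

start = next(t for t, e in events if e == "task_start")
end = next(t for t, e in events if e == "task_complete")
errors = sum(1 for _, e in events if e == "error")

print(f"time to completion: {end - start:.1f}s, errors: {errors}")
```

Because the evaluator controls the tasks, the same log format can be reused across participants and the results aggregated.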
What is usability?
• An operational definition:
– Efficient
– Effective
– Safe
– Easy
• To learn
• To remember
• To use
– Productive
• As well as
– Satisfying
– Enjoyable
– Pleasing
– Motivating
– Fulfilling
Field Studies
• Done in natural settings
• Try to learn what users do and how
• Artifacts are collected
– Video, notes, sketches, etc.
• Two approaches:
– As an outsider looking on
• Qualitative techniques are used to gather data
• The data may be analyzed qualitatively or quantitatively
– As an insider
• Easier to capture the role of the social environment
Predictive Evaluation
• Uses models of typical users
– Heuristic or theoretical
• Users themselves need not be present
– Cheaper, faster
• Tried-and-true heuristics can be useful
– E.g. “speak the users’ language”
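A well-known theoretical model of this kind is the Keystroke-Level Model (KLM), which predicts an expert's task time by summing standard operator times, with no users present. The sketch below uses commonly cited average operator durations from the KLM literature; treat the exact values as assumptions:

```python
# Commonly cited KLM operator times (seconds); approximate averages,
# not exact values for any particular user population.
KLM_TIMES = {
    "K": 0.28,  # keystroke (average typist)
    "P": 1.10,  # point with mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "B": 0.10,  # mouse button press or release
    "M": 1.35,  # mental preparation
}

def predict_time(ops):
    """Predict expert task time by summing operator times, e.g. 'MHPBB'."""
    return sum(KLM_TIMES[op] for op in ops)

# Select a menu item: think, home to mouse, point, click (press + release)
print(round(predict_time("MHPBB"), 2))  # 1.35 + 0.40 + 1.10 + 0.10 + 0.10 = 3.05
```

Such predictions are cheap to compute for alternative designs, which is exactly the appeal of predictive evaluation.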
Evaluation Techniques
• Observing users
• Asking users their opinions
• Asking experts their opinions
• Testing users’ performance
• Modeling users’ task performance to predict efficacy of the interface
Techniques vs. Paradigms
• Observing users
– Quick and Dirty: See how users behave
– Usability Testing: Video and logging
– Field Studies: Central technique
– Predictive: N/A
• Asking users
– Quick and Dirty: Discussions, focus groups
– Usability Testing: Questionnaires & interviews
– Field Studies: May interview
– Predictive: N/A
• Asking experts
– Quick and Dirty: Provide critiques
– Usability Testing: N/A
– Field Studies: N/A
– Predictive: Heuristics early in design
• User testing
– Quick and Dirty: N/A
– Usability Testing: Test typical users on typical tasks
– Field Studies: Can measure performance, but difficult
– Predictive: N/A
• Modeling users’ performance
– Quick and Dirty: N/A
– Usability Testing: N/A
– Field Studies: N/A
– Predictive: Models used to predict efficacy
DECIDE
• Determine the overall goals that the evaluation addresses
• Explore the specific questions to be answered
• Choose the evaluation paradigm and techniques
• Identify the practical issues that must be addressed
• Decide how to deal with the ethical issues
• Evaluate, interpret, and present the data
Determine the goals
• What are the high-level goals of the evaluation?
• Who wants the evaluation and why?
• The goals should guide the evaluation, e.g.:
– Check that evaluators understood users’ needs
– Identify the metaphor underlying the design
– Ensure that interface is consistent
– Investigate degree to which technology influences working practices
– Identify how the interface of an existing product could be engineered to improve its usability
Explore the questions
• This amounts to hierarchical question development:
– “Is the user interface good?”
• “Is the system easy to learn?”
– “Are functions easy to find?”
– “Is the terminology confusing?”
• “Is response time too slow?”
– “Is login time too slow?”
– “Is calculation time too slow?”
– …
Choose the evaluation paradigm and techniques
• Choose one or more evaluation paradigms
– Can use different paradigms in different stages
– Can use multiple paradigms in a single stage
• Combinations of techniques can be used to obtain different perspectives
Identify the practical issues
• Practical issues must be considered BEFORE beginning any evaluation
– Users
• An adequate number of representative users must be found
– Facilities and equipment
• How many cameras? Where? Film?
– Schedule and budget
• Both are always less than would be ideal
– Expertise
• Assemble the correct evaluation team
Decide how to deal with ethical issues
• Experiments involving humans must be conducted within strict ethical guidelines
– Tell participants the goals and what will happen
– Explain that personal information is confidential
– They’re free to stop at any time
– Pay subjects when possible: formal relationship
– Avoid using quotes that reveal identity
– Ask users’ permission to quote them, show them the report
• Example: Milgram’s obedience (“shock”) experiments at Yale, 1961–62
Evaluate, interpret and present the data
• What data to collect, and how to analyze them
• Questions that need to be asked:
– Reliability: is the result reproducible?
– Validity: does it measure what it’s supposed to?
– Biases: do biases cause distortion?
– Scope: how generalizable are the results?
– Ecological validity: how important is the evaluation environment – does it match the real environment of interest?
Observing Users
• Ethnography – observing the social environment and recording observations which help to understand the function and needs of the people in it
• Users can be observed in controlled laboratory conditions or in natural environments in which the products are used – i.e. the field
Goals, questions and paradigms
• Goals and questions should guide all evaluation studies
• Ideally, these are written down
• Goals help to guide the observation because there is always so much going on
What and when to observe
• Insider or outsider?
• Laboratory or field?– Control vs. realism
• What times are critical times (especially for field observations)?
Approaches to observation
• Quick and dirty
– Informal
– Insider or outsider
• Observation in usability testing
– Formal
– Video, interaction logs, performance data
– Outsider
• Observation in field studies
– Outsider, participant, or ethnographer (participant or not?)
In controlled environments
• Decide location
– Temporary lab in the user’s environment?
– Remote laboratory?
• Equipment
• It is hard to know what the user is thinking
– “Think Aloud” technique
– But speaking can alter the interaction
– Having two subjects work together can help
In the field
• Who is present?
– What are their roles?
• What is happening?
– Include body language, tone
• When does activity occur?
• Where is it happening?
• Why is it happening?
• How is the activity organized?
Participant observation and ethnography
• In this case, the observer/evaluator must be accepted into the group
• Honesty about purpose is important both ethically and to gain trust
• There is disagreement in the field about the distinction between ethnography and participant observation
– Do ethnographers begin with any assumptions?
Analyzing, interpreting, and presenting data
• Observation produces large quantities of data of various types
• How to analyze and interpret depends on the research questions first developed
Qualitative analysis to tell a story
• The ensemble of data (notes, video, diaries, etc.) is used to help designers, as a team, understand the users
• There is much room for evaluator bias in these techniques
Qualitative analysis for categorization
• A taxonomy can be developed into which users’ behaviors can be placed
• This can be done by different observers, with the discrepancies used as a measure of observer bias
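One standard way to quantify agreement between observers (and thus expose observer bias) is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, with invented behavior categories:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two observers beyond chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each category's marginal proportions
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two observers categorizing the same six user actions (invented labels)
a = ["search", "browse", "search", "error", "browse", "search"]
b = ["search", "browse", "error",  "error", "browse", "search"]
print(round(cohens_kappa(a, b), 2))  # 0.75: substantial but imperfect agreement
```

A kappa near 1 indicates the taxonomy is being applied consistently; a low kappa suggests the categories are ambiguous or the observers are biased.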
Quantitative data analysis
• Observations, interaction logs, and results are gathered and quantified
– Counted
– Measured
• Analysis using statistical reasoning can be used to draw conclusions
– What is statistical significance?
– What is a t-test?
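To illustrate the t-test just mentioned, here is a minimal sketch of Welch's two-sample t statistic computed with the Python standard library. The completion-time data for the two designs are invented for illustration:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic: are the two means reliably different?"""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n-1)
    se = sqrt(va / na + vb / nb)                     # std. error of the difference
    return (mean(sample_a) - mean(sample_b)) / se

# Hypothetical task-completion times (seconds) for two alternative designs
design_a = [12.1, 14.3, 11.8, 13.5, 12.9]
design_b = [15.2, 16.8, 14.9, 17.1, 15.6]
t = welch_t(design_a, design_b)
print(round(t, 2))  # large |t| suggests the difference is unlikely due to chance
```

In practice the t statistic is compared against a t distribution to obtain a p-value; libraries such as SciPy (`scipy.stats.ttest_ind`) do this directly.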
Feeding the findings back into design
• Ideally, the design team will participate in post-evaluation discussions of qualitative data
• Reports to designers should include artifacts, such as quotes, anecdotes, pictures, video clips
• Depending on the design team, quantitative data may or may not be compelling