TRANSCRIPT
CS 350 COMPUTER/HUMAN
INTERACTION
Lecture 23
Includes selected slides from the companion website for Hartson & Pyla, The UX Book, 2012. ©MKP, All
rights reserved. Used with permission.
Notes
■ Swapping project work days and class days for the rest of the term, i.e., work days on Tuesdays and class days on Thursdays
■ Mid-project progress report due date
extended to Thursday next week (April 12)
Outline
■ Chapter 12 – UX Evaluation Introduction
– Formative vs. summative evaluation
– Rigorous vs. rapid UX evaluation methods
– Empirical vs. analytic methods
– Data collection techniques
■ Chapter 13 – Rapid evaluation methods
– Design walkthroughs and reviews
– UX inspection
– Heuristic evaluation
– Quasi-empirical methods
Introduction: Evaluation
Introduction
■ User Testing? No!
■ Users don't like to be tested
■ Instead: user-based design evaluation (UX evaluation)
Formative vs. summative evaluation
■ Formative evaluation helps you form the design
■ Summative evaluation helps you sum up the design
■ “When the cook tastes the soup, that’s
formative”
■ “When the guests taste the soup, that’s
summative”
Formative evaluation
■ Diagnostic nature
■ Uses qualitative data
■ Immediate goal: To identify UX problems
and their causes in design
■ Ultimate goal: To fix the problems
Summative evaluation
■ Collecting quantitative data
– To assess the level of user experience quality due to a design
– Especially for assessing improvement in user experience due to iterations of formative evaluation and redesign
Formal summative evaluation
■ A benchmark study based on rigorous experimental design, aimed at comparing designs
■ Controlled experiment, hypothesis testing
– Example: an m × n factorial design, i.e., two independent variables with m and n levels
■ Results subjected to statistical tests for
significance
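To make this concrete, here is a minimal sketch (not from the slides) of one such significance test in Python: an independent two-sample t-test comparing hypothetical time-on-task data for two designs. All data values are invented for illustration.

    # Hypothetical comparison of mean time-on-task (seconds) for two designs,
    # using an independent two-sample t-test from scipy.
    from scipy import stats

    design_a = [42.1, 38.5, 45.0, 40.2, 39.8, 44.3]  # made-up participant times
    design_b = [35.0, 33.2, 36.8, 31.9, 34.5, 32.7]  # made-up participant times

    t_stat, p_value = stats.ttest_ind(design_a, design_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Difference is statistically significant at the 0.05 level")

A full formal study would typically use a factorial design and analysis of variance; the sketch only shows the flavor of hypothesis testing on performance data.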
Formal summative evaluation
■ Contributes to our science base
■ The only way you can make public claims
based on your results
■ An important HCI skill, but not covered in
this course
Informal summative evaluation
■ Partner of formative evaluation
– Example: measuring time on task
– For an engineering-style summing up or assessment of UX levels
■ Done without experimental controls
Informal summative evaluation
■ Usually without attention to validity concerns, such as sampling and degree of confidence
■ Usually with small number of participants
■ Only summary statistics (e.g., mean and
variance)
Informal summative evaluation
■ Uses metrics for user performance
– As indicators of user performance
– As indicators of design quality
■ Metrics in comparison with pre-
established UX target levels
(Chapter 10)
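As a concrete illustration of this kind of engineering summing up, a minimal Python sketch (all numbers invented) computes summary statistics for time on task and compares them with an assumed pre-established UX target level:

    # Hypothetical informal summative check: summary statistics only,
    # with no experimental controls and no inferential statistics.
    import statistics

    task_times = [48.2, 52.7, 45.1, 60.3, 49.8]  # seconds, small made-up sample
    target = 50.0                                # assumed UX target level

    mean_time = statistics.mean(task_times)
    variance = statistics.variance(task_times)
    print(f"mean = {mean_time:.1f} s, variance = {variance:.1f}, target = {target} s")
    print("UX target met" if mean_time <= target else "UX target not met")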
Informal summative evaluation
■ Results not validated
– Can be used only to guide engineering
development process
– Cannot make any claims based on your results to your organization or to the public
■ An important ethical constraint
Engineering evaluation of UX
■ Formative plus informal summative
Types of UX evaluation methods
■ Orthogonal dimensions for classifying
types
– Rigorous method vs. rapid method
– Empirical method vs. analytic method
Rigorous UX evaluation methods
■ Use full process
– Preparation, data collection, data analysis,
and reporting
– Chapters 12 and 14 through 18
– Use no shortcuts or abridgements
■ Certainly not perfect
– But it is the yardstick by which other evaluation methods are compared
Choose a rigorous empirical method
■ When you need maximum effectiveness and thoroughness
– But expect it to be more expensive and time-consuming
■ When you need to manage risk carefully
■ To assess quantitative UX measures and metrics
– E.g., time-on-task and error rates
– As indications of how well the user does in a performance-oriented context
Rapid UX evaluation methods
■ Choose a rapid evaluation method
– For speed and cost savings
■ But expect it to be (possibly acceptably) less effective
– For early stages of progress
■ When things are changing a lot, anyway
■ When investing in detailed evaluation is not warranted
■ Choose a rapid method for initial reactions and early feedback
– Design walkthrough
– Informal demonstration of design concepts
Empirical method vs. analytic method
■ Another dimension for classifying types
■ Empirical methods
– Employ data observed in performance of real
user participants
– Usually data collected in lab-based testing
Empirical method vs. analytic method
■ Analytical methods
– Based on looking at inherent attributes of
design
– Rather than seeing design in use
■ Many rapid UX evaluation methods are
analytic
– Examples: design walkthroughs, UX inspection methods
Hybrid methods - analytical and empirical
■ Often in practice, methods are a mix
■ Example: expert UX inspection
– Can involve “simulated empirical” aspects
– Expert plays role of user
– Simultaneously performing tasks
– “Observing” UX problems, but much of it is
analytical
Where the dimensions intersect
Formative data collection techniques
■ Critical incident identification
■ Think-aloud technique
■ Both used in rigorous and rapid methods
Critical incident identification
■ A critical incident is an event observed
within task performance
– Significant indicator of UX problem
– Due to effects of design flaws on users
■ Arguably single most important source of
qualitative data in formative evaluation
■ Can be difficult until you learn to do it
Critical incident identification
■ Critical incident data
– Detailed and perishable
– Must be captured immediately and precisely as incidents arise during usage
– Essential for isolating specific UX problems
■ That is why alpha and beta testing might not be as effective for formative evaluation
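Because critical incident data are perishable, evaluators typically log each incident the moment it occurs. Below is a minimal sketch of what such a log record might hold; the field names are assumptions for illustration, not a format from the book.

    # Hypothetical critical incident record, capturing perishable detail
    # immediately: when it happened, the task context, the observed event,
    # and the suspected design cause.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class CriticalIncident:
        timestamp: datetime     # captured as the incident arises
        task: str               # what the participant was trying to do
        observed_event: str     # what actually happened
        suspected_cause: str    # design flaw believed responsible

    incident = CriticalIncident(
        timestamp=datetime.now(),
        task="Checkout: apply a discount code",
        observed_event="Participant clicked 'Apply' three times; no feedback appeared",
        suspected_cause="No visibility of system status after the button press",
    )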
Think-aloud technique
■ Participants let us in on their thinking
– Their intentions
– Rationale
– Perceptions of UX problems
■ User participants verbally express their
thoughts during interaction experience
■ Also called “think-aloud protocol” or
“verbal protocol”
Think-aloud technique
■ Very effective qualitative data collection technique
■ Technique is simple to use, for both analyst and participant
■ Useful for a walkthrough of a prototype
■ Effective when participant helps with inspection
■ Good for assessing internally felt emotional impact
Think-aloud technique
■ Needed when
– User hesitates
– A real UX problem is hidden from
observation
■ Sometimes you have to remind
participants to verbalize
Questionnaires
■ A self-reporting data collection technique
■ Primary instrument for collecting
quantitative subjective data
■ Used to supplement objective data
■ An evaluation method on its own
Questionnaires
■ In the past, questionnaires have been used primarily to assess user satisfaction
– But can contain probing questions about total user experience
– Especially good for emotional impact, perceived usefulness
■ Inexpensive and easy to administer
■ But require skill to produce so that data are valid and reliable
Semantic differential scales
■ Also called Likert scales
■ Each question posed on range of values describing attribute
■ Most extreme value in each direction on scale is an anchor
■ Scale divided with points between anchors
– These divide up the difference between the anchor meanings
Semantic differential scales
■ Granularity of the scale
– Number of discrete points (choices),
including anchors, we allow users
■ Typical labeling of a point on a scale is
verbal
– Often with associated numeric value
– Labels can also be pictorial
■ Example: smiley faces
■ Helps make the scale language-independent
Example: semantic differential scale
■ To assess participant agreement with this statement
– “The checkout process on this Website was easy to use.”
■ Might have these anchors: strongly agree and strongly disagree
■ Points in between might include: agree, neutral, disagree
■ Could have associated values of +2, +1, 0, -1, and -2
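A small Python sketch of this example scale, mapping the verbal labels to the associated numeric values (the responses are made up):

    # Five-point agreement scale from the slide above, with anchors
    # at +2 (strongly agree) and -2 (strongly disagree).
    SCALE = {
        "strongly agree": 2,
        "agree": 1,
        "neutral": 0,
        "disagree": -1,
        "strongly disagree": -2,
    }

    responses = ["agree", "strongly agree", "neutral", "agree"]  # made-up data
    values = [SCALE[r] for r in responses]
    print(f"mean rating = {sum(values) / len(values):+.2f}")  # prints +1.00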
System Usability Scale (SUS)
■ Just 10 questions
■ Alternates positive and negative questions
– Discourages answering without really considering each question
■ Five-point Likert scale
Example: SUS questions
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I would need technical support to be able to use this
system
5. I found the various functions in this system were well integrated
Example: SUS questions
6. I think there is too much inconsistency in this system
7. I would imagine that most people would learn to use
this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get
going
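SUS has a standard scoring rule (from Brooke's original formulation, not shown on the slides): each item is answered 1-5; odd-numbered (positively worded) items contribute (response - 1), even-numbered (negatively worded) items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. A minimal sketch:

    # Standard SUS scoring; responses is a list of ten values, each 1..5.
    def sus_score(responses):
        assert len(responses) == 10
        total = 0
        for i, r in enumerate(responses, start=1):
            # odd items are positively worded, even items negatively worded
            total += (r - 1) if i % 2 == 1 else (5 - r)
        return total * 2.5

    print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # made-up responses -> 85.0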
System Usability Scale (SUS)
■ Robust, extensively used
■ Widely adapted
■ In public domain
■ Technology independent
Adapting questionnaires
■ You can modify an existing questionnaire
– Choosing a subset of questions
– Changing the wording in some questions
– Adding questions to address specific areas of concern
– Using different scale values
■ Warning: Modifying a questionnaire can
damage its validity
Evaluating emotional impact
■ Data collection techniques especially for
emotional impact
■ Can be “measured” indirectly in terms of
its indicators
■ “Emotion is a multifaceted phenomenon”
– Expressed through feelings
– Verbal and non-verbal languages
– Facial expressions and other behaviors
Evaluating emotional impact
■ Emotional impact indicators
– Self-reported via verbal techniques
– Physiological responses observed
– Physiological responses measured
Self-reporting of emotional impact
■ Most emotional impact, involving aesthetics, emotional values, and simple joy of use
– Is felt by the user
– But not necessarily observed by the evaluator
■ Self-reporting can tap into these feelings
Self-reporting of emotional impact
■ Concurrent self-reporting
– Participants comment via think-aloud
techniques on feelings and their causes in
the user experience
■ Retrospective self-reporting
– Questionnaires (see AttrakDiff in textbook)
Observing physiological responses
■ Self-reporting can be biased
– Human users cannot always access their own emotions
■ So observe physiological responses to
emotional impact encounters
Observing physiological responses
■ Emotional “tells” of facial and bodily
expressions can be
– Fleeting, subliminal
– Easily missed in real-time observation
■ To capture reliably
– Might make video recordings
– Do frame-by-frame analysis
Biometrics
■ Instruments to detect and measure
physiological responses
– Measure autonomic or involuntary bodily
changes
– Triggered by nervous system responses
– To emotional impact within interaction
events
Biometrics
■ Changes in perspiration measured by galvanic skin response (GSR) measurements
– These detect changes in electrical conductivity
■ Pupillary dilation is an autonomic indication of
– Interest, engagement, excitement
■ Downside of biometrics is need for
specialized monitoring equipment
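As a rough illustration of how a GSR trace might be turned into candidate emotional-arousal events, here is a toy Python sketch; the signal values and the spike rule are invented for illustration, not a validated analysis method.

    # Toy GSR spike detection: flag samples well above the trace's mean
    # as candidate arousal events. All values and thresholds are made up.
    gsr = [2.1, 2.2, 2.1, 2.3, 3.4, 3.6, 2.4, 2.2, 2.2, 3.1]  # microsiemens
    baseline = sum(gsr) / len(gsr)
    threshold = baseline * 1.2  # assumed rule: spike = 20% above baseline

    events = [(i, v) for i, v in enumerate(gsr) if v > threshold]
    print(f"baseline = {baseline:.2f} uS, candidate events: {events}")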
Evaluating phenomenological aspects of interaction
■ Phenomenological aspects of interaction involve emotional impact over time
– Not snapshots of usage
– Not about tasks but about human activities
– Users invite the product into their lives
– Give it a presence in their daily activities
■ Example: how someone uses a smartphone in daily life
Evaluating phenomenological aspects of interaction
■ Users build perceptions and judgment
through exploration and learning
– As usage expands and emerges
■ Data collection techniques for
phenomenological aspects
– Have to be longitudinal
Need for self-reporting
■ Self-reporting techniques often necessary
■ Not as objective as direct observation
– But a practical solution
Introduction: Rapid UX Evaluation
Rapid evaluation techniques
■ Aimed almost exclusively at collecting
qualitative data
– Finding UX problems to fix
■ Seldom, if ever, includes quantitative
measurements
■ Heavy dependence on practical techniques
Rapid evaluation techniques
■ Everything less formal
– Less protocol and fewer rules
■ Much more variability in process
– Almost every evaluation “session” different
– Tailored to prevailing conditions
■ This flexibility means more spontaneous
ingenuity
– Something experienced practitioners do best
Design walkthroughs and reviews
■ Early stages of a project
■ Have only
– Your conceptual design
– Scenarios, storyboards
– Maybe some screen sketches or wireframes
■ Not yet enough for customers or users to interact with
Design walkthrough
■ Easy and quick evaluation method
■ Can be used at almost any stage
■ Especially effective early, before prototype exists
■ Audience can include
– Design team, UX analysts
– Subject-matter experts, customer representatives
– Potential users
Design walkthrough
■ Goal is to explore design on behalf of
users
■ No interaction, so you (evaluators on the
design team) do the driving
■ Leader tells stories about users and
usage, intentions and actions, and
expected outcomes.
Rapid evaluation beyond early stages
■ Uses interactive prototype
– Including paper prototypes
■ Most rapid evaluation techniques are variations of
– Inspection techniques
– Quasi-empirical testing
UX inspection
■ Especially good for early stages and early
design iterations
■ Appropriate for an existing system that has not undergone previous evaluation
■ For when you cannot afford or cannot do
lab-based testing
UX inspection
■ Also called "expert evaluation," "expert inspection," or "heuristic evaluation (HE)"
■ But heuristic evaluation is actually one
specific kind of inspection (Nielsen)
UX inspection
■ Reminder: Cannot “inspect the user
experience”
■ But you can inspect the design for user experience issues
■ An analytical evaluation method
■ The primary rapid evaluation technique
Heuristic evaluation
■ Is one kind of UX inspection method
■ A heuristic is a simplified, abstracted
design guideline
■ Drive the inspection with a small number (about 10) of heuristics
Heuristic evaluation
■ Example heuristic: "Visibility of System Status"
– The system should always keep users informed about what is going on, through appropriate feedback within reasonable time
Heuristic evaluation
■ Another example heuristic: "Match Between System and the Real World"
– The system should speak the users' language, with words, phrases, and concepts familiar to the user rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
■ Full listing of heuristics in the book; link on the course webpage
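In practice, each problem found in a heuristic evaluation is typically written up against the heuristic it violates, often with a severity rating (Nielsen uses a 0-4 severity scale). A minimal sketch of such a record; the structure is an assumption for illustration, not a prescribed format.

    # Hypothetical heuristic evaluation finding record.
    from dataclasses import dataclass

    @dataclass
    class HeuristicFinding:
        heuristic: str      # e.g., "Visibility of System Status"
        location: str       # where in the UI the problem occurs
        description: str    # how the design violates the heuristic
        severity: int       # 0 (not a problem) .. 4 (usability catastrophe)

    finding = HeuristicFinding(
        heuristic="Visibility of System Status",
        location="Checkout page, 'Apply discount' button",
        description="No feedback after clicking; users cannot tell whether it worked",
        severity=3,
    )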
Emotional impact inspection
■ Look for fun, aesthetics, innovation
■ Include packaging and out-of-the-box
experience
■ Try to envision long-term experience
The RITE UX Evaluation Method
■ Rapid Iterative Testing and Evaluation
(Wixon et al.)
■ A quasi-empirical method
■ A kind of abridged version of user-based
testing
■ Fast collaborative test-and-fix cycle
– Pick low-hanging fruit
– Relatively low cost
Quasi-empirical methods
■ No formal predefined “benchmark tasks”
■ For tasks, draw on
– Usage scenarios
– Essential use cases, step-by-step task
interaction models
Quasi-empirical methods
■ Cut corners as much as possible
■ No quantitative data collected
– Single paramount mission is to identify UX problems that can be fixed efficiently
■ Forget controlled conditions
– Interrupt and intervene at opportune moments
■ Elicit thinking aloud
■ Ask for explanations and specifics
Quasi-empirical methods
■ Defined by freedom given to practitioners:
– To innovate, to make it up as they go
– To be flexible about goals and approaches
– To make impromptu changes of pace,
direction, focus