TRANSCRIPT
Human Computer Interaction Laboratory
@jonfroehlich Assistant Professor Computer Science
CMSC434 Introduction to Human-Computer Interaction
Week 13 | Lecture 25 | Nov 26, 2013
Evaluation II
Hall of Fame / Hall of Shame
Source: http://en.flossmanuals.net/firefox/ch036_firefox-security-features/
Today
1. Jordan
2. Schedule
3. Evaluation II
4. In-Class Activity (if time)
5. Give back quizzes
Genres of Assessment
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods;
http://www.useit.com/papers/heuristic/]
Inspection-Based Methods: based on the skills and experience of evaluators. These are sometimes also called "expert reviews."
Automated Methods: usability measures computed by software.
Formal Methods: models and formulas to calculate and predict measures semi-automatically.
Empirical Methods: evaluation assessed by testing with real users.
Inspection-Based Methods:
1. Heuristic Evaluation
2. Walkthroughs
Discount Usability Techniques
Heuristic Evaluation
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods.]
Heuristic evaluation involves having a small set of
evaluators examine the interface and judge its
compliance with recognized usability principles
(the "heuristics").
Jakob Nielsen, Ph.D., "The Guru of Web Page Usability" (NYT), inventor of heuristic evaluation
Nielsen’s 10 Heuristics
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods;
http://www.useit.com/papers/heuristic/]
1. Visibility of System Status: the system should always keep users informed, through appropriate feedback at reasonable times.
2. Match Between System & Real World: the system should speak the user's language, with familiar words. Information should appear in natural and logical order.
3. User Control & Freedom: users often choose functions by mistake and need a clearly marked "emergency exit." Support undo and redo.
4. Consistency & Standards: users should not have to wonder whether different words/actions mean the same thing. Follow platform conventions.
5. Error Prevention: even better than good error messages is a careful design that prevents the problem in the first place.
6. Recognition Over Recall: minimize the user's memory load by making objects, actions, and options visible. The user shouldn't have to remember information from one dialog to the next.
7. Flexibility & Efficiency: accelerators (unseen by novice users) often speed up interaction for expert users. Allow users to tailor frequent actions.
8. Aesthetic & Minimalist Design: interfaces shouldn't contain irrelevant information. Every unit of information competes for attention and diminishes the relative visibility of the rest.
9. Help Users Recognize, Diagnose, & Recover from Errors: error messages should be in plain language, precisely indicate the problem, and suggest a solution.
10. Help & Documentation: best not to need documentation, but when necessary it should be easy to search, focused on user tasks, and list concrete steps.
Densest Slide of Year Award!
Phases of Heuristic Evaluation
1. Pre-evaluation training: give evaluators needed domain knowledge & information on the scenario
2. Evaluation: for ~1-2 hours, independently inspect the product using heuristics for guidance. Each expert should take more than one pass through the interface.
3. Severity rating: determine how severe each problem is
4. Aggregation: group meets & aggregates problems (with ratings)
5. Debriefing: discuss the outcome with design team
[Slide from Professor Leah Findlater]
Severity Ratings
0 - don't agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
[H4 Consistency] [Severity 3] [Fix 0]
The interface used the string "Save" on the first screen for saving the
user's file, but used the string "Write file" on the second screen. Users
may be confused by this different terminology for the same function.
(fairly severe, but easy to fix)
[Slide from Professor Leah Findlater]
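Findings like the one above pair a heuristic tag with a severity rating and a fix-difficulty rating. A minimal sketch of how such findings might be recorded and triaged; the `Finding` class, its field names, and the sample entries are my own illustrations, not part of the method:

```python
# Illustrative sketch (not from the lecture) of recording and triaging
# heuristic-evaluation findings. Field names and sample data are assumptions.
from dataclasses import dataclass

@dataclass
class Finding:
    heuristic: str    # e.g., "H4 Consistency"
    severity: int     # 0 (not a problem) .. 4 (usability catastrophe)
    fix_effort: int   # 0 (easy to fix) .. 4 (hard to fix)
    description: str

findings = [
    Finding("H4 Consistency", 3, 0, '"Save" vs. "Write file" for the same function'),
    Finding("H1 Visibility", 1, 1, "No progress indicator during export"),
]

# Triage: fix the most severe, cheapest-to-fix problems first.
for f in sorted(findings, key=lambda f: (-f.severity, f.fix_effort)):
    print(f"[{f.heuristic}] severity={f.severity} fix={f.fix_effort}: {f.description}")
```

A real aggregation step would also merge duplicate findings reported by different evaluators before sorting.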
How Many Evaluators?
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods;
http://www.useit.com/papers/heuristic/]
In principle, individual evaluators can perform a heuristic evaluation of a user interface on their own, but...
[Figure: a matrix in which each row represents a usability problem (ordered from easiest to find to hardest to find) and each column is an individual evaluator (ordered from least successful to most successful). The "worst" evaluator found only 3 usability problems, and they were the easiest to find; the "best" evaluator found 10 usability problems, but not the two "hardest."]
How Many Evaluators?
Well, then, how many evaluators should we use? Nielsen recommends ~5 evaluators (at least 3), which balances cost/benefit. Single evaluators found, on average, ~35% of usability problems.
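The ~35% single-evaluator figure feeds an aggregate-discovery model from Nielsen's published work (Nielsen & Landauer): found(n) = N * (1 - (1 - L)^n), where L is the proportion of problems a single evaluator finds and N is the total problem count. The formula does not appear on the slide; a quick sketch of why ~5 evaluators is a reasonable cost/benefit point:

```python
# Nielsen & Landauer's aggregate problem-discovery model:
#   found(n) = N * (1 - (1 - L)**n)
# L is the proportion of problems one evaluator finds (~0.35 per the slide).
def proportion_found(n_evaluators: int, l: float = 0.35) -> float:
    """Expected fraction of all usability problems found by n evaluators."""
    return 1 - (1 - l) ** n_evaluators

for n in (1, 3, 5, 10):
    print(f"{n} evaluators -> {proportion_found(n):.0%} of problems")
```

With L = 0.35, five evaluators already uncover most problems, and each additional evaluator adds less, which is the cost/benefit argument behind the "~5, at least 3" recommendation.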
Heuristic Evaluation Critiques
Shortly after heuristic evaluation was developed, several independent studies compared heuristic evaluation with other methods (e.g., user testing). They found that different approaches identified different problems; sometimes heuristic evaluation missed severe problems.
[Rogers et al., Interaction Design, Chapter 15, 2011]
Heuristic Evaluation Critiques
Another problem concerns experts reporting problems that don't exist. A study by Bailey (2001) found that 33% of problems were real usability problems; 21% of problems were missed; and 43% of problems identified by experts were not problems at all.
[Rogers et al., Interaction Design, Chapter 15, 2011]
Heuristic Evaluation Critiques
"Heuristic evaluations are 99% bad."
Rolf Molich, co-inventor of heuristic evaluation
[Said at UPA2009 panel as quoted by Jeff Sauro: http://www.measuringusability.com/blog/he.php]
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods;
Rogers et al., Interaction Design, Chapter 15, 2011]
Heuristic Evaluation: Pros and Cons
Cons:
o Tends to uncover many low-severity problems; severe problems can be missed
o Can be expensive and difficult to find 3-5 usability professionals (sometimes more are needed!)
o Sometimes experts are wrong
Pros:
o No special facilities needed
o No participants required; no user testing
o Is quick and dirty (a discount usability method)
[Rogers et al., Interaction Design, Chapter 15, 2011]
Genres of Assessment (recap)
Inspection-Based Methods:
1. Heuristic Evaluation
2. Walkthroughs
Walkthroughs
Walkthroughs are an alternative approach to heuristic
evaluation for predicting users’ problems without
doing user testing. They involve walking through a
task with an interface/product and noting problematic
usability features.
[Rogers et al., Interaction Design, Chapter 15, 2011]
Cognitive Walkthroughs
[Rogers et al., Interaction Design, Chapter 15, 2011; http://en.wikipedia.org/wiki/Cognitive_walkthrough]
One type of walkthrough that involves simulating a
user’s problem-solving process at each step of
interaction with an interface.
Whereas heuristic evaluation takes a holistic view to
catch problems, cognitive walkthroughs are task
specific.
Cognitive Walkthroughs
The defining feature of [cognitive walkthroughs] is that
they focus on evaluating designs for ease of
learning—a focus that is motivated by observations that
users learn by exploration.
[Rogers et al., Interaction Design, Chapter 15, 2011]
Performing Cognitive Walkthroughs
[Rogers et al., Interaction Design, Chapter 15, 2011]
1. Pre-study Step: characteristics of typical users are identified; sample tasks are created; a clear sequence of the actions needed to accomplish the task is documented
2. Walkthrough Step: designer and one or more evaluators come together to perform the analysis; evaluators walk through each step and try to answer these questions:
   i. Will the user know what to do to achieve the task?
   ii. Will the user notice that the correct action is available?
   iii. Will the user interpret the response from the action correctly?
3. Information Recording: as the walkthrough occurs, critical information is compiled about assumptions, problems, etc.
4. Design Revision: the recorded information is analyzed, design improvement suggestions are made, and the design is iterated upon
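The information-recording step lends itself to a simple per-action log: each documented action gets a yes/no answer to the three walkthrough questions. A hypothetical sketch; the function name, sample actions, and notes are all my own assumptions, not part of the method:

```python
# Hypothetical sketch of cognitive-walkthrough information recording.
# Each step pairs an action with answers to the three standard questions.
QUESTIONS = (
    "Will the user know what to do to achieve the task?",
    "Will the user notice that the correct action is available?",
    "Will the user interpret the response from the action correctly?",
)

def record_step(action, answers, notes=""):
    """Pair an action with yes/no answers to the three questions."""
    assert len(answers) == len(QUESTIONS)
    return {"action": action, "answers": dict(zip(QUESTIONS, answers)), "notes": notes}

log = [
    record_step("Open the Edit menu", (True, True, True)),
    record_step("Choose 'Cut'", (True, False, True),
                "Item is greyed out until text is selected"),
]

# Flag any step with a "no" answer as input to the design-revision step.
problems = [s for s in log if not all(s["answers"].values())]
print(f"{len(problems)} problem step(s) found")
```

The flagged steps, with their notes, would then drive the design-revision discussion.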
[Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces, CHI'90;
Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R.L. (Eds.), Usability Inspection Methods;
Rogers et al., Interaction Design, Chapter 15, 2011]
Walkthroughs: Pros and Cons
Cons:
o Time-consuming and laborious
o Evaluators do not always have a good understanding of users
o Only a limited number of tasks/scenarios can be explored
Pros:
o Strong focus on tasks
o Compared with HE, more detail on moving through an interaction with the system
o Perhaps most useful for applications involving complex operations
[Rogers et al., Interaction Design, Chapter 15, 2011]
Genres of Assessment (recap)
Formal Methods:
1. GOMS Model
2. Keystroke Level Model (KLM)
Formal Methods
Similar to inspection methods and analytics, predictive models (formal methods) evaluate a system without users being present. Rather than involving expert evaluators or tracking usage, predictive models use formulas to derive various measures of performance.
[Rogers et al., Interaction Design, Chapter 15, 2011]
GOMS Model
A GOMS model, as proposed by Card, Moran, and
Newell (1983), is a description of the knowledge that a
user must have in order to carry out tasks on a device
or system; it is a representation of the "how to do it"
knowledge that is required by a system in order to get
the intended tasks accomplished.
[Kieras, A Guide to GOMS Analysis, 1994; Card et al., The Psychology of Human-Computer Interaction, 1983]
GOMS Model
[Rogers et al., Interaction Design, Chapter 15, 2011; Card et al., The Psychology of Human-Computer Interaction, 1983]
An attempt to model the knowledge and cognitive processes involved when a user interacts with a system:
1. Goals refer to a particular state the user wants to achieve.
2. Operators refer to the cognitive processes and physical actions that need to be performed to achieve those goals.
3. Methods are learned procedures for accomplishing the goals.
4. Selection rules are used to determine which method to select when there is more than one available.
GOMS Model Example 1
[Rogers et al., Interaction Design, Chapter 15, 2011]
Goal: delete a word in a sentence in Microsoft Word
Method 1: Using menus
1. Recall that the word to be deleted has to be highlighted
2. Recall that the command is 'cut'
3. Recall that the command 'cut' is in the edit menu
4. Accomplish goal of selecting and executing the 'cut' command
5. Return with goal accomplished
Method 2: Using backspace key
1. Recall where to position the cursor in relation to the word to be deleted
2. Recall which key is the backspace key
3. Press backspace key to delete each letter
4. Return with goal accomplished
Operators: click mouse; drag cursor over text; select menu; move cursor to command; press key
Selection rules:
1. Delete text using the mouse if a large amount of text is to be deleted
2. Delete using backspace for a small amount of text
(Top 2 ugliest slide of the year)
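The goal/operators/methods/selection-rules structure of this example can be encoded in a few lines. The sketch below is my own toy encoding, not from the slides; the operator strings and the 10-character threshold in the selection rule are illustrative assumptions:

```python
# Toy encoding of the GOMS delete-a-word example: two methods as
# operator sequences, plus a selection rule choosing between them.
# Operator strings and the threshold value are illustrative assumptions.
METHODS = {
    "menus": ["drag cursor over text", "select Edit menu",
              "move cursor to Cut", "click mouse"],
    "backspace": ["move cursor after word", "click mouse",
                  "press backspace key (per letter)"],
}

def select_method(num_chars: int, threshold: int = 10) -> str:
    """Selection rule: use the menu method for large deletions."""
    return "menus" if num_chars > threshold else "backspace"

goal = "delete a word in a sentence"
method = select_method(num_chars=5)
print(f"Goal: {goal} -> method: {method}, operators: {METHODS[method]}")
```

The point of the encoding is that a GOMS analysis makes the "how to do it" knowledge explicit enough to enumerate and compare methods mechanically.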
GOMS Model Example 2
1. Goal: find a website about GOMS
2. Operators: decide to use a search engine, decide which search engine to use, think up and enter keywords
3. Methods: I know I have to type in search terms and then press the search button
4. Selection: do I use the mouse button or hit the enter key?
GOMS Model
The goal of this work [GOMS modeling] is to radically
reduce the time and cost of designing usable systems
through developing analytic engineering models for
usability based on validated computational models of
human cognition and performance.
[Kieras, GOMS Models: An Approach to Rapid Usability Evaluation, http://web.eecs.umich.edu/~kieras/goms.html]
David Kieras, Professor in EECS and Psychology at the University of Michigan; GOMS advocate
GOMS Model
GOMS is such a formalized representation that it can be used to predict task performance well enough that a GOMS model can be used as a substitute for much (but not all) of the empirical user testing needed to arrive at a system design that is both functional and usable.
[Kieras, GOMS Models: An Approach to Rapid Usability Evaluation, http://web.eecs.umich.edu/~kieras/goms.html]
David Kieras, Professor in EECS and Psychology at the University of Michigan; GOMS advocate
Genres of Assessment (recap)
Formal Methods:
1. GOMS Model
2. Keystroke Level Model (KLM)
KLM (Keystroke Level Model)
The KLM (Keystroke Level Model) differs from the GOMS model in that it provides numerical predictions for performance. Tasks can be compared in terms of the [expected] time it takes to perform them when using different strategies.
[Rogers et al., Interaction Design, Chapter 15, 2011]
For Example: Converting Temperature
[Raskin, J., The Humane Interface, Chapter 4, 2000]
Let's imagine we need to design an efficient interface for converting temperatures (e.g., from F to C). How long will it take the user to complete a conversion task? How could we find out?
Experiment? Or... model.
In-Class Activity Part 1
[Raskin, J., The Humane Interface, Chapter 4, 2000; based, in part, on activity from Professor Bederson at UMD]
Design and sketch a temperature converter interface for converting Fahrenheit to Celsius and Celsius to Fahrenheit.
1. Break into groups of 2-3
2. Spend ~5 minutes coming up with an interface to convert a temperature to Fahrenheit or Celsius
3. Be prepared to discuss the thought process you used in your design
4. Analyze your design in terms of how long you think it will take a user to use your interface
In-Class Activity Part 2
[Raskin, J., The Humane Interface, Chapter 4, 2000; based on activity from Professor Bederson at UMD]
1. In your same groups of 2-3
2. Spend ~5 minutes coming up with a model for how long it will take to convert 92.5F to Celsius
3. How does the above interface compare to your design? Which is faster?
4. Note:
   i. The dialog box is the top-level window and has focus (so typing goes directly into the textbox)
   ii. You must press enter to see the result
In-Class Activity: How did we do?
What strategies did you use? How did you "model" the task? How accurate is your model? How could we check it?
KLM (Keystroke Level Model)
[Rogers et al., Interaction Design, Chapter 15, 2011; Card et al., The Psychology of Human-Computer Interaction, 1983]
When developing the KLM, Card et al. (1983) analyzed
the findings of many empirical studies of user
performance in order to derive a standard set of
approximate times for the main kinds of operators
used during a task (e.g., key presses, mouse clicks)
Proposed KLM Times
[Rogers et al., Interaction Design, Chapter 15, 2011; Card et al., The Psychology of Human-Computer Interaction, 1983]
Operator  Description                                                      Time (sec)
K         Pressing a single key or button                                  0.35 (avg)
            Skilled typist (55 wpm)                                        0.22
            Average typist (40 wpm)                                        0.28
            User unfamiliar with the keyboard                              1.20
            Pressing shift or control key                                  0.08
P         Pointing with a mouse or other device to a target on a display   1.10
P1        Clicking the mouse or similar device                             0.20
H         Homing hands on the keyboard or other device                     0.40
D         Drawing a line using a mouse                                     depends on length of line
M         Mentally preparing to do something                               1.35
R(t)      System response time; counted only if it causes the user to wait t

The wide variability of each measure explains why we cannot use this simplified model to obtain absolute timings with any degree of certainty; by using typical values, however, we usually obtain the correct ranking of the performance times of two interface designs.
-Jef Raskin, The Humane Interface, 2000, p. 74
Performing KLM
[Rogers et al., Interaction Design, Chapter 15, 2011; Card et al., The Psychology of Human-Computer Interaction, 1983]
Texecuted = TK + TP + TH + TD + TM + TR
The predicted time to execute a task is the sum of the performance times of
each operator used.
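This sum is easy to mechanize. Below is a minimal sketch (my own illustration, not from the slides) that encodes the Card et al. operator times from the table above and totals a KLM operator string; the `klm_time` helper is a hypothetical name, and K = 0.28 assumes an average (40 wpm) typist.

```python
import re

# Operator times in seconds, taken from the Card et al. (1983) table above.
# K = 0.28 assumes an average (40 wpm) typist.
KLM_TIMES = {"K": 0.28, "P": 1.10, "P1": 0.20, "H": 0.40, "M": 1.35}

def klm_time(sequence: str) -> float:
    """Sum operator times for a KLM encoding such as 'HPP1HKKKKK'.

    'P1' is matched before 'P' so a point-then-click pair is not
    misread as two pointing operators.
    """
    tokens = re.findall(r"P1|[KPHM]", sequence)
    return sum(KLM_TIMES[t] for t in tokens)
```

For example, klm_time("HPP1HKKKKK") totals H + P + P1 + H + 5K = 0.4 + 1.1 + 0.2 + 0.4 + 1.4, about 3.5 s, matching the worked example below.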
Applying KLM to our Example
[Card et al., The Psychology of Human-Computer Interaction, 1983; Raskin, J. The Humane Interface, 2000]
Task: How long will it take to convert 92.5 F to Celsius?
The answer is: MKKKK (2.15 s) or HMPKMPKHMKKKK (8.25 s) => average: ~5 s

1. Move hand to the graphical input device: H
2. Point to the textbox: HP
3. Click on the textbox: HPP1
4. Move hands back to the keyboard: HPP1H
5. Type the four characters ("92.5"): HPP1HKKKK
6. Tap Enter: HPP1HKKKKK
7. Convert to time: 0.4 + 1.1 + 0.2 + 0.4 + (0.28 × 5) = 3.5 s
Heuristics for Placing M Operators
[Raskin, J. The Humane Interface, 2000, Chapter 4]
Inserting Mental Operators
[Card et al., The Psychology of Human-Computer Interaction, 1983; Raskin, J. The Humane Interface, 2000]
Task: How long will it take to convert 92.5 F to Celsius?
The answer is: MKKKK (2.15 s) or HMPKMPKHMKKKK (8.25 s) => average: ~5 s

1. Move hand to the graphical input device: H
2. Point to the textbox: HP
3. Click on the textbox: HPP1
4. Move hands back to the keyboard: HPP1H
5. Type the four characters ("92.5"): HPP1HKKKK
6. Tap Enter: HPP1HKKKKK
7. Apply mental operators using Raskin's heuristics: HMPP1HMKKKKMK
8. Convert to time: 0.4 + 1.35 + 1.1 + 0.2 + 0.4 + 1.35 + (0.28 × 4) + 1.35 + 0.28 = 7.55 s
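The effect of the inserted mental operators can be sanity-checked with a little arithmetic (a sketch, using the operator times from the tables above): the M-inserted encoding keeps exactly the same physical operators and adds three Ms.

```python
# Physical operators only: H + P + P1 + H + 5K (the encoding HPP1HKKKKK).
without_m = 0.40 + 1.10 + 0.20 + 0.40 + 5 * 0.28   # about 3.5 s

# HMPP1HMKKKKMK has the same physical operators plus three M operators
# at 1.35 s each.
with_m = without_m + 3 * 1.35                       # about 7.55 s
```

Mental preparation alone more than doubles the predicted task time here.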
KLM to Inform Design
[Raskin, J. The Humane Interface, 2000, Chapter 4]
Which is a better design?
A more efficient interface is possible by taking advantage of character-at-a-time interaction and by performing both conversions at once. Perhaps, however, the cognitive load of using this interface is higher. And how about learnability?
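Raskin's two candidate encodings can be compared the same way. Note that his published figures (2.15 s and 8.25 s) come out under the assumption K = 0.2 s, the skilled-typist value, rather than the 0.28 used in the walkthrough above; this sketch (my own illustration) reproduces them under that assumption.

```python
import re

# Operator times (seconds) with K = 0.2, the value Raskin's figures assume.
TIMES = {"K": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}

def klm_time(seq: str) -> float:
    """Total the operator times in a KLM encoding (no P1 in these encodings)."""
    return sum(TIMES[t] for t in re.findall(r"[KPHM]", seq))

design_a = klm_time("MKKKK")            # character-at-a-time design: ~2.15 s
design_b = klm_time("HMPKMPKHMKKKK")    # point-and-click textbox design: ~8.25 s
```

The roughly fourfold gap is what motivates the character-at-a-time redesign, subject to the cognitive-load and learnability caveats above.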
Adapting KLM
Researchers wanting to use the KLM to predict the efficiency of key and button layouts on devices have adapted it to meet the needs of new products. For example, mobile device and phone developers today use KLM to determine the optimal design for keypads.
[Rogers et al., Interaction Design, Chapter 15, 2011]
[Holleis et al., Keystroke-Level Model for Advanced Mobile Phone Interaction, CHI2007]
o Not as easy as HE and cognitive walkthroughs
o Limited scope: can only model interactions that involve a small set of highly routine, data-entry-type tasks
o Intended to be used only to predict expert performance
o Does not model errors, which can substantially impact performance
o Does not capture readability, learnability, aesthetics, etc.
o Main benefit: can comparatively analyze different interfaces/prototypes easily
o No reliance on users!
o Easy to rerun on iterated interfaces
o A number of researchers have reported its success for comparing efficacy
[Rogers et al., Interaction Design, Chapter 15, 2011]
GOMS and KLM