Week 10 Workshop #2: UX Goals and Metrics (Transcript)
Human Computer Interaction / COG3103, 2014 Fall
Class hours: Tue 1-3 pm / Thurs 12-1 pm
6 November
Choosing the Right Metrics: Ten Types of Usability Studies
• Issue Based Metrics (Ch 5)
– Anything that prevents task completion
– Anything that takes someone off course
– Anything that creates some level of confusion
– Anything that produces an error
– Not seeing something that should be noticed
– Assuming something is correct when it is not
– Assuming a task is complete when it is not
– Performing the wrong action
– Misinterpreting some piece of content
– Not understanding the navigation
Workshop #2 COG_Human Computer Interaction 2
Task Success
Task Time
Errors
Efficiency
Learnability
Issue Based Metrics
Self Reported Metrics
Behavioral and Physiological Metrics
Combined and Comparative Metrics
Live Website Metrics
Card Sorting Data
Choosing the Right Metrics: Ten Types of Usability Studies
• Self Reported Metrics (Ch 6): Asking participants for information about their perception of the system and their interaction with it
– Overall interaction
– Ease of use
– Effectiveness of navigation
– Awareness of certain features
– Clarity of terminology
– Visual appeal
– Likert scales
– Semantic differential scales
– After-scenario questionnaire
– Expectation measures
– Usability Magnitude Estimation
– SUS
– CSUQ (Computer System Usability Questionnaire)
– QUIS (Questionnaire for User Interface Satisfaction)
– WAMMI (Website Analysis & Measurement Inventory)
– Product Reaction Cards
Choosing the Right Metrics: Ten Types of Usability Studies
• Behavioral and Physiological Metrics (Ch 7)
– Verbal Behaviors
• Strongly positive comment
• Strongly negative comment
• Suggestion for improvement
• Question
• Variation from expectation
• Stated confusion/frustration
– Nonverbal Behaviors
• Frowning/Grimacing/Unhappy
• Smiling/Laughing/Happy
• Surprised/Unexpected
• Furrowed brow/Concentration
• Evidence of impatience
• Leaning in close to screen
• Fidgeting in chair
• Rubbing head/eyes/neck
Choosing the Right Metrics: Ten Types of Usability Studies
• Combined and Comparative Metrics (Ch 8)
– Taking smaller pieces of raw data, such as task completion rates, time-on-task, and self-reported ease of use, to derive new metrics such as an overall usability metric or a usability scorecard
– Comparing existing usability data to expert or ideal results
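One common way to combine metrics of different types is to rescale each to a 0-1 range and average them into a single score. A minimal sketch in Python; all numbers below are hypothetical:

```python
# Combine raw usability data into one overall score by rescaling each
# metric to 0-1 and averaging (all values are hypothetical).
task_success_rate = 0.80          # fraction of tasks completed
time_score = 45 / 75              # ideal time / mean observed time
ease_score = (5.5 - 1) / (7 - 1)  # mean 7-point ease rating, rescaled to 0-1

overall = (task_success_rate + time_score + ease_score) / 3
print(round(overall, 2))  # 0.72
```

Weighting the components differently is a design choice; an unweighted average treats every metric as equally important.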
Choosing the Right Metrics: Ten Types of Usability Studies
• Live Website Metrics (Ch 9)
– Information you can glean from live data on a production website
• Server logs – page views and visits
• Click-through rates – number of times a link is shown vs. actually clicked
• Drop-off rates – abandoned processes
• A/B studies – manipulate the pages users see and compare metrics between them
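Click-through and drop-off rates are simple ratios over logged counts; a sketch with hypothetical numbers:

```python
# Hypothetical live-site counts.
impressions, clicks = 1000, 45
ctr = clicks / impressions                 # click-through rate
print(ctr)                                 # 0.045, i.e., 4.5%

# Hypothetical 3-step funnel: visitors remaining at each step.
funnel = [1000, 620, 480]
drop_off = [1 - b / a for a, b in zip(funnel, funnel[1:])]
print([round(d, 2) for d in drop_off])     # [0.38, 0.23]
```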
Choosing the Right Metrics: Ten Types of Usability Studies
• Card Sorting Data (Ch 9)
– Open card sort
• Give participants cards; they sort them and define their own groups
– Closed card sort
• Give participants cards and the names of the groups; they put the cards into the groups
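Open-sort results are often summarized as a similarity (co-occurrence) matrix: how many participants placed each pair of cards in the same group. A sketch with two hypothetical participants and four cards:

```python
from itertools import combinations
from collections import Counter

# Hypothetical open-card-sort results: each participant's list of groups.
sorts = [
    [{"A", "B"}, {"C", "D"}],     # participant 1
    [{"A", "B", "C"}, {"D"}],     # participant 2
]

# Count how often each pair of cards lands in the same group.
co = Counter()
for groups in sorts:
    for group in groups:
        for pair in combinations(sorted(group), 2):
            co[pair] += 1

print(co[("A", "B")])  # 2: both participants grouped A with B
```

The resulting matrix is what cluster analysis or a dendrogram of the sort data is built on.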
Choosing the Right Metrics: Ten Types of Usability Studies
• Increasing Awareness
– Aimed at increasing awareness of a specific piece of content
or functionality
– Why is something not noticed or used?
• Metrics
– Live Website Metrics
• Monitor interactions
• Not foolproof – a user may notice the element and decide not to click; alternatively, a user may click without really noticing it
• A/B testing to see how small changes impact user behavior
– Self Reported Metrics
• Pointing out specific elements to user and asking whether
they had noticed those elements during task
• Asking whether they were aware of the feature before the study began
– Not everyone has a good memory
• Show users different elements and ask them to choose
which one they saw during task
– Behavioral and Physiological Metrics
• Eye tracking
– Determine amount of time looking at a certain element
– Average time spent looking at a certain element
Choosing the Right Metrics: Ten Types of Usability Studies
• Problem Discovery
– Identify major usability issues
– After deployment, find out what annoys users
– Periodic checkup to see how users are interacting with the product
• Discovery vs. usability study
– Open-ended
– Participants may generate own tasks
– Strive for realism in typical task and in user’s
environment
– Comparing across participants can be difficult
• Metrics
– Issue Based Metrics
• Capture all usability issues, you can convert into type
and frequency
• Assign severity rating and develop a quick-hit list of
design improvements
– Self Reported Metrics
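Turning captured issues into a quick-hit list amounts to sorting by severity and frequency; a sketch with a hypothetical issue log:

```python
# Hypothetical issue log: (description, severity, participants affected).
issues = [
    ("Label 'Submit' unclear",      "low",    5),
    ("Search returns no feedback",  "high",   3),
    ("Menu item hidden on mobile",  "medium", 4),
    ("Checkout button not noticed", "high",   6),
]

rank = {"high": 2, "medium": 1, "low": 0}
# Quick-hit list: highest severity first, ties broken by frequency.
quick_hits = sorted(issues, key=lambda i: (rank[i[1]], i[2]), reverse=True)
print(quick_hits[0][0])  # "Checkout button not noticed"
```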
Choosing the Right Metrics: Ten Types of Usability Studies
• Creating an Overall Positive User Experience
– Not enough to be usable, want exceptional user
experience
– Thought-provoking, entertaining, slightly addictive
– Performance useful, but what user thinks, feels, and
says really matters
• Metrics
– Self Reported
• Satisfaction – common but not enough
• Exceed expectations – want user to say it was easier,
more efficient, or more entertaining than expected
• Likelihood to purchase, use in future
• Recommend to a friend
• Behavioral and Physiological
– Pupil diameter
– Heart rate
– Skin conductance
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs
– Comparing more than one design alternative
– Early in the design process, teams put together semi-functional prototypes
– Evaluate using predefined set of metrics
• Participants
– Can’t ask same participant to perform same tasks with
all designs
– Even with counterbalancing of design and task order, learning can carry over from one design to the next
• Procedure
– Run the study as between-subjects, where each participant works with only one design
– Alternatively, have a primary design that each participant works with, then show the alternative designs and ask for a preference
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs (continued)
• Metrics
– Task Success
• Indicates which design is more usable
• With a small sample size, of limited value
– Task Time
• Indicates which design is more usable
• With a small sample size, of limited value
– Issue Based Metrics
• Compare the frequency of high-, medium-, and low-severity issues across designs to see which one is most usable
– Self Reported Metrics
• Ask participants to choose the prototype they would most like to use in the future (forced comparison)
• Ask participants to rate each prototype along dimensions such as ease of use and visual appeal
Independent & Dependent Variables
• Independent variables
– The things you manipulate or control for
– An aspect of the study that you manipulate
– Chosen based on the research question
– e.g.:
• Characteristics of participants (e.g., age, sex, relevant experience)
• Different designs or prototypes being tested
• Tasks
• Dependent variables
– The things you measure
– Describe what happened as a result of the study
– Something you measure as a result of, or as dependent on, how you manipulate the independent variables
– e.g.:
• Task Success
• Task Time
• SUS score
• etc.
Need to have a clear idea of what you plan to manipulate and what you plan to measure
Designing a Usability Study
RQ 1
• Research Question:
– Differences in performance between males and females
• Independent variable:
– Gender
• Dependent variable:
– Task completion time
RQ 2
• Research Question:
– Differences in satisfaction between novice and expert users
• Independent variable:
– Experience level
• Dependent variable:
– Satisfaction
Types of Data
• Nominal (aka Categorical)
– e.g., Male, Female; Design A, Design B.
• Ordinal
– e.g., Rank ordering of 4 designs tested from Most Visually Appealing to
Least Visually Appealing.
• Interval
– e.g., 7-point scale of agreement: “This design is visually appealing.
Strongly Disagree . . . Strongly Agree”
• Ratio
– e.g., Time, Task Success %
NOMINAL DATA
• Definition
– Unordered groups or categories
– Without order, cannot say one is better than another
• May provide characteristics of users: independent variables that allow you to segment the data
– Windows versus Mac users
– Geographical location
– Males versus females
• What about dependent variables?
– Number of users who clicked on A vs. B
– Task success
• Usage
– Counts and frequencies
ORDINAL DATA
• Definition
– Ordered groups and categories
– Data is ordered in a certain way but intervals between measurements are not
meaningful
• Ordinal data comes from self-reported data on questionnaires
– Website rated as excellent, good, fair, or poor
– Severity rating of problem encountered as high, medium, or low
• Usage
– Looking at frequencies
– Calculating an average is meaningless (the distance between high and medium may not be the same as between medium and low)
INTERVAL DATA
• Definition
– Continuous data where differences between the measurements are meaningful
– Zero point on the scale is arbitrary
• System Usability Scale (SUS)
– Example of interval data
– Based on self-reported data from a series of questions about overall usability
– Scores range from 0 to 100
• Higher score indicates better usability
• The distance between points is meaningful because it indicates an increase or decrease in perceived usability
• Usage
– Able to calculate descriptive statistics such as average, standard deviation, etc.
– Inferential statistics can be used to generalize to a population
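For reference, SUS is scored by rescaling each of the ten 1-5 item responses to 0-4 (odd, positively worded items score the response minus 1; even items score 5 minus the response) and multiplying the sum by 2.5, giving a 0-100 score. A sketch with hypothetical responses:

```python
# Hypothetical responses to the ten SUS items (1 = strongly disagree,
# 5 = strongly agree; odd items positively worded, even items negative).
responses = [4, 2, 5, 1, 4, 2, 4, 1, 5, 2]

contributions = [
    r - 1 if i % 2 == 0 else 5 - r  # i is 0-based: even index = odd item
    for i, r in enumerate(responses)
]
sus = sum(contributions) * 2.5
print(sus)  # 85.0
```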
Ordinal vs. Interval Rating Scales
• Are these two scales different? (two example rating scales shown on the slide)
• The top scale is ordinal: you should only calculate frequencies of each response.
• The bottom scale can be considered interval: you can also calculate means.
RATIO DATA
• Definition
– Same as interval data, with the addition of an absolute zero
– Zero has inherent meaning
• Example
– The difference between a person aged 35 and a person aged 38 is the same as the difference between people aged 12 and 15
– With time to completion, you can say that one participant is twice as fast as another
• Usage
– Most analyses that you do work with both ratio and interval data
– The geometric mean is an exception; it requires ratio data
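A quick sketch of the geometric mean for a hypothetical set of task times, computed via logs for numerical stability:

```python
import math

# Hypothetical task-completion times in seconds (ratio data).
times = [10.0, 20.0, 40.0]

# Geometric mean: nth root of the product, computed via logs.
geo_mean = math.exp(sum(math.log(t) for t in times) / len(times))
print(round(geo_mean, 1))  # 20.0
```

The geometric mean is less sensitive than the arithmetic mean to the long right tail that task-time data typically has.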
Statistics for each Data Type
Confidence Intervals
• Assume this was your time data for a study with 5 participants (data table shown on slide; SD = 9.6, n = 5)
• Does that make a difference in your answer?
Calculating Confidence Intervals
– <alpha> is normally .05 (for a 95% confidence interval)
– <std dev> is the standard deviation of the set of numbers (9.6 in this example)
– <n> is how many numbers are in the set (5 in this example)
=CONFIDENCE(<alpha>,<std dev>,<n>)
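Excel's CONFIDENCE returns the normal-based margin of error z·s/√n. The same value in plain Python, using the SD of 9.6 and n of 5 from the example:

```python
import math
from statistics import NormalDist

def confidence_margin(alpha: float, std_dev: float, n: int) -> float:
    """Margin of error for a normal-based confidence interval,
    matching Excel's =CONFIDENCE(alpha, std_dev, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return z * std_dev / math.sqrt(n)

margin = confidence_margin(0.05, 9.6, 5)
print(round(margin, 1))  # 8.4
```

So the 95% confidence interval is the mean plus or minus about 8.4 seconds in this example.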
Excel Example: Showing Error Bars
Binary Success
• Pass/fail (or other binary criteria)
• 1’s (success) and 0’s (failure)
Confidence Interval for Task Success
• When you look at task success data across participants for a single
task the data is commonly binary:
– Each participant either passed or failed on the task.
• In this situation, you need to calculate the confidence interval using
the binomial distribution.
Example
– The easiest way to calculate the confidence interval is using Jeff Sauro’s web calculator:
– http://www.measuringusability.com/wald.htm
1=success, 0=failure. So, 6/8 succeeded, or 75%.
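The same calculation can be done directly. The sketch below computes the adjusted-Wald interval (the small-sample variant Sauro recommends, which adds z²/2 successes and z² trials before applying the Wald formula) for the 6-of-8 result:

```python
import math
from statistics import NormalDist

def adjusted_wald(successes: int, n: int, alpha: float = 0.05):
    """Adjusted-Wald confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_adj = (successes + z * z / 2) / (n + z * z)
    se = math.sqrt(p_adj * (1 - p_adj) / (n + z * z))
    lo, hi = p_adj - z * se, p_adj + z * se
    return max(0.0, lo), min(1.0, hi)

lo, hi = adjusted_wald(6, 8)       # 6 of 8 participants succeeded (75%)
print(round(lo, 2), round(hi, 2))  # 0.4 0.94
```

Note how wide the interval is with only 8 participants: the true success rate could plausibly be anywhere from about 40% to 94%.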
Chi-square
• Allows you to compare actual and expected frequencies for
categorical data.
=CHITEST(<actual range>,<expected range>)
Excel Example
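CHITEST returns the p-value; the same computation works in plain Python. With one degree of freedom the chi-square p-value equals 2·(1 − Φ(√χ²)), so no statistics package is needed. The A/B click counts below are hypothetical:

```python
import math
from statistics import NormalDist

# Hypothetical A/B test: of 100 visitors, 60 clicked version A and 40
# clicked version B; the null hypothesis expects an even 50/50 split.
actual = [60, 40]
expected = [50, 50]

chi2 = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
# p-value for df = 1 via the normal distribution.
p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
print(round(chi2, 1), round(p, 3))  # 4.0 0.046
```

Here p < .05, so the observed split differs significantly from the expected 50/50.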
Comparing Means
• T-test: independent samples (between subjects)
– e.g., Apollo websites study, task times
• T-test: paired samples (within subjects)
– e.g., haptic mouse study
T-tests in Excel
=TTEST(<array1>,<array2>,x,y)
– x = 2 (for a two-tailed test) in almost all cases
– y = 2 for independent samples; y = 1 for paired samples
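The t statistic itself is easy to compute by hand; the sketch below does a paired-samples version on hypothetical time differences (not the actual haptic mouse data). =TTEST would additionally convert |t| into a p-value using the t distribution with n − 1 degrees of freedom:

```python
import math
from statistics import mean, stdev

# Hypothetical per-participant time differences (A minus B), in seconds.
diffs = [3.0, 1.0, 2.0, 4.0, 0.0]

n = len(diffs)
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # paired-samples t statistic
df = n - 1
print(round(t, 2), df)  # 2.83 4
# |t| = 2.83 exceeds the two-tailed .05 critical value of 2.776 for df = 4,
# so the difference is (just barely) significant at p < .05.
```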
Comparing Multiple Means
• Analysis of Variance (ANOVA)
Excel: “Tools” > “Data Analysis” > “Anova: Single Factor”
Example: a study comparing 4 navigation approaches for a website
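The F statistic that the Excel tool reports can be computed directly; a sketch with three small hypothetical groups (Excel additionally derives the p-value from the F distribution):

```python
from statistics import mean

# Hypothetical task scores for three design variants.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]

grand = mean(x for g in groups for x in g)
k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total observations

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))  # 7.0
```

A large F means the variation between group means is large relative to the variation within groups.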
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Goal
– To gain experience in writing effective benchmark tasks and measurable UX targets.
• Activities
– We have shown you a rather complete set of examples of benchmark tasks and UX targets for the Ticket Kiosk System. Your job is to do something similar for the system of your choice.
– Begin by identifying which work roles and user classes you are targeting in evaluation (a brief description is enough).
– Write three or more UX table entries (rows), including your choices for each column. Have at least two UX targets based on a benchmark task and at least one based on a questionnaire.
– Create and write up a set of about three benchmark tasks to go with the UX targets in the table.
• Do NOT make the tasks too easy.
• Make tasks increasingly complex.
• Include some navigation.
• Create tasks that you can later “implement” in your low-fidelity rapid prototype.
• The expected average performance time for each task should be no more than about 3 minutes, just to keep it short and simple for you during evaluation.
– Include the questionnaire question numbers in the measuring instrument column of the appropriate UX target.
Lecture #12 COG_Human Computer Interaction 33
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Cautions and hints:
– Do not spend any time on design in this exercise; there will be time for detailed design in the next exercise.
– Do not plan to give users any training.
• Deliverables:
– Two user benchmark tasks, each on a separate sheet of paper.
– Three or more UX targets entered into a blank UX target table on your laptop or on paper.
– If you are doing this exercise in a classroom environment, finish up by reading your benchmark tasks to the class for critique and discussion.
• Schedule
– Work efficiently and complete in about an hour and a half.