Posted on 19-Dec-2015
Quantitative Evaluation
John Kelleher, IT Sligo
Definition
Formal usability study: to compare two designs on measurable aspects such as
- time required
- number of errors
- effectiveness for achieving very specific tasks
Without measurement, success is undefined.
Methods
- Performance/predictive modeling: GOMS/KLM, Fitts' Law
- Controlled experiments & statistical analysis
GOMS Model (Card, Moran & Newell, 1983)
Models the knowledge and cognitive processes involved when users interact with systems.
Goals: refer to a particular state the user wants to achieve.
Operators: refer to the cognitive processes and physical actions that need to be performed in order to attain those goals.
Methods: learned procedures for accomplishing the goals, consisting of the exact sequence of steps required.
Selection rules: used to determine which method to select when there is more than one available for a given stage of a task.
GOMS: Example of deleting a word in MS Word
Goal: delete a word in a sentence
Method for accomplishing the goal of deleting a word using the menu option:
Step 1: Recall that the word to be deleted has to be highlighted
Step 2: Recall that the command is 'cut'
Step 3: Recall that the command 'cut' is in the Edit menu
Step 4: Accomplish the goal of selecting and executing the 'cut' command
Step 5: Return with goal accomplished
Method for accomplishing the goal of deleting a word using the delete key:
Step 1: Recall where to position the cursor in relation to the word to be deleted
Step 2: Recall which key is the delete key
Step 3: Press the 'delete' key to delete each letter
Step 4: Return with goal accomplished
Operators to use in the above methods:
- Click mouse
- Drag cursor over text
- Select menu
- Move cursor to command
- Press keyboard key
Selection rules to decide which method to use:
1: Delete text using the mouse and selecting from the menu if a large amount of text is to be deleted
2: Delete text using the delete key if a small number of letters is to be deleted
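The delete-word example can be written down as data (methods as step lists) plus a selection rule. The sketch below is a hypothetical Python encoding, not a standard GOMS notation; the 10-character cut-off between a "small" and a "large" amount of text is an assumption, since the slide does not quantify it.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """A GOMS method: a learned, exact sequence of steps for reaching a goal."""
    name: str
    steps: list[str]


MENU_METHOD = Method("menu", [
    "Recall that the word to be deleted has to be highlighted",
    "Recall that the command is 'cut'",
    "Recall that the command 'cut' is in the Edit menu",
    "Accomplish the goal of selecting and executing the 'cut' command",
    "Return with goal accomplished",
])

DELETE_KEY_METHOD = Method("delete key", [
    "Recall where to position the cursor in relation to the word to be deleted",
    "Recall which key is the delete key",
    "Press the 'delete' key to delete each letter",
    "Return with goal accomplished",
])

# Hypothetical threshold: the slide only distinguishes "large" from "small".
LARGE_TEXT_THRESHOLD = 10


def select_method(n_chars: int) -> Method:
    """Selection rule: menu for a large amount of text, delete key otherwise."""
    if n_chars > LARGE_TEXT_THRESHOLD:
        return MENU_METHOD
    return DELETE_KEY_METHOD
```

For example, `select_method(4)` picks the delete-key method and `select_method(40)` picks the menu method. Keeping methods as plain data makes it easy to count steps or attach operator times later.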
Keystroke Level Model (KLM)
Well-known analytic evaluation technique, derived from the Model Human Processor (MHP) of Card, Moran & Newell (1983)
Provides detailed quantitative (numerical) information about user performance
Sufficient for predicting the speed of interaction with a user interface
Basic time prediction components are empirically derived
KLM Constants
Operator  Description                                                   Time (sec)
K         Pressing a single key or button                               0.35 (average)
            Skilled typist (55 wpm)                                     0.22
            Average typist (40 wpm)                                     0.28
            User unfamiliar with the keyboard                           1.20
          Pressing shift or control key                                 0.08
P         Point with a mouse or other device to a target on a display   1.10
          Clicking the mouse or similar device                          0.20
H         Homing hands on the keyboard or other device                  0.40
D         Draw a line using a mouse                                     Variable, depending on line length
M         Mentally prepare to do something (e.g. make a decision)       1.35
R(t)      System response time, counted only if it causes the user      t
          to wait when carrying out their task
Task in Text Editor Using GOMS (KLM)
Tasks:
- Create new file
- Type in "Hello, World."
- Save document as "Hello"
- Print document
- Exit editor
Assumptions:
- System response time is 0, or comparable across systems (constant)
- Skilled typist (55 wpm): K = 0.2 sec (the table's 0.22, rounded)
- Editor is already started, hands in lap
All Mouse
Task           KLM                      Time (sec)
Open new file  H+P+B+P+B                2.8
Type words     H + 15*K                 3.4
Save           H+P+B+P+B+H+5*K+K        4.4
Print          H+P+B+P+B+P+BB           4.1
Exit           P+B+P+BB                 2.5
Total                                   17.2
B = pressing or releasing a mouse button (0.1 sec); BB = a full click (0.2 sec).
Shortcuts
Task           KLM                      Time (sec)
Open new file  H+(2*K)                  0.8
Type words     15*K                     3.0
Save           (2*K)+(5*K)+K            1.6
Print          (2*K)+K                  0.6
Exit           (2*K)+K                  0.6
Total                                   6.6
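The arithmetic in the two tables can be checked mechanically. A minimal sketch, assuming B (pressing or releasing a mouse button) costs 0.1 sec, which is the value that makes the all-mouse totals add up, and K = 0.2 sec as in the task assumptions; the operator strings are transcribed from the tables and the helper names are hypothetical.

```python
# Operator times in seconds. K = 0.2 (skilled typist, rounded); P and H come
# from the KLM constants table; B = 0.1 is assumed (half of a 0.2 s click, BB).
KLM_TIMES = {"K": 0.2, "P": 1.1, "B": 0.1, "H": 0.4}

ALL_MOUSE = {
    "Open new file": "H+P+B+P+B",
    "Type words": "H + 15*K",
    "Save": "H+P+B+P+B+H+5*K+K",
    "Print": "H+P+B+P+B+P+BB",
    "Exit": "P+B+P+BB",
}

SHORTCUTS = {
    "Open new file": "H+(2*K)",
    "Type words": "15*K",
    "Save": "(2*K)+(5*K)+K",
    "Print": "(2*K)+K",
    "Exit": "(2*K)+K",
}


def klm_time(formula: str) -> float:
    """Evaluate a KLM sequence such as 'H+P+B+P+B' or '(2*K)+K'."""
    total = 0.0
    for term in formula.replace(" ", "").split("+"):
        term = term.strip("()")
        if "*" in term:
            count_text, ops = term.split("*")
            count = int(count_text)
        else:
            count, ops = 1, term
        # A run such as 'BB' is two operators, so add each letter separately.
        total += count * sum(KLM_TIMES[op] for op in ops)
    return round(total, 2)


def task_total(tasks: dict[str, str]) -> float:
    """Sum the predicted times of all tasks."""
    return round(sum(klm_time(f) for f in tasks.values()), 2)


for name, formula in ALL_MOUSE.items():
    print(f"{name:15} {formula:20} {klm_time(formula):.1f}")
print("All mouse:", task_total(ALL_MOUSE), "Shortcuts:", task_total(SHORTCUTS))
```

Run as written, it should reproduce the 17.2 sec and 6.6 sec totals. Changing a single operator time (for example K for a slower typist) shows how sensitive the comparison is to minor changes.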
KLM Applicability
- User interfaces with a limited number of features
- Repetitive task execution
- Really only useful for comparative studies among alternatives, albeit sensitive to minor changes (e.g. Project Ernestine)
Caveats
- Assumes expert behaviour: no errors tolerated
- Assumes the user already knows the sequence of operations that he or she is going to perform
- Time estimates are best followed up by empirical studies
- Ambiguity regarding the M operator
- Assumes serial processing
Fitts’ Law
Predicts time taken to reach a target using a pointing device
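T = k log2(D/S + 0.5), where k ≈ 100 msec and
T = time to move the hand to a target
D = distance between hand and target
S = size of target
Highlights the corners and edges of the screen as good targets: the pointer stops at the screen boundary, so a target there is effectively very large in the direction of movement.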
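The formula above translates directly into a function. A minimal sketch with k = 0.1 sec as on the slide; the function name and the guard against D/S < 0.5 (where this form of the law would give a zero or negative time) are additions for illustration.

```python
import math

K_SECONDS = 0.1  # k ~ 100 msec, as on the slide


def fitts_time(distance: float, size: float, k: float = K_SECONDS) -> float:
    """Predicted time T = k * log2(D/S + 0.5), in seconds.

    D and S must be in the same unit (e.g. pixels)."""
    if distance <= 0 or size <= 0:
        raise ValueError("distance and size must be positive")
    ratio = distance / size
    if ratio < 0.5:
        # Guard added for this sketch: the log term would be negative here.
        raise ValueError("D/S must be at least 0.5 for this form of the law")
    return k * math.log2(ratio + 0.5)


# Same distance, two target sizes: the larger target is faster to reach.
print(f"{fitts_time(800, 20):.3f} s")  # small, distant target
print(f"{fitts_time(800, 80):.3f} s")  # four times larger target
```

`fitts_time(15, 2)` gives 0.1 × log2(8) = 0.3 sec. Because T depends only on the ratio D/S, enlarging the target (as a screen edge effectively does) has the same effect as moving it closer.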
Performance measures
Time: easy to measure and suitable for statistical analysis, e.g. learning time, task completion time.
Errors: show where problems exist within a system and suggest the cause of a difficulty.
Patterns of system use: study the patterns of use in different sections of a system to reveal which sections are preferred and which are avoided.
Throughput: the amount of work done in a given time.
Other measures
Subjective impression measures
- Attitude measures: use questionnaires or interviews
- Rated aesthetics, rated ease of learning, stated decision to purchase
Composite measures
- Weighted averages of the above, e.g. efficiency = throughput / number of errors
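Composite measures are simple arithmetic over the other measures. A minimal sketch, assuming the component scores are already on a common scale (here, hypothetical 1 to 5 ratings) and that the weights are chosen by the evaluator; the efficiency ratio is the slide's example and is undefined when no errors are made.

```python
def weighted_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of measures that share a common scale."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def efficiency(throughput: float, errors: int) -> float:
    """Slide example: efficiency = throughput / number of errors."""
    if errors == 0:
        raise ValueError("efficiency is undefined when no errors are made")
    return throughput / errors


# Hypothetical ratings and weights, for illustration only.
ratings = {"aesthetics": 4, "ease_of_learning": 2}
weights = {"aesthetics": 1, "ease_of_learning": 3}
print(weighted_composite(ratings, weights))
print(efficiency(throughput=12, errors=3))
```

Weighting ease of learning three times as heavily pulls the composite towards that rating; the choice of weights is a design decision and should be reported with the result.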
Controlled experiments
Designed to test predictions arising from an explicit hypothesis that arises out of an underlying theory
Allows comparison of systems, fine-tuning of details ...
Strives for:
- a lucid and testable hypothesis
- quantitative measurement
- a measure of confidence in the results obtained (statistics)
- replicability of the experiment
- control of variables and conditions
- removal of experimenter bias
Ben Shneiderman (Univ. of Maryland, US)
Experiments have:
Two parents: 'a practical problem' and 'a theoretical foundation'
Three children: 'help in resolving the practical problems', 'refinements to the theory' and 'advice to future experimenters who work on the same problem'
Designing Experiments
- Formulating the hypotheses
- Developing predictions from the hypotheses
- Choosing a means to test the predictions
- Identifying all the variables that might affect the results of the experiment
- Deciding which are the independent variables, dependent variables and which variables need to be controlled by some means
Usability Laboratory
Designing Experiments (contd.)
- Designing the experimental task and method
- Subject selection
- Deciding the experimental design, data collection method and controlling confounding variables
- Deciding on the appropriate statistical or other analysis
- Carrying out a pilot study
The Experimental Method
a) Begin with a lucid, testable hypothesis
Example 1: "There is no difference in the number of cavities in children and teenagers using Crest and No-teeth toothpaste."
Example 2: "There is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subject's previous expertise in using a mouse or using the different menu types."
b) Explicitly state the independent variables that are to be altered
Independent variables:
- are the things you manipulate independently of how a subject behaves
- determine a modification to the conditions the subjects undergo
- may arise from subjects being classified into different groups
In the toothpaste experiment:
- toothpaste type: uses Crest or No-teeth toothpaste
- age: <= 11 years or > 11 years
In the menu experiment:
- menu type: pop-up or pull-down
- menu length: 3, 6, 9, 12, 15
- subject type: expert or novice
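Each combination of independent-variable levels defines one experimental condition. A minimal sketch, assuming a fully crossed (factorial) design, which the slide does not state explicitly; note that subject type is a classification variable, so subjects are sorted into it rather than assigned.

```python
import itertools

# Levels of each independent variable in the menu experiment (from the slide).
MENU_TYPES = ["pop-up", "pull-down"]
MENU_LENGTHS = [3, 6, 9, 12, 15]
SUBJECT_TYPES = ["expert", "novice"]

# Assumption: every combination of levels is run (full factorial design).
conditions = list(itertools.product(MENU_TYPES, MENU_LENGTHS, SUBJECT_TYPES))

print(len(conditions))  # 2 * 5 * 2 combinations
for menu_type, length, subject in conditions[:3]:
    print(menu_type, length, subject)
```

Counting conditions this way early on helps with the later decisions on design and sample size, since every added level multiplies the number of cells to fill.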
c) Carefully choose the dependent variables that will be measured
Dependent variables: measures that demonstrate the effects of the independent variables
Properties:
- readily observable
- stable and reliable, so that they do not vary under constant experimental conditions
- sensitive to the effects of the independent variables
- readily related to some scale of measurement
Dependent variables
Some commonly used dependent variables:
- number of errors made
- time taken to complete a given task
- time taken to recover from an error
In the menu experiment: time to select an item; selection errors made
In the toothpaste experiment: number of cavities; frequency of brushing
What is an experiment?
Three criteria:
1. The experimenter must systematically manipulate one or more independent variables in the domain under investigation.
2. The manipulation must be made under controlled conditions, such that all variables which could affect the outcome of the experiment are controlled (see confounding variables, next).
3. The experimenter must measure some unmanipulated feature that changes, or is assumed to change, as a function of the manipulated independent variable.
Confounding variables
Variables that are not independent variables but are allowed to vary during the experiment
“The logic of experiments is to hold variables-not-of-interest constant among conditions, systematically manipulate independent variables, and observe the effects of the manipulation on the dependent variables.”
Sources of variation
- Variations in the task performed
- The effect of the treatment (i.e. the user interface improvements that we made)
- Individual differences between experimental subjects (e.g. IQ)
- Different stimuli for each task
- Distractions during the trial (sneezing, dropping things)
- Motivation of the subject
- Accidental hints or intervention by the experimenter
- Other random factors
Examples of Confounding
Order effects: tasks done early in testing are slower and more prone to error; tasks done late in testing may be affected by user fatigue.
Carry-over effects: a difference occurs if one condition follows another, e.g. learning text editor commands.
Experience factors: people in one condition have more or less relevant experience than those in others.
Experimenter/subject bias: the experimenter systematically treats some subjects differently from others, or subjects have different motivation levels.
Other uncontrolled variables: time of day, system load.
Confounding Prevention
Randomization: negates the order effect. Random assignment to conditions is used to ensure that any effect due to unknown differences among users or conditions is random.
Counterbalancing: counters order and carry-over effects. Test half of the users in condition 1 first, and the other half in condition 2 first. Different permutations of condition order can be used.
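Counterbalancing combined with random assignment can be sketched as follows. This is an illustration, not a prescribed procedure: it assumes every permutation of the condition order is used and that participants are shuffled before being dealt out to orders in turn; the function name and participant labels are hypothetical.

```python
import itertools
import random
from collections import Counter


def counterbalance(participants, conditions, seed=None):
    """Give each participant an order of conditions, cycling through every
    permutation so each order is used equally often (exactly equal when the
    number of participants is a multiple of the number of orders)."""
    orders = list(itertools.permutations(conditions))
    people = list(participants)
    random.Random(seed).shuffle(people)  # random assignment of people to orders
    return {person: orders[i % len(orders)] for i, person in enumerate(people)}


plan = counterbalance(["P1", "P2", "P3", "P4"], ["condition 1", "condition 2"], seed=1)
for person, order in sorted(plan.items()):
    print(person, "->", " then ".join(order))
print(Counter(plan.values()))
```

With two conditions this reproduces the slide's scheme (half start with condition 1, half with condition 2). With more conditions the number of permutations grows factorially, so using all of them quickly requires many participants.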
Allocation of participants
Judiciously select and assign subjects to groups to control variability.
a) Between-groups experiment: two groups of test users, same tasks for both groups. Randomly assign users to two equally sized groups. Group A uses only system A, group B only system B.
b) Within-groups experiment: one group of test users. Each user performs equivalent tasks on both systems. Randomly assign users to two equally sized pools. Pool A uses system A first, pool B system B first.
c) Matched-pairs
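The random allocation for designs a) and b) can be sketched in a few lines. A minimal illustration, assuming an even number of participants; the function names are hypothetical, and the names in the usage example are borrowed from the example designs.

```python
import random


def split_in_half(participants, seed=None):
    """Shuffle participants and split them into two equally sized lists."""
    pool = list(participants)
    if len(pool) % 2:
        raise ValueError("need an even number of participants for equal groups")
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


def between_groups(participants, seed=None):
    """Group A uses only system A; group B uses only system B."""
    a, b = split_in_half(participants, seed)
    return {"A": a, "B": b}


def within_groups(participants, seed=None):
    """Everyone uses both systems; pool A starts with A, pool B with B."""
    a, b = split_in_half(participants, seed)
    return {**{p: ("A", "B") for p in a}, **{p: ("B", "A") for p in b}}


people = ["John", "Dave", "James", "May", "Mary", "Ann", "Stuart", "Phil"]
groups = between_groups(people, seed=42)
orders = within_groups(people, seed=42)
print(groups)
print(orders)
```

Passing a fixed `seed` makes the allocation reproducible, which helps when the assignment has to be documented or re-run.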
Example Designs
Between Groups
System A   System B
John       Dave
James      May
Mary       Ann
Stuart     Phil

Within Groups
Participant   Sequence
Elizabeth     A, B
Michael       B, A
Steven        A, B
Richard       B, A

Within groups
Advantages: is more powerful statistically (can compare the same person across different conditions, thus isolating effects of individual differences); requires fewer participants than between-groups
Disadvantages: learning effects; fatigue effects
Between groups
Advantages: no transfer of learning effects; less arduous on participants
Disadvantages: requires more participants; large individual variation in user skills
Experimental Details
Order of tasks: choose one simple order (simple -> complex) unless doing a within-groups experiment
Training: depends on how the real system will be used
What if someone doesn't finish? Assign a very large time and a large number of errors
Pilot study: helps you fix problems with the study; do two, first with colleagues, then with real users
Sample Size
Depends on desired confidence level and confidence interval.
A confidence level of 95% is often used for research; 80% is OK for practical development.
Rule of thumb: 16-20 test users.
Analysing the numbers
Example: trying to get task time <= 30 min.
Test gives: 20, 15, 35, 80, 10, 20
Mean (average) = 30, looks good!
Wrong answer: we are not certain of anything. Always chart the results.
Factors contributing to our uncertainty:
- small number of test users (n = 6)
- results are very variable (sample standard deviation ≈ 26)
The standard deviation measures dispersion from the mean.
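The uncertainty can be made concrete with a confidence interval for the mean. A minimal sketch using only the standard library; it assumes a t-based 95% interval, with the critical value for 5 degrees of freedom (about 2.571) taken from standard t tables rather than computed.

```python
import math
import statistics

times = [20, 15, 35, 80, 10, 20]  # task times in minutes, from the example

mean = statistics.mean(times)     # arithmetic mean
sd = statistics.stdev(times)      # sample standard deviation (n - 1)
se = sd / math.sqrt(len(times))   # standard error of the mean

# Two-sided 95% critical value of Student's t with n - 1 = 5 degrees of
# freedom, from t tables (assumption: a t-based interval is appropriate).
T_CRIT_95_DF5 = 2.571
low = mean - T_CRIT_95_DF5 * se
high = mean + T_CRIT_95_DF5 * se

print(f"mean = {mean:.1f} min, sd = {sd:.1f} min")
print(f"95% CI for the mean: {low:.1f} to {high:.1f} min")
```

With these six times the interval runs from roughly 3 to 57 minutes, so the data cannot show that the true mean task time is at or below 30 minutes, even though the sample mean is exactly 30.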
Experimental Evaluation
Advantages
- Powerful method (depending on the effects investigated)
- Quantitative data for statistical analysis
- Can compare different groups of users
- Good reliability and validity
- Replicable
Disadvantages
- High resource demands
- Requires knowledge of experimental method
- Time spent on experiments can mean evaluation is difficult to integrate into the design cycle
- Tasks can be artificial and restricted
- Cannot always generalise to the full system in a typical working situation
- Not all human behaviour variables can be controlled
- Little recognition of work, time, motivational and social context
- Subject's ideas, thoughts and beliefs are largely ignored
Summary
- Allows comparison of alternative designs
- Collects objective, quantitative data (bottom-line data)
- Needs a significant number of test users (16-20)
- Usable only later in the development process
- Requires administrator expertise
- Cannot provide why-information (process data)
- Formal studies can reveal detailed information but take extensive time and effort
Applicability:
- when the system location is dangerous or impractical
- for constrained single-user systems
- to allow controlled manipulation of use
Summary (contd.)
Advantages and disadvantages: sophisticated and expensive equipment; uninterrupted environment; Hawthorne principle (participants may change their behaviour because they know they are being observed)