159.410/710 User Interface Design ~ 1 ~ chapters 1 & 2: me...


159.410/710 User Interface Design

© Paul Lyons 2010

159.410/710 User Interface Design~ 3 ~ 159.410/710 User Interface Design~ 3 ~

Epistemology

Approaches to knowledge

Engineers: holistic, constructionist
Scientists: reductionist, analyst/synthesists
Arts: complexity, subjectivists

I hear and I forget. I see and I remember. I do and I understand.
Confucius (attributed)

159.410/710 User Interface Design~ 4 ~ 159.410/710 User Interface Design~ 4 ~

Types of HCI research

Development of interaction widgets

Usability – efficiency, enjoyability

Internet & web

Social applications

Mobile applications – shoehorning complex applications onto tiny screens

Epistemology

159.410/710 User Interface Design~ 5 ~ 159.410/710 User Interface Design~ 5 ~

Characteristics of HCI research

HCI research has a focus on people
what computers can do is not the main point
what computers can help people do is the main point

variety of contributing disciplines: sociology, psychology, statistics, computer science

rigorous research methodologies are required
it isn’t enough to develop a new interface or a new interface component
does the new interface make things better?
how do you know?
are you sure?

observation techniques
controlled experiments
handling noisy data
developing (genuinely) new interface paradigms

159.410/710 User Interface Design~ 6 ~ 159.410/710 User Interface Design~ 6 ~

Characteristics of HCI research

Things to measure

performance measures (largely industry-driven)
time to complete a task
number of tasks completed in a standard time
accuracy of performing a task

difficult to measure in a laboratory setting
enjoyment
emotional wellbeing
why people choose to spend discretionary time using computers, e.g. contributing to Wikipedia
why people choose to stop using applications
people’s usage patterns of mobile computing devices and social apps

159.410/710 User Interface Design~ 7 ~ 159.410/710 User Interface Design~ 7 ~

Characteristics of HCI research

Replication of results
multiple studies that reach the same or similar conclusion

Triangulation by different research methods
if a single method produces identical results repeatedly, the reason may be a flawed method

Results may change over time
reasons for using a computer: 1980s vs. 2000s
finding information: searching and tagging vs. hierarchical directories

159.410/710 User Interface Design~ 8 ~ 159.410/710 User Interface Design~ 8 ~

Characteristics of HCI research

Tradeoffs

speed vs. accuracy (Fitts’s law)
better interface vs. familiar interface
more efficient keyboard vs. QWERTY keyboard
iPad is cool and new
it’s the coolth that persuades people to adopt it
how do you measure that?
security vs. usability
eye-scans and fingerprints?
a revolutionary, undeniably better computer vs. environmental costs of computer disposal

159.410/710 User Interface Design~ 9 ~ 159.410/710 User Interface Design~ 9 ~

Characteristics of HCI research

HCI is an interdisciplinary discipline

in the past, in the present, in the future:
human factors engineering, psychology
library science, information science
art and design
ubiquitous? Virtual Reality? mind-activated?

all suit experimental design
competition with judges (cf. architecture)?

reductionist: widely accepted (statistical tests, control groups), reliable
more holistic, more subjective: less trusted (not less trustworthy)

159.410/710 User Interface Design~ 10 ~ 159.410/710 User Interface Design~ 10 ~

Epistemology

Types of model

descriptive: generalisations about the data, which allow us to see order amidst chaos
explanatory: explain what causes the data that have been observed
predictive: allow us to plan for the future
prescriptive: suggest ways of building things (e.g. patterns)
generative: produce principles and guidelines or actual systems (e.g. Colour Harmoniser)

159.410/710 User Interface Design~ 11 ~ 159.410/710 User Interface Design~ 11 ~

Experimental Research

Usability Testing

The goal of usability testing is simply to find flaws in a specific interface
A small number of users may take part
… it can be structured or unstructured.
… there is no claim that the results can be generalised.
The goal is simply to find flaws and help the developers improve the interface.
If that involves jumping in and helping a user or changing the task mid-process, that is acceptable.

Lazar, Feng and Hochheiser, Research Methods in HCI, 2010

159.410/710 User Interface Design~ 12 ~ 159.410/710 User Interface Design~ 12 ~

Experimental Research

HCI research – 57 varieties

observations, field studies, surveys, usability studies, interviews, focus groups, controlled experiments
(ranging from rich, not reproducible to reproducible, reductionist)

descriptive research: observations; may be quantitative and accurate

relational research: establishes correlations between factors; does not establish causality
typing speed correlated with hours spent gaming
does time spent gaming improve typing?
are good typists successful gamers?

experimental research: can establish causality
allocate users to two groups randomly
expose one group to games, the other not
measure typing ability of groups after a suitable interval

159.410/710 User Interface Design~ 13 ~ 159.410/710 User Interface Design~ 13 ~

Experimental Research

Null and alternative hypotheses

H0: the treatment causes no effect
H1: the treatment causes an effect
the mutual exclusion seesaw: if one is accepted, the other is rejected

H0: the new widget causes no change in speed
H1: the new widget causes some change in speed
and also
H0: the new widget causes no change in user satisfaction
H1: the new widget causes some change in user satisfaction

However… testing multiple hypotheses can complicate controls and variables
each combination of independent variables is a condition

a good hypothesis
is clear and unambiguous
clearly distinguishes between independent and dependent variables
is testable in a single experiment
clearly identifies control groups and conditions of experiment
generally derives from preliminary observational studies

159.410/710 User Interface Design~ 14 ~ 159.410/710 User Interface Design~ 14 ~

Independent and dependent variables

Experimental Research

independent variable (“cause”)
variations in value are under the experimenter’s control

dependent variable (“effect”)
variations in value are observed

Null hypothesis: there is no speed change between the original widget and the new widget
experimenter makes choice of widget: it’s the independent variable
experimenter measures the speed: it’s the dependent variable

if the experimental results are plotted on a graph
independent variable goes on the x-axis
dependent variable goes on the y-axis

159.410/710 User Interface Design~ 15 ~ 159.410/710 User Interface Design~ 15 ~

Experimental Research

Typical independent variables
Technology: typing vs. speech; mouse vs. joystick, touchpad etc.
Design: pull-down vs. pop-up menu; colour scheme; layout
Demographic: gender, age, experience, education
Context: lighting, noise, seated vs. standing, other people in the vicinity

Typical dependent variables
Efficiency: time to complete a task, speed
Accuracy: error rate
Subjective Satisfaction: Likert scale ratings
Ease of learning and retention rate: time to learn, loss after a week, a month
Cognitive Demand: time before onset of fatigue

159.410/710 User Interface Design~ 16 ~ 159.410/710 User Interface Design~ 16 ~

Experimental Research

Components of an experiment

Treatments: things we want to compare (cf. medical treatments)
Units: “things” that treatment is applied to (normally human subjects)
Assignment method: how subjects are assigned to treatments

randomisation is often necessary

compare two splines A and B for a CAD tool
use a within-subjects design
measure time-to-complete task with A, then B
flaw: subjects learn the task while using A, so B appears best
solution: randomise order of tasks

comparing two treatments using a between-subjects design
allocate subjects to treatment A until there are enough
then allocate subjects to treatment B
flaw: A is applied to early birds, B to late sleepers
solution: randomise allocation to the treatments
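The two randomisation fixes above can be sketched in a few lines of Python. This is only an illustration: the subject labels, the fixed seed and the helper names `assign_between_subjects` and `randomise_order` are assumptions, not part of the course material.

```python
import random


def assign_between_subjects(subjects, treatments, rng):
    """Randomly allocate subjects to treatments in (near-)equal group sizes."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def randomise_order(subjects, treatments, rng):
    """Within-subjects: give each subject an independently shuffled task order."""
    return {s: rng.sample(treatments, k=len(treatments)) for s in subjects}


rng = random.Random(2010)  # fixed seed only so that the sketch is repeatable
subjects = [f"S{i}" for i in range(1, 9)]  # hypothetical subject labels
groups = assign_between_subjects(subjects, ["A", "B"], rng)
orders = randomise_order(subjects, ["A", "B"], rng)
print(groups)
print(orders)
```

Shuffling before dealing subjects out keeps arrival time (early birds vs. late sleepers) from lining up with a treatment.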

159.410/710 User Interface Design~ 17 ~ 159.410/710 User Interface Design~ 17 ~

Experimental Research

Significance tests

how do we decide whether or not an effect is real?
we measure the probability that a result at least as extreme as the one observed would occur by chance if there were no effect
this approach depends on being able to distinguish between an effect and no effect
if that probability is sufficiently low, we say that there is a significant effect.

p < 0.05 says that, if there were no real effect, the probability of the observed behaviour occurring by chance would be less than 5%
(often loosely described as being 95% sure the effect is real; strictly, p is not the probability that the effect is real)

whether that’s good enough depends on the application
for a new drug, a significance level of p < 0.05 is not good enough if the null hypothesis is “the standard dose is not fatal”
a stricter level may be needed, e.g. p < 0.005 (0.5%, i.e. 99.5%)
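One way to make "the probability that it occurred by chance" concrete is an exact permutation test. This is a different technique from the t-test and ANOVA used later in these slides. The sketch below assumes two small made-up groups of scores and counts how many relabellings of the pooled data give a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    # Try every way of relabelling len(group_a) of the pooled values as "A".
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Illustrative values only (not experimental data).
p = exact_permutation_p([2, 3, 1], [6, 7, 5])
print(p)
```

With only three values per group there are just 20 possible relabellings, so even perfectly separated groups cannot reach p < 0.05 here.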

159.410/710 User Interface Design~ 18 ~ 159.410/710 User Interface Design~ 18 ~

Experimental Research

Type I errors & Type II errors

reality: widget is no different; study concludes widget is better
type I (gullibility) error (aka “false positive”)

reality: widget is better; study concludes widget no better
type II (blindness) error (aka “false negative”)

probability of Type I error = α
probability of Type II error = β
statistical power of a test = 1 − β: the probability of correctly rejecting an incorrect null hypothesis, i.e. the probability of finding an effect that does exist
the p-value (probability that an effect at least this large occurred by chance) is compared with α
generally aim for p < 0.05
keep β low by using large sample sizes
α and β are related; the less gullible you are, the more likely you are to be blind to improvements
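The relationship between α, β and sample size can be explored numerically. The sketch below is an approximation under stated assumptions: a two-sided z-test on two equal-sized groups, with the true difference expressed in standard-deviation units. The function name `power_two_group` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-group comparison.

    Assumption: z-test approximation with equal group sizes;
    effect_size is the true difference in means in standard-deviation units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # threshold set by alpha
    shift = effect_size * sqrt(n_per_group / 2)  # grows with sample size
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


for n in (8, 16, 32, 64):
    print(n, round(power_two_group(0.5, n), 3))
```

Raising `n_per_group` raises the power (lowers β), while tightening `alpha` at a fixed sample size lowers it, which is the trade-off between gullibility and blindness.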

159.410/710 User Interface Design~ 19 ~ 159.410/710 User Interface Design~ 19 ~

Experimental Research

Limitations of experimental research

controlled experiments are a very powerful technique, but
hypothesis must be well-defined
number of variables must be limited, preferably orthogonal
factors other than independent variables must not affect dependent variables

HCI problems can be difficult to define
many, interrelated factors may be involved
e.g. difficult to factor out familiarity with technology in an age-related study
prescreen to ensure homogeneity between subject groups
use statistical techniques designed to filter out confounding factors (analysis of covariance)
subjects’ behaviour in a lab differs from behaviour in real world

159.410/710 User Interface Design~ 20 ~

Experimental Design

True Experiments

a testable hypothesis
not testable: <x> is an intuitive interface
testable: subjects will be able to use <x> correctly in under 1 minute (all of them? 50%? > 75%?)
two conditions (one treatment, one control), sometimes more
quantitative measurements
significance tests
attention to bias elimination
replicable

random assignment of subjects? if no:
multiple conditions: quasi-experiment
single condition: non-experiment
e.g. not ethical to randomly assign children to parents to study effect of single-parent upbringing

159.410/710 User Interface Design~ 21 ~

Experimental Design

Other types of experiment

quasi-experiments (subjects not randomly assigned)
may be necessary for practical or ethical reasons
insufficient subjects: use what’s available
researcher lacks influence (modified Word interface)
can still produce useful results but more susceptible to confounding factors

non-experiments (no control group)
may be necessary for practical or ethical reasons
can still produce useful results but even more susceptible to confounding factors
e.g. usability trials: aim is to detect problems

formal experiments are designed
to detect subtle effects
to factor out researcher bias
researcher’s specialist knowledge may trump population’s preferences (e.g. user surveys for Xerox showed little demand for such a device)

is demonstrating that it is possible to build something a valid experiment?
engineering research often stops at this point

159.410/710 User Interface Design~ 22 ~

Important considerations

Experimental Design

number of independent variables
Hypothesis: There is no difference between target selection speed when using a mouse, a joystick, or a trackball to select icons of different sizes (small, medium, large)
How many independent variables? 2
• type of pointing device
• icon size

number of conditions: 3 x 3 = 9

number of dependent variables: 1
measurement may need careful thought
e.g. is typing speed wpm or error-free wpm? is speech recogniser error rate definitive?
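The count of conditions is simply the number of combinations of levels across the independent variables. A small sketch using the devices and icon sizes named in the hypothesis (the variable names are illustrative):

```python
from itertools import product

devices = ["mouse", "joystick", "trackball"]
icon_sizes = ["small", "medium", "large"]

# Each combination of independent-variable levels is one condition.
conditions = list(product(devices, icon_sizes))
for number, (device, size) in enumerate(conditions, start=1):
    print(number, device, size)
print("number of conditions:", len(conditions))
```

Adding a third independent variable with k levels would multiply the count by k, which is why the number of variables has to be kept small.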

159.410/710 User Interface Design~ 23 ~

Experimental Design

Structure of an experiment

1 independent variable: basic design
>1 independent variable: factorial design

between-group
one group per condition; each subject experiences only one condition
advantages: no learning effect, less fatigue effect
disadvantage: susceptible to differences between groups
requires big groups, randomly selected
suits cognitively simple tasks with no learning effect (inter-subject diffs increase with complexity)
suits tasks where subject difference is the independent variable

within-group
only one group, but subjects experience multiple conditions
advantages: eliminates individual differences, smaller population required
disadvantage: learning and fatigue may cause effects
randomise order of conditions and/or provide preliminary training
suits tasks with large diffs between individuals, e.g. cognitively complex tasks
suits small subject pools

split-plot
mix of within-group and between-group
e.g. effect of using GPS (binary, within-group) on three age-groups (between-groups)

Order tasks using a Latin square to factor out fatigue

Subj1   1 2 3
Subj2   2 3 1
Subj3   3 1 2
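A Latin square for ordering tasks can be generated rather than written by hand. The sketch below builds the simple cyclic square (each row is the previous one rotated by one place); the function name `latin_square` is an assumption. A cyclic square balances position but not which task follows which.

```python
def latin_square(n):
    """Cyclic n x n Latin square: row r is the sequence 1..n rotated left by r."""
    return [[(r + c) % n + 1 for c in range(n)] for r in range(n)]


square = latin_square(3)
for subject, order in enumerate(square, start=1):
    print(f"Subj{subject}: {order}")
```

Every task appears exactly once in each row (subject) and once in each column (position), so no task is always performed last, when fatigue is greatest.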

159.410/710 User Interface Design~ 24 ~

Watch out for interaction effects

Experimental Design

if the effect of a variable depends on the value of the other variable, the variables interact

the variables are (should be) genuinely independent, but their effects interact

[Figure: task duration for a simple task and a complex task, Office 2003 vs. Office 2007]

159.410/710 User Interface Design~ 25 ~

Reliability of experimental results

Experimental Design

research involving human subjects is noisy

observed value = actual value + random error (noise)

random errors
with increased sample size, actual values add, relative size of noise tends to 0
[Figure: actual value and noise value plotted against sample size]

systematic errors
the same each time, so not cancelled by a large sample
more deleterious than noise
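The claim about sample size can be written out as a short derivation. It assumes, as an idealisation not stated on the slide, that the random errors ε_i are independent with zero mean and standard deviation σ, and that a systematic error b is the same for every observation.

```latex
\bar{x}_{\mathrm{obs}}
  = \frac{1}{n}\sum_{i=1}^{n}\left(x_{\mathrm{actual}} + b + \varepsilon_i\right)
  = x_{\mathrm{actual}} + b + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i ,
\qquad
\operatorname{sd}\!\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\right)
  = \frac{\sigma}{\sqrt{n}} \;\to\; 0 \quad \text{as } n \to \infty .
```

The noise term shrinks in proportion to 1/√n, but the systematic error b survives averaging unchanged, whatever the sample size.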

159.410/710 User Interface Design~ 26 ~

Experimental Design

Systematic errors

instrumentation errors
can often replace instruments (e.g. stopwatch) with software

experimental procedure
non-random task condition order allows learning & fatigue to have an effect (opposite systematic errors)
instructions may introduce errors
“complete the task as fast as possible” vs. “take your time, no rush” produced different results: subjects under time stress were slower!
instructions from different members of research team may differ
trivial details
data entry on a PDA: holding PDA in hand produced different results from sitting PDA on table

randomise conditions and tasks when using within-group design
use identical instructions for all participants, written or recorded
run pilot studies beforehand to detect potential biases
you don’t want to realise half-way through the experiment that all the results are compromised because you have overlooked something
use real participants from target population


159.410/710 User Interface Design~ 28 ~

Experimental Design

Systematic errors

instrumentation errors
experimental procedure
participants
age bias
education bias (particularly prevalent in university studies)
interest in the product (or its domain)

recruit a set of participants representative of target population
may be quite skewed, e.g. for elder-care systems
don’t stress the participants
explain that the system is under test, not them
any result they produce is good
organise schedule conservatively so participants aren’t inconvenienced
it’s polite, and it produces better results!


159.410/710 User Interface Design~ 30 ~

Experimental Design

Systematic errors

instrumentation errors
experimental procedure
participants
experimenter behaviour
express no opinion about the system
maintain noncommittal body language
be ready to start on time
use the same experimenter each time, if possible, or a recorded protocol
if multiple experimenters are necessary, require them to follow a written experimental protocol


159.410/710 User Interface Design~ 32 ~

Experimental Design

Systematic errors

instrumentation errors
experimental procedure
participants
experimenter behaviour
environmental factors
physical environment: noise, temperature, humidity, lighting, vibration
social environment: people nearby, power relationships of participants and people nearby, interruptions

quiet room
suitable lighting
comfortable furniture
non-distracting environment
observation by CCTV or from behind 1-way mirror, if possible
for field studies, visit the location beforehand to check for problems

159.410/710 User Interface Design~ 33 ~

Experimental Procedures

Experimental Design

1. Identify a research hypothesis
2. Design the study
3. Run a pilot
4. Recruit participants
5. Run data collection sessions
6. Analyse the data
7. Report the conclusions

Each data collection session:
1. Set up the experimental environment/equipment
2. Greet participants
3. Outline the purpose of the study and the procedures
4. Obtain participants’ consent
5. Assign participants to experimental condition
6. Participants complete pre-survey (if any)
7. Participants complete training task
8. Participants complete survey task
9. Participants complete post-survey (if any)
10. Debrief (can be more useful than formal survey)

159.410/710 User Interface Design~ 34 ~

There are many analytical tools

Analysing the Data

t-test
  independent samples
  paired-samples
ANOVA
  one-way
  factorial
  repeated measures (e.g. before/after tests)
correlation
regression
Chi-square

159.410/710 User Interface Design~ 35 ~

Data Preparation

Analysing the Data

error checking and correction
impossible age
computing experience > age
paper forms need checking
survey software (or Excel) could check at data collection time
incorrect grouping of survey forms
if data can’t be corrected (e.g. because subjects are anonymous), may need to be thrown away

may need pre-processing
inconsistent answers, e.g. age: 23½, 23 years and 7 months, nearly twenty-four
coding text as numbers (e.g. 1, 2, 3, for no degree, bachelors, P/G)
extracting general themes from individual interviews
coding interaction events (e.g. click(100, 250) “select book”)
consistency may need to be verified if > 1 coder

analysis may require data to be restructured
related information in different surveys (pre & post trial, for example)
analysis software requires specific formatting
SPSS independent samples and paired samples t-tests use same data in 1 column and 2 parallel columns
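The error checks and text coding listed above are easy to automate at data-entry time. The following sketch assumes a record layout invented for illustration: the field names, the age bounds and the helpers `check_record` and `code_education` are not from the slides.

```python
# Coding text as numbers, using the slide's example scheme.
EDUCATION_CODES = {"no degree": 1, "bachelors": 2, "P/G": 3}


def check_record(record):
    """Return a list of problems found in one survey record (a dict)."""
    problems = []
    age = record.get("age")
    experience = record.get("computing_experience")
    # Assumed plausibility bounds for age, in years.
    if not isinstance(age, (int, float)) or not 0 < age < 120:
        problems.append("impossible age")
    elif isinstance(experience, (int, float)) and experience > age:
        problems.append("computing experience > age")
    if record.get("education") not in EDUCATION_CODES:
        problems.append("unknown education level")
    return problems


def code_education(record):
    """Translate the free-text education answer into its numeric code."""
    return EDUCATION_CODES[record["education"]]


# Hypothetical records, made up for this example.
records = [
    {"age": 23, "computing_experience": 10, "education": "bachelors"},
    {"age": 230, "computing_experience": 10, "education": "P/G"},
    {"age": 19, "computing_experience": 25, "education": "no degree"},
]
for record in records:
    print(record, check_record(record))
```

Records that fail a check can then be corrected from the paper form where possible, or set aside if the participant cannot be identified.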

159.410/710 User Interface Design~ 36 ~

Start with an exploratory analysis

Analysing the Data

Get a feel for the data: good for initial comparison between groups

measures of central tendency
mean
median: 50th percentile; good for skewed (e.g. Pareto) distribution and when outliers may be errors
mode: most popular

measures of spread
range: datamax – datamin (crude; likely to increase with sample size & sensitive to outliers)
variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
estimates spread around mean: sign-independent, but has different units from samples
standard deviation

more complex measures of spread assume a normal distribution
may be necessary to modify data to conform

Histograms, Box-and-whisker plots


159.410/710 User Interface Design~ 38 ~

Start with an exploratory analysis

Analysing the Data

standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
measures mean deviation of samples from mean of samples
same units as samples

more complex measures of spread assume a normal distribution
may be necessary to modify data to conform
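These exploratory measures are all available in Python's standard `statistics` module. A brief sketch follows; the task times are illustrative numbers, not results from a study.

```python
import statistics

# Illustrative task-completion times in seconds.
times = [245, 236, 321, 246, 213, 265, 178, 289, 222]

mean = statistics.mean(times)
median = statistics.median(times)       # 50th percentile
data_range = max(times) - min(times)    # datamax - datamin
variance = statistics.variance(times)   # sample variance: divides by n - 1
std_dev = statistics.stdev(times)       # square root of the sample variance

print(f"mean={mean:.1f} median={median} range={data_range}")
print(f"variance={variance:.1f} standard deviation={std_dev:.1f}")
```

`statistics.variance` and `statistics.stdev` use the n – 1 divisor, matching the formulas on the slides; `pvariance` and `pstdev` are the divide-by-n population versions.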

159.410/710 User Interface Design~ 39 ~

Analysing the Data

A difference between means is not necessarily a difference between treatments
[Figure: two pairs of distributions, each with Δ = 5 between the means. Are these means different?]
significance tests are necessary to determine probability that difference is due to chance

comparing two search engines
two groups, same task: between-groups
one group, two tasks: within-group

design            IVs   conditions per IV   test
between-groups    1     2                   independent-samples t-test
between-groups    1     ≥3                  1-way ANOVA
between-groups    ≥2    ≥2                  factorial ANOVA
within-group      1     2                   paired-samples t-test
within-group      1     ≥3                  repeated measures ANOVA
within-group      ≥2    ≥2                  repeated measures ANOVA
mixed             ≥2    ≥2                  split-plot ANOVA

159.410/710 User Interface Design~ 40 ~

To compare 2 means use a t-test

Analysing the Data

null hypothesis: task completion times for subjects using word-prediction software do not differ from task-completion times for subjects who do not use the software

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\text{signal}}{\text{noise}}$

s² is, of course, the variance

however, significance of a particular t depends on size of subject groups, specifically degrees of freedom, df
for 2 groups, df = total participants – number of groups = n1 + n2 – 2

generally say there’s a significant effect if p ≤ 0.05 (the p-value is compared with α = 0.05)
consult published tables showing the p-value for particular (t, df) combinations
statistical software usually outputs the p-value from built-in tables
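The signal-to-noise form of t translates directly into code. This sketch computes only the statistic and the degrees of freedom as defined on the slide; it does not look up the p-value. The helper name `t_statistic` and the sample values are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), with df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    signal = mean(sample1) - mean(sample2)
    noise = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    df = n1 + n2 - 2
    return signal / noise, df


# Illustrative values only.
t, df = t_statistic([2, 3, 1], [6, 7, 5])
print(f"t = {t:.3f}, df = {df}")
```

The p-value then comes from a t table or a statistics package, using this t together with df.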


159.410/710 User Interface Design~ 42 ~

To compare 2 means use a t-test

Analysing the Data

null hypothesis: task completion times for subjects using word-prediction software do not differ from task-completion times for subjects who do not use the software

for unrelated samples use independent-samples t-test
times for group using word-prediction software
times for group using conventional software
SPSS t-test data comprises times and group membership

for a single group use paired-samples t-test
times for subject using word-prediction software and for same subject using conventional software
SPSS t-test data comprises times with software and times without software

both produce a t-value: the larger its magnitude, the stronger the evidence against the null hypothesis
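For the paired-samples case the calculation works on each subject's difference between the two conditions. A minimal sketch, using the standard paired t formula (mean difference divided by its standard error); the times and the helper name `paired_t` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t: mean of per-subject differences over its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Illustrative times: the same four subjects under both conditions.
conventional = [245, 236, 321, 212]
prediction = [246, 213, 265, 189]
t, df = paired_t(conventional, prediction)
print(f"t = {t:.3f}, df = {df}")
```

Because each subject serves as their own control, differences between individuals drop out of the noise term, which is the main attraction of a within-group design.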

159.410/710 User Interface Design~ 43 ~

What if hypothesis predicts sign of difference?

Analysing the Data

if we know that sign of effect will be + or –
instruct analysis software to use a 1-tailed t-test
the critical t for a 1-tailed test at α = 0.05 is the same as for a 2-tailed test at α = 0.1

Do NOT switch to a one-tailed t-test because a 2-tailed test indicates no significance
test should be hypothesis-driven, not data-driven!

159.410/710 User Interface Design~ 44 ~

ANOVA: within-gp variances vs. population variance

Analysing the Data

Sum of Squares for group 1: $SS_1 = \sum_i (x_{1i} - \bar{x}_1)^2$, the squared deviations of the group’s samples from the group mean $\bar{x}_1$; likewise $SS_2$ about $\bar{x}_2$

null hypothesis: sample sets A, B, C & D belong to 1 population
if s(means), the spread of the means of sample sets A - D, > s(combined population), there is more than 1 population
F = found variation in averages / expected variation in averages

F = 1 supports null hypothesis


159.410/710 User Interface Design~ 46 ~

Analysing the Data

ANOVA: within-gp variances vs. population variance

                         Group 1   Group 2
Observation 1               2         6
Observation 2               3         7
Observation 3               1         5
Mean                        2         6
Sums of Squares (SS)        2         2
Overall Mean                     4
Total Sums of Squares           28

within-groups variability (aka error variance)
variability due to differences between means (aka effect)

MAIN EFFECT
          SS     df   MS     F      p
Effect    24.0   1    24.0   24.0   .008
Error      4.0   4     1.0

if effect variance is large w.r.t. error variance, treated group & untreated groups act as different populations (treatment has an effect)
ANOVA determines p taking df into account
much larger differences between means than in the diagram
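The ANOVA table for this small example can be reproduced by hand. The sketch below assumes the observations read from the slide's table (Group 1: 2, 3, 1; Group 2: 6, 7, 5) and computes the sums of squares, degrees of freedom and F; the p-value would still come from an F table or a statistics package. The function name `one_way_anova` is illustrative.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS_effect, SS_error, df_effect, df_error, F) for a 1-way design."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Effect: variability due to differences between the group means.
    ss_effect = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error: within-groups variability about each group's own mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_effect = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f = (ss_effect / df_effect) / (ss_error / df_error)
    return ss_effect, ss_error, df_effect, df_error, f


ss_effect, ss_error, df_effect, df_error, f = one_way_anova([[2, 3, 1], [6, 7, 5]])
print(f"Effect: SS={ss_effect}, df={df_effect}")
print(f"Error:  SS={ss_error}, df={df_error}")
print(f"F = {f}")
```

The effect and error sums of squares add up to the total sum of squares (24 + 4 = 28), which is a useful check on any hand calculation.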

159.410/710 User Interface Design~ 47 ~

Use ANOVA (aka F-test) to compare means of ≥ 2 groups

Analysing the Data

we’ve already seen the special case of ANOVA for comparing 2 means: the t-test
F is the parameter actually generated by the calculation (cf. t-test t); for 2 groups, F = t²

test                       design                           IVs   conditions
1-way ANOVA                between-group                    1     ≥ 3
factorial ANOVA            between-group                    ≥ 2
repeated measures ANOVA    within-group
split-plot ANOVA           between-group and within-group


159.410/710 User Interface Design~ 49 ~

Analysing the Data

Use ANOVA (aka F-test) to compare means of ≥ 2 groups
1-way ANOVA: between-group design, 1 IV, ≥ 3 conditions

SPSS data input for 1-way ANOVA (pared down to the minimum)

task durations   code
245              0
236              0
321              0
..               ..
246              1
213              1
265              1
..               ..
178              2
289              2
222              2
.                .

codes: 0 = standard text entry (control group), 1 = text-prediction, 2 = dictation

SPSS output from the analysis

                sum of sqs   df   Mean sq    F       significance
between-group    7842.250     2   3921.125   2.174   0.139
within-group    37880.375    21   1803.827

the statistical calculation produces the F-value; significance is obtained by table lookup
same calculation produces 2 sets of results – these aren’t relevant
significance greater than 0.05: not significant
24 samples in 3 groups gives df = 21

So, how would we summarise this in a thesis or report?
A 1-way ANOVA analysis with text-entry method as independent variable and task completion time as dependent variable suggests there is no significant difference between the three conditions: (F(2, 21) = 2.174, n.s.)


159.410/710 User Interface Design~ 51 ~

Analysing the Data

Use ANOVA (aka F-test) to compare means of ≥ 2 groups
factorial ANOVA: between-group design, ≥ 2 IVs

Q: does nature of the task (composition or transcription) affect performance?

IVs: data entry method (0 = standard, 1 = predictive, 2 = dictation) and task type (0 = transcription, 1 = composition)
3 x 2 = 6 conditions, one group per condition (gp1 to gp6)

SPSS data entry format (SPSS fn is called Univariate analysis)

data entry method   task type   task time
0                   0           245
0                   0           236
…                   …           …
1                   0           246
1                   0           213
…                   …           …
2                   0           178
2                   0           289
…                   …           …
0                   1           256
0                   1           269
…                   …           …
1                   1           265
1                   1           232
…                   …           …
2                   1           189
2                   1           321


159.410/710 User Interface Design~ 53 ~

Use ANOVA (aka F-test) to compare means of ≥ 2 groups

Analysing the Data

factorial ANOVA: between-group design, ≥ 2 IVs

Q: does nature of the task (composition or transcription) affect performance?

SPSS output

                              sum of sq    df   mean square   F       significance
task type (IV)                 2745.188     1    2745.188     1.410   0.242
entry method (IV)             17564.625     2    8782.313     4.512   0.017
task * entry (interaction)      114.875     2      57.437     0.030   0.971
error                         81751.625    42    1946.467

task type caused no significant effect: F(1, 42) = 1.41, n.s.
entry method had a significant effect: F(2, 42) = 4.51, p < 0.05
there is no significant interaction between task and entry
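Each F in the SPSS output is the effect's mean square divided by the error mean square, and each mean square is a sum of squares divided by its df. The arithmetic can be checked with the figures from the table above; the variable names are illustrative.

```python
# (sum of squares, df) for each effect, as reported in the SPSS output.
effects = {
    "task type": (2745.188, 1),
    "entry method": (17564.625, 2),
    "task * entry": (114.875, 2),
}
error_ss, error_df = 81751.625, 42

ms_error = error_ss / error_df  # mean square = sum of squares / df
f_ratios = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

for name, f in f_ratios.items():
    print(f"{name}: F({effects[name][1]}, {error_df}) = {f:.3f}")
```

This is why a report quotes both dfs, as in F(2, 42): the first belongs to the effect and the second to the error term.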

159.410/710 User Interface Design~ 54 ~

Analysing the Data

Use repeated measures ANOVA for within-group studies
previous between-groups design requires lots of participants (72, if 12 subjects/group)
what about a within-groups design? especially if only some are eligible, e.g. disabled

within-groups design: faster, less fatigue, can control for learning, smaller sample

to study effect of 1 IV, use 1-way repeated measures ANOVA
3 data points from each participant, all in the same row

245 246 278
246 213 289
321 265 222

to study effect of >1 IV, use multi-level repeated measures ANOVA
for 3 x 2 factorial study, 6 data points per participant per row

               transcription                     composition
               standard  predictive  dictation   standard  predictive  dictation
participant1   245       246         178         256       265         189
participant2   236       213         289         269       232         321

159.410/710 User Interface Design~ 55 ~

Assumptions of t tests and F tests

Analysing the Data

no systematic errors
e.g. different instructors, with different sets of instructions
correlation between errors of participants in each instructor’s group will systematically skew results

homogeneity of variance (identical distribution of errors)
populations should have comparable variances
[Figure: two distributions with means x1 and x2] do these distributions have significantly different means?
not easy to say, either for people or for software

normal distribution of errors
may be violated if data is highly skewed (non-normal distribution)

159.410/710 User Interface Design~ 56 ~

Use Pearson’s r to identify correlations

Analysing the Data

is factor a related to factor b?
determine Pearson’s product moment correlation coefficient (r)

r varies from -1 to 1
-1: perfect negative linear relationship
0: no linear relationship
+1: perfect positive linear relationship

computer experience   time with standard software   time with predictive software
12                    245                           246
6                     236                           213
3                     321                           265
19                    212                           189

can determine r values for
experience * standard s/ware
experience * predictive s/ware
standard * predictive

                            experience   timestd   timepred
experience   r              1            -0.723    -0.468
             significance                0.043     0.243
timestd      r              -0.723       1         0.325
             significance   0.043                  0.432
timepred     r              -0.468       0.325     1
             significance   0.243        0.432

(experience, timestd) has significant –ve correlation
time with std software decreases with computer experience
no other significant correlations
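Pearson's r can be computed directly from its definition. The sketch below uses only the four rows of data visible on the slide, so its result will not match the r values in the SPSS table, which presumably come from the full data set. The function name `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    co_deviation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return co_deviation / sqrt(spread_x * spread_y)


# The four rows shown on the slide (an excerpt, not the full data set).
experience = [12, 6, 3, 19]
time_std = [245, 236, 321, 212]

r = pearson_r(experience, time_std)
r_squared = r * r  # proportion of variance in one variable explained by the other
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

The sign of r gives the direction of the relationship; whether it is significant depends on the number of data points as well as on its size.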

159.410/710 User Interface Design~ 57 ~

Use Pearson’s r to identify correlations

Analysing the Data

r² represents the proportion (percentage) of variance in X that can be explained by variable Y
equally, it represents the proportion of variance in Y that can be explained by variable X

but beware: correlation does not imply causation
e.g. negative correlation between income and speed of internet search
does earning more make you worse at using the internet?
or does higher income imply greater age and less familiarity with the internet?

age → higher income
age → less internet experience → lower performance