
MP2 Experimental Design Review

HCI W2014

Acknowledgement: Much of the material in this lecture is based on material prepared for similar courses by Saul Greenberg (University of Calgary) as adapted by Joanna McGrenere

What is experimental design?
How do I plan an experiment?


Experimental Planning Flowchart

Stage 1: Problem definition
– research idea
– literature review
– statement of problem
– hypothesis development

Stage 2: Planning
– define variables
– controls
– apparatus
– procedures

Stage 3: Conduct research
– data collection

Stage 4: Analysis
– data reductions
– statistics
– hypothesis testing

Stage 5: Interpretation
– interpretation
– generalization
– reporting

Additional boxes in the flowchart: select subjects, design, pilot testing. Two arrows are labelled feedback.

What’s the goal?

Overall research goals impact choice of study design
– Exploratory research vs. hypothesis confirmation
– Ecological validity vs. tightly controlled

The stage in the design process impacts the choice of study design
– Formative evaluation (to get iterative feedback on initial design and/or design choices)
– Summative evaluation (to determine whether the design is better/stronger/faster than alternative approaches)


What’s the research question?

Study research questions impact choice of:
– Protocol, task
– Experimental conditions (factors)
– Constructs (effectiveness)
– Measures (task completion, error rate)

Testable hypotheses impact
– choice of statistical analysis (also impacted by nature of the data and experimental design)


Experimental Planning Flowchart


Reality check: does the final design support the research questions?


Quantitative system evaluation

Quantitative:
– precise measurement, numerical values
– bounds on how correct our statements are

Methods
– Controlled Experiments
– Statistical Analysis

Measures
– Objective: user performance (speed & accuracy)
– Subjective: user satisfaction


Controlled experiments

The traditional scientific method
– clear convincing result on specific issues
– in HCI:
  insights into cognitive process, human performance limitations, ...
  allows comparison of systems, fine-tuning of details ...

Strive for
– lucid and testable hypothesis (usually a causal inference)
– quantitative measurement
– measure of confidence in results obtained (inferential statistics)
– ability to replicate the experiment
– control of variables and conditions
– removal of experimenter bias


The experimental method

a) Begin with a lucid, testable hypothesis

H0: there is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subject’s previous expertise in using a mouse or using the different menu types

[Figure: two example menus, one pull-down and one pop-up, built from the items File, Edit, View, Insert and New, Open, Close, Save]



b) Explicitly state the independent variables that are to be altered

Independent variables
– the things you control (independent of how a subject behaves)
– two different kinds:
  1. treatment manipulated (can establish cause/effect, true experiment)
  2. subject individual differences (can never fully establish cause/effect)

in menu experiment
– menu type: pop-up or pull-down
– menu length: 3, 6, 9, 12, 15
– expertise: expert or novice (a subject variable; the researcher cannot manipulate it)
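
The three independent variables of the menu experiment can be laid out as a table of experimental conditions. The sketch below assumes a fully crossed (factorial) design, which the slides do not state explicitly; the variable names are illustrative.

```python
from itertools import product

# Levels taken from the menu experiment described above.
menu_types = ["pop-up", "pull-down"]
menu_lengths = [3, 6, 9, 12, 15]
expertise_levels = ["expert", "novice"]  # subject variable: observed, not assigned

# Assumption: every combination of levels is tested (full factorial design).
cells = list(product(menu_types, menu_lengths, expertise_levels))

for menu_type, length, expertise in cells:
    print(f"{menu_type:9} | {length:2d} items | {expertise}")

print(len(cells), "cells in total")
```

Listing the cells early makes the cost of each added factor or level visible before any subjects are recruited.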



c) Carefully choose the dependent variables that will be measured

Dependent variables
– variables dependent on the subject’s behaviour / reaction to the independent variable
– Make sure that what you measure actually represents the higher level concept!

in menu experiment
– time to select an item
– selection errors made
– Higher level concept (user performance)
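
As a rough illustration of how the two measures might be computed from logged trials, the sketch below summarizes selection time and error rate. The trial records are invented for illustration only, and the record layout (time in seconds, wrong-item flag) is an assumption, not part of the original experiment.

```python
from statistics import mean

# Hypothetical trial records, invented for illustration only:
# (selection time in seconds, whether the wrong item was selected)
trials = [
    (1.2, False),
    (0.9, False),
    (1.5, True),
    (1.1, False),
]

def summarize(trials):
    """Return (mean selection time, error rate) for a list of trials."""
    times = [t for t, _ in trials]
    errors = [wrong for _, wrong in trials]
    return mean(times), sum(errors) / len(errors)

mean_time, error_rate = summarize(trials)
print(f"mean time: {mean_time:.3f} s, error rate: {error_rate:.2%}")
```

Whether these two numbers together really capture "user performance" is the judgement the slide asks you to make.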



d) Judiciously select and assign subjects to groups

Ways of controlling subject variability
– recognize classes and make them an independent variable
– minimize unaccounted anomalies in subject group (superstars versus poor performers)
– use reasonable number of subjects and random assignment

[Figure: subjects divided into Novice and Expert groups]
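
Random assignment can be sketched in a few lines. The version below shuffles the subject list and deals subjects out to conditions in turn so the groups stay the same size; the subject IDs, the function name `assign_randomly`, and the fixed seed are assumptions made for the example.

```python
import random

def assign_randomly(subjects, conditions, seed=None):
    """Shuffle subjects, then deal them out to conditions in turn
    so that group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 13)]  # hypothetical subject IDs
groups = assign_randomly(subjects, ["pop-up", "pull-down"], seed=42)

for condition, members in groups.items():
    print(condition, sorted(members))
```

When expertise is treated as an independent variable, the same function could be applied separately to the novice and expert pools so that each condition receives both kinds of subject.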


The experimental method...

e) Control for biasing factors
– unbiased instructions + experimental protocols (prepare ahead of time)
– double-blind experiments, ...
– Potential confounding variables
– Order effects
– Learning effects
– Counterbalancing (http://www.yorku.ca/mack/RN-Counterbalancing.html)

[Figure: an experimenter telling a subject, “Now you get to do the pop-up menus. I think you will really like them... I designed them myself!”]



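f) Apply statistical methods to data analysis
– Confidence limits: the confidence that your conclusion is correct
  “The hypothesis that mouse experience makes no difference is rejected at the .05 level” (i.e., null hypothesis rejected)
  means:
  – if the null hypothesis were true, a difference at least this large would arise by chance less than 5% of the time
  – informally this is often read as a 95% chance that your finding is correct and a 5% chance you are wrong; strictly, the 5% is the probability of rejecting a null hypothesis that is in fact true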
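
To make "rejected at the .05 level" concrete, here is a minimal sketch of one possible test. It uses an exact permutation test on the difference of means, a technique swapped in for illustration because it needs only the standard library; the slides do not say which test to use. The selection times are invented and are not results from any study.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sided permutation test on the difference of means.

    Returns the fraction of all ways of splitting the pooled data into
    groups of the original sizes whose absolute mean difference is at
    least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical selection times in seconds, invented for illustration only.
popup_times = [1.1, 1.3, 1.2, 1.0, 1.4]
pulldown_times = [1.6, 1.8, 1.7, 1.9, 2.0]

ALPHA = 0.05  # significance level from the slide
p_value = permutation_test(popup_times, pulldown_times)
print(f"p = {p_value:.4f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

The p-value is the share of relabellings that look at least as extreme as the observed split, which is the sense in which the null hypothesis is "rejected at the .05 level".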

g) Interpret your results
– what you believe the results mean, and their implications
– yes, there can be a subjective component to quantitative analysis

Experimental designs

Between subjects: Different participants - single group of participants is allocated randomly to the experimental conditions.

Within subjects: Same participants - all participants appear in both conditions.

Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.

Mixed: Some independent variables are within subjects, some are between subjects

www.id-book.com

Within-subjects

It solves the individual differences issues
Allows participants to make comparisons between conditions
But raises other problems:
– Need to look at the impact of experiencing the two conditions

Order Effects

Changes in performance resulting from (ordinal) position in which a condition appears in an experiment (always first?)

Arises from warm-up, learning, learning what they will be asked to reflect upon, fatigue, etc.

Effect can be averaged and removed if all possible orders are presented in the experiment and there has been random assignment to orders
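
Presenting all possible orders with random assignment to orders can be sketched as follows. The condition labels, subject IDs, and the helper name `assign_orders` are assumptions for the example; the subject count is chosen so that it divides evenly by the number of orders.

```python
import random
from collections import Counter
from itertools import permutations

def assign_orders(subjects, conditions, seed=None):
    """Randomly assign subjects to every possible presentation order,
    using each order equally often when the numbers divide evenly."""
    orders = list(permutations(conditions))
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: orders[i % len(orders)] for i, s in enumerate(shuffled)}

conditions = ["A", "B", "C"]                     # hypothetical condition labels
subjects = [f"S{i:02d}" for i in range(1, 13)]   # hypothetical subject IDs
assignment = assign_orders(subjects, conditions, seed=1)

order_counts = Counter(assignment.values())
for order, count in sorted(order_counts.items()):
    print(" -> ".join(order), count)
```

With three conditions there are six orders, so twelve subjects give two per order. The number of orders grows factorially with the number of conditions, which is why partial schemes are needed for larger designs.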

Sequence effects

Changes in performance resulting from interactions among conditions (e.g., if done first, condition 1 has an impact on performance in condition 2)

Effects viewed may not be main effects of the IV, but interaction effects

Can be controlled by arranging each condition to follow every other condition equally often
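
One common way to have each condition follow every other condition equally often is a balanced Latin square. The sketch below uses a standard construction that works for an even number of conditions; it is offered as one option, not as the method the slides prescribe, and the function name is illustrative.

```python
def balanced_latin_square(n):
    """Return n orderings of conditions 0..n-1 (n must be even).

    Each condition appears once in every position, and each condition
    is immediately preceded by every other condition exactly once.
    """
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]

for row in balanced_latin_square(4):
    print(row)
```

Each row is the order for one subject (or one group of subjects), so the number of subjects should be a multiple of the number of conditions.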

Counterbalancing

Controlling order and sequence effects by arranging subjects to experience the various conditions (levels of the IV) in different orders

Self-directed learning: investigate the different counterbalancing methods

– Randomization
– Block Randomization
– Reverse counter-balancing
– Latin squares and Greco-Latin squares (when you can’t fully counterbalance)
– http://www.experiment-resources.com/counterbalanced-measures-design.html

Between, within, matched participant design

www.id-book.com

Internal Validity

The extent to which a causal conclusion based on a study is warranted

Internal validity is reduced due to the presence of uncontrolled/confounding variables
– But not necessarily invalid

It’s important for the researcher to evaluate the likelihood that there are alternative hypotheses for observed differences
– Need to convince self and audience of the validity

External validity

The extent to which the results of a study can be generalized to other situations and to other people

If the experimental setting more closely replicates the setting of interest, external validity can be higher than in a true experiment run in a controlled lab setting

Often comes down to what is most important for the research question
– Control or ecological validity?

Control

True experiment = complete control over the subject assignment to conditions and the presentation of conditions to subjects
– Control over the who, what, when, where, how

Control of the who => random assignment to conditions
– Only by chance can other variables be confounded with IV

Control of the what/when/where/how => control over the way the experiment is conducted

Quasi-Experiment

When you can’t achieve complete control
– Lack of complete control over conditions
– Subjects for different conditions come from potentially non-random pre-existing groups
  Experts vs novices
  Early adopters vs technophobes?

It’s a matter of control

True Experiment
– Random assignment of subjects to condition
– Manipulate the IV
– Control allows ruling out of alternative hypotheses

Quasi Experiment
– Selection of subjects for the conditions
– Observe categories of subjects
  – If the subject variable is the IV, it’s a quasi experiment
– Don’t know whether differences are caused by the IV or differences in the subjects

Other features

In some instances cannot completely control the what, when, where, and how
– Need to collect data at a certain time or not at all
– Practical limitations to data collection, experimental protocol
