lean experimentation

Post on 17-Jan-2015

DESCRIPTION

Talk given on lean experimentation in research and practice at Cornell Information Science.

TRANSCRIPT

Lean Experimentation: How to leverage online experiments in research and practice

Cornell IS Breakfast Talk, April 4th, 2012

Thomas Høgenhaven, Twitter: @thogenhaven

Friday, April 6, 12

Agenda

1. Conducting Online Experiments

2. Experimentation Literature

3. Experimentation in SMEs and Government Today

4. Lean Experimentation

1. Conducting Online Experiments

The Why Bother Question

“While some social scientists engage in small-scale controlled experimentation with dozens of users or groups, the capacity to perform large-scale interventions with thousands of users opens up new opportunities for research.”

(Preece and Shneiderman 2009: 25).

What I Mean By Online Experiments

In online experiments, we are interested in examining online behavior, not just in using the internet as a means to examine offline behavior.

What I Mean By Online Experiments

[Diagram: users are exposed to variation A, B, ..., n (the independent variable); the online behavior in each variation is the dependent variable; a statistical test evaluates the difference]

The High-Level Experimental Process

Thomke 1998: 745.

Example: Experimentation At Microsoft

Guess which one performs better, in each of these 8 pairs.

Anyone who gets 6/8 right wins a t-shirt

Experimenting At Microsoft

Kohavi et al (2009): Online Experimentation at Microsoft

[Figure: eight pairs of variants, each labeled A and B]

Which one is significantly better? [ ] A  [ ] B  [ ] None of them

Experimenting At Microsoft

Kohavi et al (2009): Online Experimentation at Microsoft

[Figure: the same eight pairs of variants, each labeled A and B]

0 / 200 Microsoft employees got more than 5 / 8 answers right
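As a rough point of comparison, the chance of reaching 6 out of 8 by pure guessing can be worked out. This sketch assumes each answer is an independent, uniform guess among the three options on the quiz (A, B, None of them); that guessing model and the function name are assumptions for illustration, not part of the cited study.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 8 independent guesses, each correct with probability 1/3.
p_win = prob_at_least(6, 8, 1 / 3)

# Expected number of winners if 200 people guessed this way.
expected_winners = 200 * p_win

print(f"P(at least 6 of 8 by guessing) = {p_win:.4f}")
print(f"Expected winners among 200 guessers = {expected_winners:.1f}")
```

Under these assumptions about 2% of guessers would win, or roughly four people in a group of 200.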

What Is The Effect Of Experiments?

• Improvement: 33%

• No Effect: 33%

• Disimprovement: 33%

Kohavi et al (2009): Online Experimentation at Microsoft

Is That Just Microsoft Being Microsoft?

No. Estimating the effects of changes is incredibly hard.

Netflix considers 90% of what they try to be wrong.

It’s Actually Hard To Predict

https://whichtestwon.com/past-tests

2. Experimental Literature

Current Experimental Framework in HCI

Psychology & Social Psychology

Experimental methodology literature

HCI

Offline And Online Experiments

• Psychology literature sometimes uses the internet to study human behavior

• But it does not use the internet to study the internet

For example...

2010

No mentions of experimentation in online environments

Offline And Online Experiments

Laboratory Field

Offline

Online

Offline And Online Experiments

Laboratory Field

Offline

Online

Psychology covers this

Offline And Online Experiments

Laboratory Field

Offline

Online

Psychology covers this

But not this

The Existing Research Is Not Systematic

"To the extent of our knowledge, no research has so far been reported on treating online test design and implementation in a systematic manner"

(Cámara and Kobsa 2009: 18).

Online Experiments In Academia

CHI and CSCW use experiments all the time, but more could be invested in the methodology literature.

This will help explore the possibilities and limitations of online experimentation.

3. Experimentation In SMEs And Government Agencies Today

State Of The Art In Industry Today

• Experimentation is increasing

• At least 25 different software vendors

• $0 - $320,000 a year*

*Source: whichmvt.com

Practice Has Its Own Literature

Website Experiments

Several ways to conduct experiments:

1. Server-side / Client-side

2. A/B Test / Multivariate Test

Not Overly Expensive Software

Google Website Optimizer (free)

Visual Website Optimizer ($600 - $3000 / year)

Just 2 out of 25+ vendors

A/B/n Experiment

[Diagram: JavaScript splits users across webpage A, B, ..., n (the independent variable); behavior on each page is the dependent variable; a statistical test evaluates the difference]

Google Website Optimizer

Limitations Of Mainstream Experimental Software

1. Limited to between-subject design

2. Lack of data export

3. No control over statistical test

4. Expensive coding necessary

Limitation 1: Limited To Between-Subject Design

• Cannot control for individual differences (No such data is collected / made available)

• Requires more experimental subjects

• No pre-experimental data is collected
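To give a feel for how many subjects a between-subject comparison can require, here is a rough sketch using the standard normal-approximation sample size formula for two proportions. The function name, the default alpha and power, and the example conversion rates are assumptions for illustration, not figures from the talk.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate subjects needed in each group to detect p_control vs p_variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect**2)


# Hypothetical rates: 10% conversion for the control, 12% for the variant.
print(subjects_per_group(0.10, 0.12))
```

With these hypothetical rates the estimate runs to a few thousand users per group, and it grows quickly as the expected difference shrinks.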

Limitation 2: Lack of Data Export

Google Website Optimizer: Data Export

Visual Website Optimizer

Visual Website Optimizer: Data Export

Software Limitations: Data Export

• Some software is better than others

• No data on individual users

• No segmentation on background variables

• This might be the biggest problem, as this is where many significant results lie.

Limitation 3: No Choice Between Statistical Tests

Okay?

Statistical Test = Chance To Beat Original

“The chance to beat original ... displays the probability that a combination will be more successful than the original version.

When numbers in this column are high, perhaps around 95%, that means a given combination is probably a good candidate to replace your original content.

Low numbers in this column mean that the corresponding combination is a poor candidate for replacement.”

http://support.google.com/websiteoptimizer/bin/answer.py?hl=en&answer=55944

Visual Website Optimizer Is More Transparent

“Visual Website Optimizer uses z-tests for both A/B tests and multivariate tests”

Standard Error (SE) = Square root of (p * (1-p) / n)

http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/
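A minimal sketch of how the standard error above can feed a z-test on two conversion rates. It assumes an unpooled two-proportion z-test with a two-sided p-value, which may differ from what the vendor actually computes; the function names and visitor counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(conversions, visitors):
    """SE = sqrt(p * (1 - p) / n), with p the observed conversion rate."""
    p = conversions / visitors
    return sqrt(p * (1 - p) / visitors)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between B and A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Assumption: the two standard errors are combined without pooling.
    se_diff = sqrt(standard_error(conv_a, n_a) ** 2 + standard_error(conv_b, n_b) ** 2)
    z = (p_b - p_a) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```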

z-tests

• Focus on a single parameter

• Assume that parametric assumptions are met

We don’t know whether the data meets these assumptions
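If per-user data could be exported, one option that avoids parametric assumptions is a permutation test. The sketch below illustrates the idea under that assumption; it is not something the tools discussed here offer, and the function names, defaults, and sample data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-user outcomes: 1 = converted, 0 = did not convert.
group_a = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
group_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(permutation_test(group_a, group_b))
```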

Limitation 4: Coding Required

[Diagram: the A/B/n experiment setup (users, JavaScript, webpage A, B, ..., n as the independent variable, behavior as the dependent variable, statistical test, difference), with the annotation "Have to be coded"]

Software Limitations: Expensive Coding

We already coded it, so we may as well keep it. I hate working for no reason.

Software Limitations: Expensive Coding

I knew this wouldn’t work! We should never have spent resources on it...

The Challenge

1. Overcome methodological limitations of experimental software

2. Reduce development costs

3. Explore possibilities and limitations of online experimentation

4. Lean Experimentation

Test Environment

[Diagram: users are exposed to proxy A, B, ..., n (the independent variable); their behavior on the website is the dependent variable; a statistical test evaluates the difference]

Proxies For Experimentation

• Website

• Email

• Survey

• Ads

Comparative Advantages And Disadvantages

Lean Experimentation Principles

1. Test assumptions, ideas, and theories

2. Test before coding, not after

3. Test in the field

1. Test Assumptions, Ideas, And Theories

2. Test Before Coding, Not After

[Diagram with the labels: Ideas, Experimentation, Implementation, Good Idea, Bad Idea]

3. Test In The Field

• Identical design patterns have different effects in different contexts

• E.g. social comparison information in competitive versus cooperative communities

• Cocktail effects are largely unknown

Requirements Of Lean Experimentation

1. Independent groups

2. Random assignment

3. Allows tracking
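The three requirements can be sketched in a few lines. This example assumes each user has a stable identifier and uses a hash of it as a deterministic stand-in for random assignment; the function names, the experiment label, and the in-memory log are hypothetical.

```python
import hashlib


def assign_variation(user_id, experiment, variations):
    """Map a user to exactly one variation, so the groups stay independent."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    # The hash spreads users roughly evenly and always gives the same answer
    # for the same user, which approximates random assignment.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variations)
    return variations[bucket]


event_log = []  # stand-in for a real tracking store


def track(user_id, experiment, variations, event):
    """Record the variation a user was assigned to along with the observed event."""
    variation = assign_variation(user_id, experiment, variations)
    event_log.append({"user_id": user_id, "variation": variation, "event": event})
    return variation


variations = ["proxy_a", "proxy_b"]
track("user-42", "email-subject-test", variations, "opened_email")
print(event_log)
```

Hashing rather than drawing a random number on each visit keeps a returning user in the same group without storing the assignment.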

Why Use Proxies For Experimentation?

Test Environment

• Manipulates the independent variable through a proxy

• Examines dependent variable in natural field environment

Test Subjects

• Existing users (when using website, email, and survey)

• Potential users (when using advertisements)

Proposed Usage And Limitations

Good for:

• Ideas

• Theories

• Hypotheses

• Features

Less suited for:

• Small changes

• Graphical changes

Can be useful if testing assumptions

Data Output

Mixed sources that need to be combined:

• Open rates and CTR from proxy

• Web analytics

• SQL databases
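A minimal sketch of what combining these sources might look like, joining everything on a shared user id. All table names, field names, and values below are invented for illustration; real exports from a proxy, an analytics tool, and a database will look different.

```python
import sqlite3

# Hypothetical export from the email proxy: assigned variation and click per user.
proxy_log = [
    {"user_id": 1, "variation": "A", "clicked": True},
    {"user_id": 2, "variation": "B", "clicked": False},
    {"user_id": 3, "variation": "B", "clicked": True},
]

# Hypothetical web analytics export: user_id -> page views after the email.
web_analytics = {1: 4, 2: 0, 3: 7}

# Hypothetical SQL database holding background variables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_year INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, 2010), (2, 2011), (3, 2011)])


def combine(proxy_log, web_analytics, db):
    """Build one row per user from the three sources."""
    rows = []
    for entry in proxy_log:
        uid = entry["user_id"]
        found = db.execute(
            "SELECT signup_year FROM users WHERE user_id = ?", (uid,)
        ).fetchone()
        rows.append({
            **entry,
            "page_views": web_analytics.get(uid, 0),
            "signup_year": found[0] if found else None,
        })
    return rows


combined = combine(proxy_log, web_analytics, db)
for row in combined:
    print(row)
```

Having one row per user is also what makes segmentation on background variables possible.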

Durability Of Proxy Experiments Is Short

[Chart: email experiment; series "Control" and "Experimentation" from Wk0 to Wk3; vertical axis from 0 to 16]

Buy In Needed

1. Making changes on websites

2. Sending Emails

3. Conducting Surveys

4. Running Ads

Hard to sell

Easy to sell

Feedback Quality

1. Wireframes / early stage development

2. Finished / Nearly finished stages

Critical feedback

Not so critical feedback

Influence On Decisions

Increased likelihood of impact when getting experimental effect data early
