Lean Experimentation
Talk given on lean experimentation in research and practice at Cornell Information Science.
Lean Experimentation
How to leverage online experiments in research and practice
Cornell IS Breakfast Talk
April 4th, 2012
Thomas Høgenhaven
Twitter: @thogenhaven
Friday, April 6, 12
Agenda
1. Conducting Online Experiments
2. Experimentation Literature
3. Experimentation in SMEs and Government Today
4. Lean Experimentation
1. Conducting Online Experiments
The Why Bother Question
“While some social scientists engage in small-scale
controlled experimentation with dozens of users or
groups, the capacity to perform large-scale interventions
with thousands of users opens up new opportunities for
research.”
(Preece and Shneiderman 2009: 25).
What I Mean By Online Experiments
In online experiments, we are interested in examining
online behavior, not just in using the internet as a means
to examine offline behavior.
What I Mean By Online Experiments
[Diagram: users are split across Variation A, Variation B, ..., Variation n (the independent variable); online behavior under each variation is the dependent variable; a statistical test checks for a difference.]
The High-Level Experimental Process
Thomke 1998: 745.
Example: Experimentation At Microsoft
Guess which one performs better in each of these 8 pairs.
Anyone who gets 6 out of 8 right wins a t-shirt.
Experimenting At Microsoft
Kohavi et al (2009): Online Experimentation at Microsoft
[Figure: eight pairs of design variants, numbered 1 to 8, each showing a version A and a version B]
Which one is significantly better? [ ] A  [ ] B  [ ] None of them
Experimenting At Microsoft
Kohavi et al (2009): Online Experimentation at Microsoft
[Figure: the same eight A/B pairs, numbered 1 to 8]
0 / 200 Microsoft employees got more than 5 / 8 answers right
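To put the quiz result in perspective, here is a minimal sketch of what pure guessing would achieve. It assumes (this is not stated in the slides) that a guesser picks uniformly at random among the three options for each of the 8 pairs, and that the 200 respondents answer independently. The function name `prob_at_least` is illustrative.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 3 equally likely answers per pair (A, B, none of them)
p_win = prob_at_least(6, 8, 1 / 3)

# Assumption: 200 independent guessers
expected_winners = 200 * p_win
p_no_winner = (1 - p_win) ** 200

print(f"P(at least 6 of 8 right by guessing): {p_win:.4f}")
print(f"Expected winners among 200 guessers:  {expected_winners:.2f}")
print(f"P(no winner among 200 guessers):      {p_no_winner:.4f}")
```

Under these assumptions a single guesser wins about 2% of the time, so roughly four winners would be expected among 200 people guessing at random.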
What Is The Effect Of Experiments?
• Improvement: 33%
• No effect: 33%
• Disimprovement: 33%
Kohavi et al (2009): Online Experimentation at Microsoft
Is That Just Microsoft Being Microsoft?
No. Estimating the effects of changes is incredibly hard.
Netflix considers 90% of what they try to be wrong.
It’s Actually Hard To Predict
https://whichtestwon.com/past-tests
2. Experimental Literature
Current Experimental Framework in HCI
Psychology & Social Psychology
Experimental methodology literature
HCI
Offline And Online Experiments
• Psychology literature sometimes uses the internet to study human behavior
• But it does not use the internet to study the internet
For example...
2010
No mentions of experimentation in online environments
Offline And Online Experiments
[2×2 matrix: columns Laboratory and Field, rows Offline and Online. One part of the matrix is marked “Psychology covers this”, the remainder “But not this”.]
The Research There Is, Is Not Systematic
"To the extent of our knowledge, no research has so far been
reported on treating online test design and implementation in a
systematic manner"
(Cámara and Kobsa 2009: 18).
Online Experiments In Academia
Research at CHI and CSCW uses experiments all the time, but more could be
invested in the methodology literature.
This would help explore the possibilities and limitations of online
experimentation.
3. Experimentation In SMEs And Government Agencies Today
State Of The Art In Industry Today
• Experimentation is increasing
• At least 25 different software vendors
• $0 - $320,000 a year*
*Source: whichmvt.com
Practice Has Its Own Literature
Website Experiments
Several ways to conduct experiments:
1. Server-side / Client-side
2. A/B Test / Multivariate Test
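The difference between the two test types can be sketched in a few lines: an A/B test compares whole page versions, while a full-factorial multivariate test compares every combination of element variants. The element names and variants below are hypothetical and only serve to illustrate the counting.

```python
from itertools import product

# Hypothetical page elements and their variants (not from the talk)
FACTORS = {
    "headline": ["original", "short"],
    "button_color": ["green", "orange"],
    "image": ["photo", "illustration", "none"],
}


def full_factorial(factors):
    """Return every combination of element variants, one dict per combination."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


combinations = full_factorial(FACTORS)
print(len(combinations))
for combo in combinations[:3]:
    print(combo)
```

The number of combinations is the product of the variant counts (2 × 2 × 3 = 12 here), so a multivariate test typically needs considerably more traffic than a two-version A/B test.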
Not Overly Expensive Software
Google Website Optimizer (free)
Visual Website Optimizer ($600 - $3000 / year)
Just 2 out of 25+ vendors
A/B/n Experiment
[Diagram: JavaScript sends users to Webpage A, Webpage B, ..., Webpage n (the independent variable); behavior on each page is the dependent variable; a statistical test checks for a difference.]
Google Website Optimizer
Limitations Of Mainstream Experimental Software
1. Limited to between-subject design
2. Lack of data export
3. No control over statistical test
4. Expensive coding necessary
Limitation 1: Limited To Between-Subject Design
• Cannot control for individual differences (No such data is collected / made available)
• Requires more experimental subjects
• No pre-experimental data is collected
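How many subjects a between-subject comparison needs can be estimated with the standard normal-approximation formula for comparing two proportions. This is a rough sketch, not a method from the talk: the function name, the 5% significance level, the 80% power, and the example conversion rates are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_variation, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test."""
    if p_control == p_variation:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_variation) ** 2)


# Hypothetical example: 5% baseline conversion, hoping to detect a lift to 6%
n_per_group = sample_size_per_group(0.05, 0.06)
print(n_per_group)
```

In this example each group needs roughly eight thousand users, and the requirement grows quickly as the expected difference shrinks.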
Limitation 2: Lack of Data Export
Google Website Optimizer: Data Export
Visual Website Optimizer
Visual Website Optimizer: Data Export
Software Limitations: Data Export
• Some software is better than others
• No data on individual users
• No segmentation on background variables
• The missing segmentation might be the biggest problem, as segments are where many significant results lie.
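Why segment-level data matters can be illustrated with a small sketch. It assumes that per-user records are available, which is exactly what the tools discussed here do not export; the record layout, segment names, and numbers are invented for illustration.

```python
from collections import defaultdict


def conversion_rates(records, by_segment=True):
    """Conversion rate per (segment, variation), or per variation only."""
    counts = defaultdict(lambda: [0, 0])  # key -> [conversions, users]
    for variation, segment, converted in records:
        key = (segment, variation) if by_segment else variation
        counts[key][0] += int(converted)
        counts[key][1] += 1
    return {key: conv / users for key, (conv, users) in counts.items()}


def make_users(variation, segment, conversions, users):
    """Build hypothetical per-user records: (variation, segment, converted)."""
    return [(variation, segment, i < conversions) for i in range(users)]


# Invented data: 100 users per variation and segment
records = (
    make_users("A", "new", 10, 100)
    + make_users("B", "new", 30, 100)
    + make_users("A", "returning", 30, 100)
    + make_users("B", "returning", 10, 100)
)

overall = conversion_rates(records, by_segment=False)
segmented = conversion_rates(records)
print(overall)
print(segmented)
```

In this invented data the two variations look identical overall, while each one clearly wins in one of the two segments, an effect that aggregate reports cannot show.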
Limitation 3: No Choice Between Statistical Tests
Okay?
Statistical Test = Chance To Beat Original
“The chance to beat original ... displays the probability that a combination will be more successful than the original version.
When numbers in this column are high, perhaps around 95%, that means a given combination is probably a good candidate to replace your original content.
Low numbers in this column mean that the corresponding combination is a poor candidate for replacement.”
http://support.google.com/websiteoptimizer/bin/answer.py?hl=en&answer=55944
Visual Website Optimizer Is More Transparent
“Visual Website Optimizer uses z-tests for both A/B tests and multivariate tests”
Standard Error (SE) = Square root of (p * (1-p) / n)
http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/
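The standard error formula above can be turned into a two-proportion z-test as follows. This is a sketch of the general technique, not Visual Website Optimizer's actual implementation: using the unpooled standard error of the difference is an assumption, and the function names and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(p, n):
    """SE = square root of (p * (1 - p) / n), as on the slide."""
    return sqrt(p * (1 - p) / n)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Assumption: unpooled standard error of the difference between the rates
    se_diff = sqrt(standard_error(p_a, n_a) ** 2 + standard_error(p_b, n_b) ** 2)
    z = (p_b - p_a) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100 of 1000 users convert on A, 130 of 1000 on B
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The sketch does not guard against a zero standard error, which occurs when both observed rates are exactly 0 or 1.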
z-tests
• Focus on a single parameter
• Assume that parametric assumptions are met
We don’t know whether the data meets these assumptions
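Where the parametric assumptions are in doubt, one alternative (not proposed in the slides) is a permutation test, which only assumes that group labels are exchangeable when there is no effect. A minimal sketch with invented, heavily skewed per-user data; the function name and parameters are illustrative:

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[size_a:]) - mean(pooled[:size_a]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Invented data: seconds on page per user, with one extreme outlier per group
control = [12, 15, 9, 300, 14, 11, 13, 10, 16, 12]
variation = [18, 22, 17, 25, 19, 21, 400, 20, 23, 18]
p_value = permutation_test(control, variation)
print(p_value)
```

Running such a test requires per-user data, so it also depends on the data export discussed under Limitation 2.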
Limitation 4: Coding Required
[Diagram: the A/B/n setup again (users, JavaScript, Webpage A, Webpage B, ..., Webpage n as the independent variable, behavior as the dependent variable, statistical test, difference), annotated with “Have to be coded”.]
Software Limitations: Expensive Coding
We already coded it, so we might as well keep it. I hate working for no reason.
Software Limitations: Expensive Coding
I knew this wouldn’t work! We should never have spent resources on it...
The Challenge
1. Overcome methodological limitations of experimental software
2. Reduce development costs
3. Explore possibilities and limitations of online experimentation
4. Lean Experimentation
Test Environment
[Diagram: users are exposed to Proxy A, Proxy B, ..., Proxy n (the independent variable); their behavior on the website is the dependent variable; a statistical test checks for a difference.]
Proxies For Experimentation
• Website
• Email
• Survey
• Ads
Comparative Advantages And Disadvantages
Lean Experimentation Principles
1. Test assumptions, ideas, and theories
2. Test before coding, not after
3. Test in the field
1. Test Assumptions, Ideas, And Theories
2. Test Before Coding, Not After
[Diagram with the elements Ideas, Experimentation, Implementation, Good Idea, and Bad Idea]
3. Test In The Field
• Identical design patterns have different effects in different contexts
• E.g. social comparison information in competitive versus cooperative communities
• Cocktail effects are largely unknown
Requirements Of Lean Experimentation
1. Independent groups
2. Random assignment
3. Allows tracking
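The three requirements can be sketched for an email proxy as follows. Hashing the user id is a deterministic stand-in for random assignment (an assumption of this sketch, not a method from the talk). The experiment name, the query parameters `exp` and `var`, and the example URL are hypothetical.

```python
import hashlib
from urllib.parse import urlencode


def assign_variation(user_id, experiment, variations):
    """Assign a user to exactly one variation, stable across repeated calls."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variations[int(digest, 16) % len(variations)]


def tracking_url(base_url, experiment, variation):
    """Tag a link so that later website behavior can be tied to the variation."""
    return f"{base_url}?{urlencode({'exp': experiment, 'var': variation})}"


VARIATIONS = ["A", "B"]
users = [f"user-{i}" for i in range(1000)]  # hypothetical user ids

# Independent groups: every user lands in exactly one variation
assignments = {u: assign_variation(u, "subject-line", VARIATIONS) for u in users}

for user in users[:3]:
    link = tracking_url("https://example.com/landing", "subject-line", assignments[user])
    print(user, assignments[user], link)
```

Because the assignment depends only on the user id and the experiment name, a user who receives several emails stays in the same group, which keeps the groups independent.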
Why Use Proxies For Experimentation?
Test Environment
• Manipulates the independent variable through a proxy
• Examines the dependent variable in its natural field environment
Test Subjects
• Existing users (when using website, email, and survey)
• Potential users (when using advertisements)
Proposed Usage And Limitations
Good for
• Ideas
• Theories
• Hypotheses
• Features
Less suited for
• Small changes
• Graphical changes
Can be useful if testing assumptions
Data Output
• Mixed sources that need to be combined:
• Open / CTR rates from proxy
• Web analytics
• SQL databases
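Combining these sources usually comes down to joining them on a shared user identifier. A minimal sketch, assuming each source can be reduced to a lookup keyed by user id; the field names and the data are invented for illustration:

```python
def combine_sources(proxy_log, analytics, signups):
    """Join proxy, web analytics, and database data into one row per user."""
    rows = []
    for user_id, proxy in sorted(proxy_log.items()):
        rows.append({
            "user_id": user_id,
            "variation": proxy["variation"],          # from the proxy (e.g. email tool)
            "clicked": proxy["clicked"],              # open / click data from the proxy
            "pageviews": analytics.get(user_id, 0),   # from web analytics
            "signed_up": user_id in signups,          # from the SQL database
        })
    return rows


def signup_rate(rows, variation):
    """Share of users in a variation who signed up (0.0 if the group is empty)."""
    group = [row for row in rows if row["variation"] == variation]
    return sum(row["signed_up"] for row in group) / len(group) if group else 0.0


# Invented example data for four users
proxy_log = {
    "u1": {"variation": "A", "clicked": True},
    "u2": {"variation": "A", "clicked": False},
    "u3": {"variation": "B", "clicked": True},
    "u4": {"variation": "B", "clicked": True},
}
analytics = {"u1": 5, "u3": 2, "u4": 7}
signups = {"u3", "u4"}

rows = combine_sources(proxy_log, analytics, signups)
print(signup_rate(rows, "A"), signup_rate(rows, "B"))
```

Once the rows are combined, they can be fed into whichever statistical test fits the design, rather than the one built into a testing tool.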
Durability Of Proxy Experiments Is Short
[Chart: email experiment comparing the Control and Experimentation groups from Wk0 to Wk3; y-axis ticks 0, 4, 8, 12, 16]
Buy In Needed
1. Making changes on websites
2. Sending Emails
3. Conducting Surveys
4. Running Ads
[Scale shown alongside the list: from “Hard to sell” to “Easy to sell”]
Feedback Quality
1. Wireframes / early stage development
2. Finished / Nearly finished stages
Critical feedback
Not so critical feedback
Influence On Decisions
Increased likelihood of impact when getting experimental effect data early