Craig Sullivan - Oh Boy! These A/B tests look like total bullshit! (mktfest 2014)

Oh Boy! These A/B tests look like total bullshit!

@OptimiseOrDie

•  UX, Analytics, Split Testing and Growth Rate Optimisation
•  Started doing testing & CRO 2004
•  Split tested over 40M visitors in 19 languages
•  60+ mistakes I MADE with AB testing
•  Like riding a bike…
•  Want to optimise your optimisation? Get in touch!

Top Testing F***ups for 2014

1.  Testing in the wrong place
2.  Your hypothesis inputs are crap
3.  No analytics integration
4.  Your test will finish after you die
5.  You don’t test for long enough
6.  You peek before it’s ready
7.  No QA for your split test
8.  Opportunities are not prioritised
9.  Testing cycles are too slow
10.  You don’t know when tests are ready
11.  Your test fails
12.  The test is ‘about the same’
13.  Test flips behaviour
14.  Test keeps moving around
15.  You run an A/A test and waste time
16.  Nobody ‘feels’ the test
17.  You forgot you were responsive
18.  You forgot you had no traffic
19.  You ran the wrong test type
20.  You didn’t try all the flavours of testing

slidesha.re/1wBbZ9c

#fail

26.6M

28.4M

Oppan Gangnam Style!

6.9M

The 95% Stopping Problem

•  Many people stop a test when it reaches 95% or 99% ‘confidence’

•  This value is unreliable

•  Read this Nature article: bit.ly/1dwk0if

•  You can hit 95% early in a test

•  If you stop there, it could be a false result

•  Testing tools need to be smarter about what they imply!

•  This 95% thingy – it’s the last signal you should use to stop a test

•  Let me explain
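As background: the ‘confidence’ figure many testing tools display is typically derived from a significance test comparing the two conversion rates. The sketch below assumes a pooled two-proportion z-test with a normal approximation (real tools differ in their exact method), and the function name `ab_confidence` is hypothetical. It returns 1 minus the two-sided p-value, which is the number people tend to read as ‘confidence’.

```python
import math


def ab_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return 1 - two-sided p-value from a pooled two-proportion z-test.

    Hypothetical helper: a tool may report something like this as
    'confidence', but each tool's exact calculation can differ.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B are identical
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 1 - p_value


# Example: 200 of 2,000 visitors convert on A, 240 of 2,000 on B
print(f"{ab_confidence(200, 2000, 240, 2000):.3f}")
```

Nothing in this number knows how long the test has run, how many times you have looked at it, or whether you have covered a full business cycle, which is why it makes a poor stopping signal on its own.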

False Positives and Negatives

Running every test to the end:

                        Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations  Insignificant  Insignificant  Significant!   Significant!
After 500 observations  Insignificant  Significant!   Insignificant  Significant!
End of experiment       Insignificant  Significant!   Insignificant  Significant!

Stopping each test as soon as it looks significant:

                        Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations  Insignificant  Insignificant  Significant!   Significant!
After 500 observations  Insignificant  Significant!   trial stopped  trial stopped
End of experiment       Insignificant  Significant!   Significant!   Significant!
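The two tables differ only in the stopping rule. This small sketch replays the four scenarios from the table: each list holds the verdict at the 200-observation check, the 500-observation check and the end of the experiment. It then counts how many scenarios end up declared ‘Significant!’ under each rule. The data structure and function names are illustrative and do not come from any testing tool.

```python
# Significance verdict at each check: after 200 obs, after 500 obs, end of experiment
SCENARIOS = {
    "Scenario 1": [False, False, False],
    "Scenario 2": [False, True, True],
    "Scenario 3": [True, False, False],
    "Scenario 4": [True, True, True],
}


def verdict_fixed_horizon(checks: list[bool]) -> bool:
    """Run to the planned end and use only the final result."""
    return checks[-1]


def verdict_stop_when_significant(checks: list[bool]) -> bool:
    """Stop at the first significant check (the 'peeking' rule)."""
    return any(checks)


fixed = sum(verdict_fixed_horizon(c) for c in SCENARIOS.values())
peeking = sum(verdict_stop_when_significant(c) for c in SCENARIOS.values())
print(f"Run to the end: {fixed} of 4 significant")         # prints 2 of 4
print(f"Stop when significant: {peeking} of 4 significant")  # prints 3 of 4
```

Scenario 3 is the extra ‘winner’: early noise crossed the line at 200 observations, and the stop-when-significant rule locked it in even though the full test would have come out insignificant.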

62.5 cm ± 1 cm

A: 9.1% ± 0.5    B: 9.3% ± 0.5

A: 9.1% ± 0.2    B: 9.3% ± 0.2

A: 9.1% ± 0.1    B: 9.3% ± 0.1

AB Testing Visualisation Tool

abtestguide.com/calc/
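The shrinking ± figures come from sample size. Assuming the ± values are 95% confidence half-widths in percentage points and using the normal approximation for a proportion (both assumptions, since the slides do not say how they were computed), this sketch estimates how many visitors per variation each half-width needs at a 9.1% conversion rate. The helper name `visitors_for_margin` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def visitors_for_margin(rate: float, margin: float, z: float = Z_95) -> int:
    """Smallest n with z * sqrt(rate * (1 - rate) / n) <= margin.

    rate and margin are proportions: 0.091 means 9.1%, 0.005 means 0.5 points.
    """
    n = z ** 2 * rate * (1 - rate) / margin ** 2
    return math.ceil(n)


for margin_points in (0.5, 0.2, 0.1):
    n = visitors_for_margin(0.091, margin_points / 100)
    print(f"9.1% ± {margin_points} points: about {n:,} visitors per variation")
```

The relationship is a square law: cutting the margin by a factor of five (0.5 to 0.1 points) needs roughly 25 times the traffic. Even at ± 0.1 the 9.1% and 9.3% intervals still touch at 9.2%, so a 0.2-point difference is expensive to confirm.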

“You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land.

77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.”

Ton Wesseling, Online Dialogue
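The A/A claim is easy to check for yourself. The simulation below runs A/A tests (both arms share the same true conversion rate), peeks after every batch of visitors, and records whether any peek reached 95% significance, alongside whether the final result was significant. The 5% conversion rate, batch size, number of peeks and seed are arbitrary assumptions. The exact percentage depends on them, so treat the output as illustrative rather than a reproduction of the 77% figure.

```python
import math
import random


def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Pooled two-proportion z-test, two-sided, normal approximation."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2)) < alpha


def run_aa_test(rng, rate=0.05, batch=100, peeks=50):
    """Simulate one A/A test; return (significant at any peek, significant at the end)."""
    conv_a = conv_b = n = 0
    ever = last = False
    for _ in range(peeks):
        # Both arms draw from the SAME true rate, so any 'winner' is a false positive
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        last = is_significant(conv_a, n, conv_b, n)
        ever = ever or last
    return ever, last


def simulate(trials=300, seed=1):
    rng = random.Random(seed)
    results = [run_aa_test(rng) for _ in range(trials)]
    ever_rate = sum(e for e, _ in results) / trials
    final_rate = sum(f for _, f in results) / trials
    return ever_rate, final_rate


if __name__ == "__main__":
    ever_rate, final_rate = simulate()
    print(f"A/A tests that hit 95% at some peek: {ever_rate:.0%}")
    print(f"A/A tests significant at the planned end: {final_rate:.0%}")
```

Checked once at the planned end, the false positive rate should sit near the nominal 5%; checked at every peek, it climbs far higher, which is the ‘deadly sin’ in the quote.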

“I always tell people that you need a representative sample if your data needs to be valid. What does ‘representative’ mean?

First of all you need to include all the weekdays and weekends. You need different weather, because it impacts buyer behaviour. But most important: Your traffic needs to have all traffic sources, especially newsletter, special campaigns, TV,… everything!”

Andre Morys, Web Arts

The 95% Stopping Problem

Three Articles you MUST read

•  “Statistical Significance does not equal Validity” http://bit.ly/1wMfmY2
•  “Why every Internet Marketer should be a Statistician” http://bit.ly/1wMfs1G
•  “Understanding the Cycles in your site” http://mklnd.com/1pGSOUP

Business & Purchase Cycles

•  Customers change
•  Your traffic mix changes
•  Markets, competitors
•  Be aware of all the waves
•  Always test whole cycles
•  Minimum 2 cycles (wk/mo)
•  Don’t exclude slower buyers

[Diagram: Start Test, Finish, Avg Cycle]

How Long? Simple Rules to follow

•  TWO BUSINESS CYCLES minimum (week/month)

•  1 PURCHASE CYCLE minimum

•  250 CONVERSIONS minimum per creative (e.g. checkouts)

•  350 & MORE! if the response is very similar

•  FULL WEEKS/CYCLES, never part of one

•  KNOW what marketing, competitors and cycles are doing

•  RUN a test length calculator: bit.ly/XqCxuu

•  SET your test run time, RUN IT, STOP IT, ANALYSE IT

•  ONLY RUN LONGER if you need more data

•  DON’T RUN LONGER just because the test isn’t giving the result you want!
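The rules above combine a statistical sample size with business-cycle rounding. Here is a minimal sketch, assuming a two-sided test at alpha = 0.05 with 80% power and the standard normal-approximation sample size formula for two proportions. It is not the calculator linked above, whose method may differ. The function names `sample_size_per_variation` and `weeks_to_run`, and the example traffic (3% baseline, 10% relative lift, 5,000 daily visitors), are hypothetical. It rounds up to whole weeks, enforces a two-week floor and applies the 250-conversions-per-creative rule of thumb.

```python
import math

# Assumed design: two-sided alpha = 0.05 and 80% power (common defaults, not from the slides)
Z_ALPHA = 1.959964
Z_BETA = 0.841621


def sample_size_per_variation(baseline: float, relative_lift: float) -> int:
    """Visitors per variation to detect a change from baseline to baseline * (1 + lift).

    Uses the normal-approximation formula for comparing two proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (Z_ALPHA + Z_BETA) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def weeks_to_run(baseline: float, relative_lift: float, daily_visitors: int,
                 variations: int = 2, min_weeks: int = 2,
                 min_conversions: int = 250) -> tuple[int, int]:
    """Return (whole weeks to run, visitors per variation)."""
    n = sample_size_per_variation(baseline, relative_lift)
    # Slide rule of thumb: at least 250 conversions per creative
    n = max(n, math.ceil(min_conversions / baseline))
    days = math.ceil(n * variations / daily_visitors)
    # Round up to full weeks, and never run fewer than two weekly cycles
    weeks = max(math.ceil(days / 7), min_weeks)
    return weeks, n


weeks, n = weeks_to_run(baseline=0.03, relative_lift=0.10, daily_visitors=5000)
print(f"{n:,} visitors per variation -> run for {weeks} full weeks")
```

The point of computing the length up front is that it gives you a stopping rule you set before the test starts, rather than one you choose after peeking. The sketch does not model purchase cycles or the ‘350 & more’ rule for very similar responses.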

Oops! No QA testing for the A/B test!

QA Test or lose loads of MONEY!!!

•  Over 40% of A/B tests I’ve worked on were broken (some seriously)
•  I’ve also found over £20M p.a. of browser bugs in the last 18 months
•  It’s very easy to break or bias your testing

Browser testing
•  www.crossbrowsertesting.com
•  www.browserstack.com
•  www.spoon.net
•  www.saucelabs.com
•  www.multibrowserviewer.com

Mobile devices
•  www.appthwack.com
•  www.deviceanywhere.com
•  www.opendevicelab.com

Read this article: bit.ly/1wBccsJ
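One cheap QA check for a broken or biased split is a sample ratio mismatch test. If you configured a 50/50 split, the visitor counts per arm should be close to 50/50, and a large imbalance suggests the test itself is broken. This is a sketch using a chi-square goodness-of-fit test with one degree of freedom. The function names and the 0.001 threshold are assumptions (the threshold is a common convention, not something from these slides).

```python
import math


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for the observed A/B visitor split (1 df)."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def looks_broken(visitors_a: int, visitors_b: int, threshold: float = 0.001) -> bool:
    """Flag a split whose imbalance is very unlikely under the configured ratio."""
    return srm_p_value(visitors_a, visitors_b) < threshold


print(looks_broken(50_100, 49_900))  # prints False: normal wobble
print(looks_broken(51_500, 48_500))  # prints True: investigate before trusting results
```

A flagged split does not tell you what is wrong (redirects, browser bugs, bots, tracking firing on only one variation), only that the numbers should not be trusted until the test has been QA’d.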

Gamble the Company AWAY!

•  I get 60-65% right
•  UX and Copywriters good at picking!
•  C-level execs are easy marks
•  Ironically, many decide ‘designs’
•  You need collaborative test design
•  It’s a team game, with customers
•  Flip a coin, anyone?
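‘Flip a coin, anyone?’ can be made concrete. Assume someone picks the winner of 20 A/B tests (a hypothetical number) purely at random. This sketch uses the binomial distribution to compute the chance they still score 60% or better, which shows how easily a good-looking hit rate can come from luck over a small number of tests. The function name `chance_of_at_least` is illustrative.

```python
from math import comb


def chance_of_at_least(hits: int, trials: int, p: float = 0.5) -> float:
    """P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(hits, trials + 1))


# 12 of 20 correct = 60%, from pure coin flipping
print(f"{chance_of_at_least(12, 20):.1%}")  # prints 25.2%
```

Roughly one coin-flipper in four would look like a 60% ‘expert’ over 20 tests. Over a much larger number of predictions, a 60% hit rate from chance alone becomes very unlikely.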

WE’RE ALL WINGING IT

2004 Headspace

What I thought I knew in 2004

Reality

2014 Headspace

What I KNOW I know

Me, on a good day

Guessaholics Anonymous

Rumsfeldian Space

The Blind Octopus

Business Future Testing?

Congratulations! Today you’re the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy.
Mr D. Vader

The 5-Legged Optimisation Barstool

#1 : CULTURE
•  Smart Talented Polymath People
•  Flexible and Agile ‘One Team’ approach
•  Smash the Silos
•  Proper Agile, Rapid, Iterative

Fittest? Agile!

#2 : Analytics Investment (TOOLS, PEOPLE, DEV TIME)

#3 : Expensive and tedious UX research?

#3 : Low Cost, Remote, Rapid UX research

#3 : Cross Channel, Multi Device Diary Studies

“On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.”

David Ogilvy

“In 9 years and 40M split tests with visitors, the majority of my testing success came from playing with the words.”

@OptimiseOrDie

#4 : PERSUASIVE COPYWRITING

•  Google Content Experiments bit.ly/Ljg7Ds

•  Optimizely www.optimizely.com

•  Visual Website Optimizer www.visualwebsiteoptimizer.com

•  Multi-Armed Bandit explanation bit.ly/Xa80O8

•  New Machine Learning Tools www.conductrics.com

#5 : Split Testing Tools
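Since the list points to a multi-armed bandit explanation: a bandit shifts traffic towards the better-performing variation while the test runs, instead of holding a fixed split. This sketch uses Thompson sampling with Beta posteriors, one common bandit strategy. It is not necessarily what the linked article or any of the listed tools uses, and the conversion rates passed in are made up for the simulation.

```python
import random


def thompson_sampling(true_rates, visitors, seed=0):
    """Allocate simulated visitors across variations with Thompson sampling.

    true_rates are the conversion rates used to simulate visitors (unknown in real life).
    Returns how many visitors each variation was shown to.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    shown = [0] * len(true_rates)
    for _ in range(visitors):
        # Draw a plausible rate for each arm from its Beta(1 + successes, 1 + failures) posterior
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        shown[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return shown


print(thompson_sampling([0.04, 0.06], visitors=20_000))
```

The trade-off: because traffic drifts towards the leader, the weaker variation gets fewer visitors. A bandit is generally better at earning during the test than at producing a clean estimate of how big the difference is.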

#1 Culture & Team
#2 Toolkit & Analytics investment
#3 UX, CX, Service Design, Insight
#4 Persuasive Copywriting
#5 Experimentation (testing) tools

The 5-Legged Optimisation Barstool

READ STUFF

#5 : FIND STUFF

•  @danbarker: Analytics
•  @fastbloke: Analytics
•  @timlb: Analytics
•  @jamesgurd: Analytics
•  @therustybear: Analytics
•  @carmenmardiros: Analytics
•  @davechaffey: Analytics
•  @priteshpatel9: Analytics
•  @cutroni: Analytics
•  @avinash: Analytics
•  @Aschottmuller: Analytics, CRO
•  @cartmetrix: Analytics, CRO
•  @Kissmetrics: CRO / UX
•  @Unbounce: CRO / UX
•  @Morys: CRO / Neuro
•  @UXFeeds: UX / Neuro
•  @Psyblog: Neuro
•  @Gfiorelli1: SEO / Analytics
•  @PeepLaja: CRO
•  @TheGrok: CRO
•  @UIE: UX
•  @LukeW: UX / Forms
•  @cjforms: UX / Forms
•  @axbom: UX
•  @iatv: UX
•  @Chudders: Photo UX
•  @JeffreyGroks: Innovation
•  @StephanieRieger: Innovation
•  @BrianSolis: Innovation
•  @DrEscotet: Neuro
•  @TheBrainLady: Neuro
•  @RogerDooley: Neuro
•  @Cugelman: Neuro
•  @Smashingmag: Dev / UX
•  @uxmag: UX
•  @Webtrends: UX / CRO

#5 : LEARN STUFF

•  Baymard.com
•  Lukew.com
•  Smashingmagazine.com
•  ConversionXL.com
•  Medium.com
•  Whichtestwon.com
•  Unbounce.com
•  Measuringusability.com
•  RogerDooley.com
•  Kissmetrics.com
•  Uxmatters.com
•  Smartinsights.com
•  Econsultancy.com
•  Cutroni.com

www.GetMentalNotes.com

#12 : The Best Companies…

•  Invest continually in analytics instrumentation, tools, people
•  Use an Agile, iterative, cross-silo, one team project culture
•  Prefer collaborative tools to having lots of meetings
•  Prioritise development based on numbers and insight
•  Practice real continuous product improvement, not SLEDD*
•  Are fixing bugs, cruft, bad stuff as well as optimising
•  Source photos and content that support persuasion and utility
•  Have cross channel, cross device design, testing and QA
•  Segment their data for valuable insights, every test or change
•  Continually reduce cycle (iteration) time in their process
•  Blend ‘long’ design, continuous improvement AND split tests
•  Make optimisation the engine of change, not the slave of ego

* Single Large Expensive Doomed Developments

THE FUTURE OF TESTING

Thank You!

Mail : [email protected]

Deck : slideshare.com/sullivac

Linkedin : linkd.in/pvrg14