craig sullivan - oh boy! these a/b tests look like total bullshit! mktfest 2014
DESCRIPTION
Get videos from all our lectures - http://video.marketingfestival.cz Marketing Festival - World-Class Digital Marketing Event #mktfest Czech Republic
TRANSCRIPT
Oh Boy! These A/B tests look like total bullshit!
@OptimiseOrDie
• UX, Analytics, Split Testing and Growth Rate Optimisation
• Started doing testing & CRO 2004
• Split tested over 40M visitors in 19 languages
• 60+ mistakes I MADE with AB testing
• Like riding a bike…
• Want to optimise your optimisation? Get in touch!
Top Testing F***ups for 2014
1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. You don’t test for long enough
6. You peek before it’s ready
7. No QA for your split test
8. Opportunities are not prioritised
9. Testing cycles are too slow
10. You don’t know when tests are ready
11. Your test fails
12. The test is ‘about the same’
13. Test flips behaviour
14. Test keeps moving around
15. You run an A/A test and waste time
16. Nobody ‘feels’ the test
17. You forgot you were responsive
18. You forgot you had no traffic
19. You ran the wrong test type
20. You didn’t try all the flavours of testing
slidesha.re/1wBbZ9c
#fail
26.6M
28.4M
Oppan Gangnam Style!
6.9M
The 95% Stopping Problem
• Many people use 95% or 99% ‘confidence’ as the signal to stop
• This value is unreliable
• Read this Nature article : bit.ly/1dwk0if
• You can hit 95% early in a test
• If you stop, it could be a false result
• Testing Tools need to be smarter about what they imply!
• This 95% thingy – it’s the last signal you should use to stop a test
• Let me explain
False Positives and Negatives
                         Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations   Insignificant  Insignificant  Significant!   Significant!
After 500 observations   Insignificant  Significant!   Insignificant  Significant!
End of experiment        Insignificant  Significant!   Insignificant  Significant!

                         Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations   Insignificant  Insignificant  Significant!   Significant!
After 500 observations   Insignificant  Significant!   trial stopped  trial stopped
End of experiment        Insignificant  Significant!   Significant!   Significant!
62.5cm +/- 1cm
A             B
9.1% ± 0.5    9.3% ± 0.5
9.1% ± 0.2    9.3% ± 0.2
9.1% ± 0.1    9.3% ± 0.1
AB Testing Visualisation Tool
abtestguide.com/calc/
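The interval pairs above (9.1% against 9.3%, with the margin shrinking from ±0.5 to ±0.1) can be reproduced roughly with a few lines of Python. This is a minimal sketch, assuming a simple normal (Wald) approximation at 95%; the function names and visitor counts are illustrative and are not taken from the slides or from the abtestguide.com calculator.

```python
import math

def conversion_interval(conversions, visitors, z=1.96):
    """Return (rate, margin) for a conversion rate.

    Uses the normal (Wald) approximation; z=1.96 corresponds to ~95%.
    """
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate, margin

def intervals_overlap(rate_a, margin_a, rate_b, margin_b):
    """True when the two 'rate ± margin' ranges share any value."""
    return abs(rate_a - rate_b) <= margin_a + margin_b

if __name__ == "__main__":
    # Hypothetical traffic: the margin halves each time visitors quadruple.
    for visitors in (10_000, 40_000, 160_000):
        rate, margin = conversion_interval(round(0.091 * visitors), visitors)
        print(f"{visitors:>7} visitors: {rate:.1%} ± {margin * 100:.2f} points")
```

The margin only halves when the traffic quadruples, which is why a gap as small as 9.1% against 9.3% needs a lot of visitors before the two ranges stop overlapping.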
“You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land.
77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.”
Ton Wesseling, Online Dialogue
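The effect described in the quote can be explored with a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate) and counts how often a two-proportion z-test crosses 95% at any peek, compared with only at the planned end. All parameters (run count, traffic, peek interval, base rate) are assumptions chosen for illustration, so the output will not match the 77% figure, which depends on how often and for how long you peek.

```python
import math
import random

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic with a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no conversions (or all conversions) so far
    return (conv_b / n_b - conv_a / n_a) / se

def simulate_aa_tests(runs=300, visitors_per_arm=2000, peek_every=100,
                      true_rate=0.10, z_crit=1.96, seed=1):
    """Return (share significant at ANY peek, share significant at the end).

    visitors_per_arm should be a multiple of peek_every so that the final
    check is also one of the peeks.
    """
    rng = random.Random(seed)
    ever_significant = 0
    significant_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked_significant = False
        for n in range(1, visitors_per_arm + 1):
            conv_a += rng.random() < true_rate
            conv_b += rng.random() < true_rate
            if n % peek_every == 0 and abs(z_score(conv_a, n, conv_b, n)) > z_crit:
                peeked_significant = True
        final = abs(z_score(conv_a, visitors_per_arm,
                            conv_b, visitors_per_arm)) > z_crit
        ever_significant += peeked_significant
        significant_at_end += final
    return ever_significant / runs, significant_at_end / runs

if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"Significant at some peek: {peeking:.0%}")
    print(f"Significant at the end:   {fixed:.0%}")
```

Because A and B are identical here, every "significant" result is a false positive. Stopping at the first peek that shows 95% can only raise that share compared with waiting for the planned end.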
“I always tell people that you need a representative sample if your data needs to be valid. What does ‘representative’ mean?
First of all you need to include all the weekdays and weekends. You need different weather, because it impacts buyer behaviour. But most important: Your traffic needs to have all traffic sources, especially newsletter, special campaigns, TV,… everything!”
Andre Morys, Web Arts
The 95% Stopping Problem
“Statistical Significance does not equal Validity” http://bit.ly/1wMfmY2
“Why every Internet Marketer should be a Statistician” http://bit.ly/1wMfs1G
“Understanding the Cycles in your site” http://mklnd.com/1pGSOUP
Three Articles you MUST read
Business & Purchase Cycles
• Customers change
• Your traffic mix changes
• Markets, competitors
• Be aware of all the waves
• Always test whole cycles
• Minimum 2 cycles (wk/mo)
• Don’t exclude slower buyers
Start Test Finish Avg Cycle
• TWO BUSINESS CYCLES minimum (week/mo)
• 1 PURCHASE CYCLE minimum
• 250 CONVERSIONS minimum per creative (e.g. checkouts)
• 350 & MORE! if response is very similar
• FULL WEEKS/CYCLES never part of one
• KNOW what marketing, competitors and cycles are doing
• RUN a test length calculator - bit.ly/XqCxuu
• SET your test run time, RUN IT, STOP IT, ANALYSE IT
• ONLY RUN LONGER if you need more data
• DON’T RUN LONGER just because the test isn’t giving the result you want!
How Long? Simple Rules to follow
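The rules above can be turned into a rough planning helper. This is a minimal sketch, not the test length calculator linked on the slide: the function name, parameter names and the default 7-day cycles are assumptions, and it covers only the minimum-duration rules (two business cycles, one purchase cycle, 250 conversions per creative, whole cycles only).

```python
import math

def minimum_test_days(conversions_per_day_per_variant,
                      business_cycle_days=7,
                      purchase_cycle_days=7,
                      min_conversions=250):
    """Shortest run time (in days) that satisfies all the minimums.

    - at least two business cycles
    - at least one purchase cycle
    - at least min_conversions per creative
    - rounded UP to a whole number of business cycles
    """
    days_for_conversions = math.ceil(
        min_conversions / conversions_per_day_per_variant)
    days = max(2 * business_cycle_days,
               purchase_cycle_days,
               days_for_conversions)
    whole_cycles = math.ceil(days / business_cycle_days)
    return whole_cycles * business_cycle_days

if __name__ == "__main__":
    # Hypothetical sites with a weekly business cycle.
    print(minimum_test_days(10))   # low-traffic checkout
    print(minimum_test_days(100))  # high-traffic checkout
```

The result is a floor, set before the test starts. Per the rules above, run longer only if you need more data (for example 350 or more conversions when the variants respond very similarly), not because the result is unwelcome.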
Oops! No QA testing for the AB test!
QA Test or lose loads of MONEY!!!
• Over 40% of AB tests I’ve worked on were broken (some seriously)
• I’ve also found over £20M p.a. of browser bugs in the last 18 months
• It’s very easy to break or bias your testing
Browser testing
www.crossbrowsertesting.com
www.browserstack.com
www.spoon.net
www.saucelabs.com
www.multibrowserviewer.com
Mobile devices
www.appthwack.com
www.deviceanywhere.com
www.opendevicelab.com
Read this article bit.ly/1wBccsJ
Gamble the Company AWAY!
• I get 60-65% right
• UX and Copywriters good at picking!
• C level execs are easy marks
• Ironically, many decide ‘designs’
• You need collaborative test design
• It’s a team game, with customers
• Flip a coin, anyone?
WE’RE ALL WINGING IT
2004 Headspace
What I thought I knew in 2004
Reality
2014 Headspace
What I KNOW I know
Me, on a good day
Guessaholics Anonymous
Rumsfeldian Space
The Blind Octopus
Business Future Testing? Congratulations! Today you’re the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy. Mr D. Vader
The 5 Legged Optimisation Barstool
#1 : CULTURE
• Smart Talented Polymath People
• Flexible and Agile ‘One Team’ approach
• Smash the Silos
• Proper Agile, Rapid, Iterative
Fittest? Agile!
#2 : Analytics Investment (TOOLS, PEOPLE, DEV TIME)
#3 : Expensive and tedious UX research?
#3 : Low Cost, Remote, Rapid UX research
#3 : Cross Channel, Multi Device Diary Studies
“On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.”
David Ogilvy
“In 9 years and 40M split tests with visitors, the majority of my testing success came from playing with the words.”
#4 : PERSUASIVE COPYWRITING
• Google Content Experiments bit.ly/Ljg7Ds
• Optimizely www.optimizely.com
• Visual Website Optimizer www.visualwebsiteoptimizer.com
• Multi Armed Bandit Explanation bit.ly/Xa80O8
• New Machine Learning Tools www.conductrics.com
#5 : Split Testing Tools
#1 Culture & Team
#2 Toolkit & Analytics investment
#3 UX, CX, Service Design, Insight
#4 Persuasive Copywriting
#5 Experimentation (testing) tools
The 5 Legged Optimisation Barstool
READ STUFF
#5 : FIND STUFF
@danbarker Analytics
@fastbloke Analytics
@timlb Analytics
@jamesgurd Analytics
@therustybear Analytics
@carmenmardiros Analytics
@davechaffey Analytics
@priteshpatel9 Analytics
@cutroni Analytics
@avinash Analytics
@Aschottmuller Analytics, CRO
@cartmetrix Analytics, CRO
@Kissmetrics CRO / UX
@Unbounce CRO / UX
@Morys CRO / Neuro
@UXFeeds UX / Neuro
@Psyblog Neuro
@Gfiorelli1 SEO / Analytics
@PeepLaja CRO
@TheGrok CRO
@UIE UX
@LukeW UX / Forms
@cjforms UX / Forms
@axbom UX
@iatv UX
@Chudders Photo UX
@JeffreyGroks Innovation
@StephanieRieger Innovation
@BrianSolis Innovation
@DrEscotet Neuro
@TheBrainLady Neuro
@RogerDooley Neuro
@Cugelman Neuro
@Smashingmag Dev / UX
@uxmag UX
@Webtrends UX / CRO
#5 : LEARN STUFF
Baymard.com Lukew.com Smashingmagazine.com ConversionXL.com Medium.com Whichtestwon.com Unbounce.com Measuringusability.com RogerDooley.com Kissmetrics.com Uxmatters.com Smartinsights.com Econsultancy.com Cutroni.com
www.GetMentalNotes.com
#12 : The Best Companies…
• Invest continually in analytics instrumentation, tools, people
• Use an Agile, iterative, cross-silo, one team project culture
• Prefer collaborative tools to having lots of meetings
• Prioritise development based on numbers and insight
• Practice real continuous product improvement, not SLEDD*
• Are fixing bugs, cruft, bad stuff as well as optimising
• Source photos and content that support persuasion and utility
• Have cross channel, cross device design, testing and QA
• Segment their data for valuable insights, every test or change
• Continually reduce cycle (iteration) time in their process
• Blend ‘long’ design, continuous improvement AND split tests
• Make optimisation the engine of change, not the slave of ego
* Single Large Expensive Doomed Developments
THE FUTURE OF TESTING