#measurecamp : 18 simple ways to f*** up your ab testing

18 simple ways to fuck up your AB testing 28th March 2014 @OptimiseOrDie

Upload: craig-sullivan

Post on 27-Jan-2015


DESCRIPTION

An expanded deck of the top 18 blockers to getting successful AB or Multivariate test results. In this deck, you get a complete checklist of the stuff you need to prepare, watch, launch and monitor your testing, so it gets you the *right* conclusions.

TRANSCRIPT

Page 1: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

18 simple ways to fuck up your AB testing

28th March 2014 @OptimiseOrDie

Page 2: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

@OptimiseOrDie

• UX and Analytics (1999)

• User Centred Design (2001)

• Agile, Startups, No budget (2003)

• Funnel optimisation (2004)

• Multivariate & A/B (2005)

• Conversion Optimisation (2005)

• Persuasive Copywriting (2006)

• Joined Twitter (2007)

• Lean UX (2008)

• Holistic Optimisation (2009)

Was : Group eBusiness Manager, Belron

Now : Spareroom.co.uk

Page 3: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

AB Test Hype Cycle

[Figure: hype-cycle curve along a timeline. Labels on the figure: Discovered AB testing; Tested stupid ideas, lots; Most AB or MVT tests are bullshit; Triage, Triangulation, Prioritisation, Maths; Zen Plumbing]

Page 4: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

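Craig’s Cynical Quadrant

• Improves revenue: Yes, improves UX: Yes. Client fucking delighted
• Improves revenue: Yes, improves UX: No. Client delighted (and fires you for another UX agency)
• Improves revenue: No, improves UX: Yes. Client fires you (then wins an award for your work)
• Improves revenue: No, improves UX: No. Client absolutely fucking furious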

Page 5: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#1 : You’re doing it in the wrong place


Page 6: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#1 : You’re doing it in the wrong place

There are 4 areas a CRO expert always looks at:

1. Inbound attrition (medium, source, landing page, keyword, intent and many more…)
2. Key conversion points (product, basket, registration)
3. Processes and steps (forms, logins, registration, checkout)
4. Layers of engagement (search, category, product, add)

How to look at each one:

• Inbound attrition: use visitor flow reports – very useful.
• Key conversion points: look at loss rates & interactions
• Processes and steps: look at funnels or make your own
• Layers of engagement: make a ring model

Page 7: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Examples – Concept

[Ring model diagram. Layers: Bounce, Engage, Outcome]

Page 8: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Examples – 16-25Railcard.co.uk

[Ring model diagram. Layers: Bounce, Login to Account, Content Engage, Start Application, Type and Details, Eligibility, Photo, Complete]

Page 9: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Examples – Guide Dogs

[Ring model diagram. Layers: Bounce, Content Engage, Donation Pathway, Donation Page, Starts process, Funnel steps, Complete]

Page 10: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Within a layer

[Diagram: within a layer, visitors move between Page 1, Page 2, Page 3, Page 4 and Page 5, then either Exit or go on to a Deeper Layer. Micro Conversions shown: Email, Like, Contact, Wishlist]

Page 11: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#1 : You’re doing it in the wrong place

• Get to know the flow and loss (leaks) inbound, inside and through key processes or conversion points.

• Once you know the key steps you’re losing people at and how much traffic you have – make a money model.

• Let’s say 1,000 people see the page a month. Of those, 20% (200) convert to checkout.

• Estimate the influence your test can bring. How much money or KPI improvement would a 10% lift in the checkouts deliver?

• Congratulations – you’ve now built the world’s first IT plan with a return on investment estimate attached!

• I’ll talk more about prioritising later – but a good real world analogy for you to use:

Page 12: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Think like a store owner!

If you can’t refurbish the entire store, which floors or departments will you invest in optimising?

Wherever there is:

• Footfall
• Low return
• Opportunity
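The money model from fuck-up #1 can be sketched in a few lines of Python. This is only an illustration: the visitor count, conversion rate and lift are the slide's figures, but `money_model` and the value of 50.0 per checkout are made up for the example, so plug in your own numbers.

```python
def money_model(monthly_visitors, conversion_rate, relative_lift, value_per_conversion):
    """Estimate extra conversions and money per month if a test wins."""
    baseline = monthly_visitors * conversion_rate
    extra = baseline * relative_lift
    return {
        "baseline_conversions": baseline,
        "extra_conversions": extra,
        "extra_value": extra * value_per_conversion,
    }


# Slide figures: 1,000 visitors a month, 20% reach checkout, 10% lift.
# The 50.0 value per checkout is an invented placeholder, not from the deck.
result = money_model(1000, 0.20, 0.10, 50.0)
print(result)
```

Run the same function for each candidate page and the areas with footfall, low return and opportunity should stand out.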

Page 13: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#2 : Your hypothesis is crap!

Insight - Inputs (#FAIL)

• Competitor copying
• Guessing
• Dice rolling
• An article the CEO read
• Competitor change
• Panic
• Ego
• Opinion
• Cherished notions
• Marketing whims
• Cosmic rays
• Not ‘on brand’ enough
• IT inflexibility
• Internal company needs
• Some dumbass consultant
• Shiny feature blindness
• Knee jerk reactions

Page 14: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#2 : These are the inputs you need…

Insight - Inputs (Insight)

• Segmentation
• Surveys
• Sales and Call Centre
• Session Replay
• Social analytics
• Customer contact
• Eye tracking
• Usability testing
• Forms analytics
• Search analytics
• Voice of Customer
• Market research
• A/B and MVT testing
• Big & unstructured data
• Web analytics
• Competitor evals
• Customer services

Page 15: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#2 : Solutions

• You need multiple tool inputs
– Tool decks are here : www.slideshare.net/sullivac
• Usability testing and User facing teams
– If you’re not doing these properly, you’re hosed
• Session replay tools provide vital input
– Get vital additional customer evidence
• Simple page Analytics don’t cut it
– Invest in your analytics, especially event tracking
• Ego, Opinion, Cherished notions – fill gaps
– Fill these vacuums with insights and data
• Champion the user
– Give them a chair at every meeting

Page 16: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

We believe that doing [A] for People [B] will make outcome [C] happen.

We’ll know this when we observe data [D] and obtain feedback [E]. (reverse)


Page 17: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#3 : No analytics integration

• Investigating problems with tests
• Segmentation of results
• Tests that fail, flip or move around
• Tests that don’t make sense
• Broken test setups
• What drives the averages you see?

Page 18: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing


A B B A

Page 19: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#4 : The test will finish after you die

[Image captions: “These Danish porn sites are so hardcore!” “We’re still waiting for our AB tests to finish!”]

• Use a test length calculator like this one:
• visualwebsiteoptimizer.com/ab-split-test-duration/
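If you want a rough feel for what a test length calculator does, here is a minimal sketch. It is not the method behind the linked calculator (that is not described in this deck); it assumes a standard two-proportion sample size formula with 95% confidence and 80% power, and the function names and traffic figures are invented for illustration.

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variant to detect the given lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_finish(base_rate, relative_lift, variants, daily_visitors):
    """Days of traffic needed when daily_visitors are split across all variants."""
    total = visitors_per_variant(base_rate, relative_lift) * variants
    return math.ceil(total / daily_visitors)


# Hypothetical page: 20% base conversion, hoping for a 10% relative lift,
# 2 variants sharing 1,000 visitors a day.
print(visitors_per_variant(0.20, 0.10))
print(days_to_finish(0.20, 0.10, variants=2, daily_visitors=1000))
```

Halve the lift you expect and the required sample roughly quadruples, which is how a timid test on a low traffic page ends up finishing after you die.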

Page 20: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#5 : You don’t test for long enough

• The minimum length
– 2 business cycles (cross check)
– Usually a week, 2 weeks, Month
– Always test ‘whole’ not partial cycles
– Be aware of multiple cycles
– Don’t self stop!
– PURCHASE CYCLES – KNOW THEM
• How long after that
– I aim for a minimum 250 outcomes, ideally 350+ for each ‘creative’
– If you test 4 recipes, that’s 1400 outcomes needed
– You should have worked out how long each batch of 350 needs before you start!
– 95% confidence or higher is my aim BUT BIG SECRET -> (p values are unreliable)
– If you segment, you’ll need more data
– It may need a bigger sample if the response rates are similar*
– Use a test length calculator but be aware of BARE MINIMUM TO EXPECT
– Important insider tip – watch the error bars! The +/- stuff – let’s explain

* Stats geeks know I’m glossing over something here: test time depends on how far the two experiments separate in relative performance, as well as on how volatile the test response is. I’ll talk about this when I record this one! This is why testing similar stuff sux.
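The arithmetic above (350 outcomes per recipe, whole business cycles only) can be worked out before launch. A minimal sketch, assuming a 7 day business cycle and an invented figure of 40 outcomes a day across the whole test; `planned_test_days` is a hypothetical helper, not something from a testing tool.

```python
import math


def planned_test_days(recipes, outcomes_per_day, outcomes_per_recipe=350,
                      cycle_days=7, min_cycles=2):
    """Return (total outcomes needed, test length in days rounded up to whole cycles).

    outcomes_per_day is the total across all recipes in the test.
    """
    total_outcomes = recipes * outcomes_per_recipe
    days_for_outcomes = math.ceil(total_outcomes / outcomes_per_day)
    cycles = max(min_cycles, math.ceil(days_for_outcomes / cycle_days))
    return total_outcomes, cycles * cycle_days


# 4 recipes at 350 outcomes each is the slide's 1400 outcomes.
print(planned_test_days(recipes=4, outcomes_per_day=40))
```

Treat the result as the bare minimum: it ignores purchase cycles, segmentation and how close the recipes are in performance.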

Page 21: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#5 : The tennis court

– Let’s say we want to estimate, on average, what height Roger Federer and Nadal hit the ball over the net at. So, let’s start the match:

Page 22: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

First Set Federer 6-4

– We start to collect values

62cm +/- 2cm

63.5cm +/- 2cm

Page 23: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Second Set – Nadal 7-6

– Nadal starts sending them low over the net

62cm +/- 1cm

62.5cm +/- 1cm

Page 24: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Final Set Nadal 7-6

– We start to collect values

61.8cm +/- .3cm

62cm +/- .3cm

Page 25: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Let’s look at this a different way

62.5cm+/- 1cm


9.1% ± 0.3

9.3% ± 0.3

Page 26: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

62.5cm+/- 1cm


9.1% ± 0.5

9.3% ± 0.5

9.1% ± 0.2

9.3% ± 0.2

9.1% ± 0.1

9.3% ± 0.1

Page 27: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Graph is a range, not a line:

9.1 ± 0.3%

9.1 ± 0.9%

9.1 ± 1.9%
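To see why the graph is a range, you can compute the error bars yourself. This sketch assumes a simple normal approximation interval (your testing tool may well use something else), and the visitor counts are invented to land on the 9.1% and 9.3% rates used in the slides.

```python
import math


def conversion_interval(conversions, visitors, z=1.96):
    """Approximate 95% interval for a conversion rate (normal approximation)."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin


def intervals_overlap(a, b):
    """True when two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Same 9.1% vs 9.3% rates, with 10,000 and then 1,000,000 visitors per recipe.
small_a, small_b = conversion_interval(910, 10_000), conversion_interval(930, 10_000)
large_a, large_b = conversion_interval(91_000, 1_000_000), conversion_interval(93_000, 1_000_000)

print(small_a, small_b, intervals_overlap(small_a, small_b))
print(large_a, large_b, intervals_overlap(large_a, large_b))
```

The rates never change, only the width of the fuzzy region around them, and the two recipes only separate once the sample is big enough.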

Page 28: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#5 : Summary

• The minimum length:
– 2 business cycles and > purchase cycle as a minimum, regardless of outcomes. Test for less and you’re cutting.
– 250+, prefer 350+ outcomes in each
– Error bar separation between creatives
– 95%+ confidence (unreliable)
• Pay attention to:
– Time it will take for the number of ‘recipes’ in the test
– The actual footfall to the test – not sitewide numbers
– Test results that don’t separate – makes the test longer
– This is why you need brave tests – to drive difference
– The error bars – the numbers in your AB testing tool are not precise – they’re fuzzy regions that depend on response and sample size.
– Sudden changes in test performance or response
– Monitor early tests like a chef!

Page 29: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#6 : You peek and jump to conclusions!

• Ignore the graphs. Don’t draw conclusions. Don’t dance. Calm down.
• Get a feel for the test but don’t do anything yet!
• Remember – in A/B - 50% of returning visitors will see a new shiny website!
• Until your test has had at least 1 business cycle and 250-350 outcomes, don’t bother even getting excited!
• Watching regularly is good though. You’re looking for anything that looks really odd – your analytics person should be checking all the figures until you’re satisfied
• All tests move around or show big swings early in the testing cycle. Here is a very high traffic site – it still takes 10 days to start settling. Lower traffic sites will stretch this period further.
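A quick simulation shows why peeking is dangerous. This is an illustrative sketch only, with invented traffic numbers: both "recipes" are identical (a simulated A/A test), so every "significant" result is a false alarm, and a pooled two-proportion z-test stands in for whatever your tool uses.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(tests=200, days=28, daily_visitors=200, rate=0.10, seed=1):
    """Return (share of tests 'significant' on any day, share significant on the last day)."""
    rng = random.Random(seed)
    peeked_wins = final_wins = 0
    for _ in range(tests):
        conv_a = conv_b = visitors = 0
        significant_on_any_day = False
        p = 1.0
        for _day in range(days):
            visitors += daily_visitors
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            p = p_value(conv_a, visitors, conv_b, visitors)
            if p < 0.05:
                significant_on_any_day = True
        peeked_wins += significant_on_any_day
        final_wins += p < 0.05
    return peeked_wins / tests, final_wins / tests


peeked_rate, final_rate = simulate_aa_tests()
print(f"false alarms if you stop at the first exciting day: {peeked_rate:.0%}")
print(f"false alarms if you wait for the planned end: {final_rate:.0%}")
```

The daily peeker can only ever do worse than the person who waits, because every end-of-test false alarm is also visible to the peeker.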

Page 30: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#7 : No QA testing for the AB test?

Page 31: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#7 - QA Test or Die!

• Over 40% of tests have had QA issues.
• It’s very easy to break or bias the testing

Browser testing
www.crossbrowsertesting.com
www.browserstack.com
www.spoon.net
www.cloudtesting.com
www.multibrowserviewer.com
www.saucelabs.com

Mobile devices
www.perfectomobile.com
www.deviceanywhere.com
www.mobilexweb.com/emulators
www.opendevicelab.com

Page 32: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#7 : What other QA testing should I do?

• Cross Browser Testing
• Testing from several locations (office, home, elsewhere)
• Testing the IP filtering is set up
• Test tags are firing correctly (analytics and the test tool)
• Test as a repeat visitor and check session timeouts
• Cross check figures from 2+ sources
• Monitor closely from launch, recheck, watch
• WATCH FOR BIAS!
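One cheap bias check you can automate is whether the traffic split matches what you configured. The sketch below is one possible check, not something prescribed in this deck: it assumes a chi-square test on the visitor counts per recipe, and the `alpha=0.001` alarm threshold and the counts are illustrative choices.

```python
import math


def split_looks_biased(visitors_a, visitors_b, expected_share_a=0.5, alpha=0.001):
    """Return (alarm, p_value) for a chi-square test of the observed traffic split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return p < alpha, p


print(split_looks_biased(5000, 5050))  # small wobble on a 50/50 split
print(split_looks_biased(5000, 5600))  # one recipe is losing visitors somewhere
```

If the alarm fires, check tags, redirects and IP filtering before you trust any conversion figures from the test.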

Page 33: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#8 : Opportunities are not prioritised

Once you have a list of potential test areas, rank them by opportunity vs. effort.

The common ranking metrics that I use include:

• Opportunity (revenue, impact)
• Dev resource
• Time to market
• Risk / Complexity

Make yourself a quadrant diagram and plot them
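A spreadsheet does this fine, but the ranking can also be sketched in Python. Everything here is an assumption for illustration: the 1 to 5 scoring scale, the quadrant cut-offs, the quadrant labels and the three example ideas are invented, and only the four ranking metrics come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    opportunity: int     # revenue / impact, 1 (low) to 5 (high)
    dev_resource: int    # 1 (cheap) to 5 (expensive)
    time_to_market: int  # 1 (fast) to 5 (slow)
    risk: int            # risk / complexity, 1 (low) to 5 (high)

    @property
    def effort(self):
        return self.dev_resource + self.time_to_market + self.risk

    @property
    def quadrant(self):
        high_opportunity = self.opportunity >= 3
        low_effort = self.effort <= 9
        if high_opportunity and low_effort:
            return "do first"
        if high_opportunity:
            return "plan carefully"
        if low_effort:
            return "quick fill-in"
        return "skip"


def rank(ideas):
    """Sort ideas by opportunity per unit of effort, best first."""
    return sorted(ideas, key=lambda idea: idea.opportunity / idea.effort, reverse=True)


ideas = [
    Idea("Homepage redesign", opportunity=4, dev_resource=5, time_to_market=5, risk=4),
    Idea("Checkout form fixes", opportunity=5, dev_resource=2, time_to_market=2, risk=2),
    Idea("Footer link colour", opportunity=1, dev_resource=1, time_to_market=1, risk=1),
]
for idea in rank(ideas):
    print(f"{idea.name}: effort {idea.effort}, {idea.quadrant}")
```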

Page 34: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#9 : Your cycles are too slow

[Chart: Conversion (y-axis) against Months (x-axis, 0 to 18)]

Page 35: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#9 : Solutions

• Give Priority Boarding for opportunities
– The best seats reserved for metric shifters
• Release more often to close the gap
– More testing resource helps, analytics ‘hawk eye’
• Kaizen – continuous improvement
– Others call it JFDI (just f***ing do it)
• Make changes AS WELL as tests, basically!
– These small things add up
• RUSH Hair booking – Over 100 changes
– No functional changes at all – 37% improvement
• In between product lifecycles?
– The added lift for 10 days work, worth 360k

Page 36: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#9 : Make your own cycles

“Rather than try and improve one thing by 10% - which would be very, very difficult to do, we go and find 1,000 things and improve them all by a fraction of a per cent, which is totally do-able.”
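The arithmetic behind lots of tiny changes is worth seeing. This sketch assumes each improvement multiplies the previous result (real changes rarely stack this neatly), takes "a fraction of a per cent" to mean 0.1% purely as an example, and rounds the RUSH example of over 100 changes down to exactly 100.

```python
def compound_lift(changes, lift_per_change):
    """Total relative lift when each change multiplies the previous result."""
    return (1 + lift_per_change) ** changes - 1


def lift_per_change(total_lift, changes):
    """Average lift each change needs to deliver to reach total_lift."""
    return (1 + total_lift) ** (1 / changes) - 1


# 1,000 things improved by 0.1% each (assumed value for "a fraction of a per cent").
print(f"{compound_lift(1000, 0.001):.0%}")

# RUSH example: roughly 100 changes adding up to a 37% improvement.
print(f"{lift_per_change(0.37, 100):.2%} per change")
```

Under these assumptions each RUSH change only had to be worth about a third of a per cent, far too small to ever show up in an individual AB test.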

Page 37: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#10 : How do I know when it’s ready?

• The hallmarks of a cooked test are:
– It’s done at least 1 or preferably 2+ business cycles and at least one if not two purchase cycles
– You have at least 250-350 outcomes for each recipe
– It’s not moving around hugely at creative or segment level performance
– The test results are clear – even if the precise values are not
– The intervals are not overlapping (much)
– If a test is still moving around, you need to investigate
– Always declare on a business cycle boundary – not the middle of a period (this introduces bias)
– Don’t declare in the middle of a limited time period advertising campaign (e.g. TV, print, online)
– Always test before and after large marketing campaigns (one week on, one week off)
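Some of these hallmarks can be turned into a pre-declaration checklist. The sketch below is a simplification under stated assumptions: `is_cooked` is a hypothetical helper, it treats any overlap between the leading recipe and the rest as a problem (the slide allows a little), and it cannot judge campaigns or whether results are still moving around.

```python
def is_cooked(business_cycles, purchase_cycles, outcomes_per_recipe,
              intervals, min_outcomes=250):
    """Return (ready, problems) for a test you are thinking of declaring.

    intervals holds one (low, high) error-bar range per recipe.
    """
    problems = []
    if business_cycles < 1 or not float(business_cycles).is_integer():
        problems.append("run for whole business cycles (at least 1, ideally 2+)")
    if purchase_cycles < 1:
        problems.append("run for at least one purchase cycle")
    if min(outcomes_per_recipe) < min_outcomes:
        problems.append(f"every recipe needs {min_outcomes}+ outcomes")
    # Compare the leading recipe's range with every other recipe's range.
    leader_index = max(range(len(intervals)), key=lambda i: sum(intervals[i]))
    leader_low = intervals[leader_index][0]
    others = [iv for i, iv in enumerate(intervals) if i != leader_index]
    if any(high >= leader_low for _, high in others):
        problems.append("error bars still overlap")
    return not problems, problems


print(is_cooked(2, 1, [400, 380], [(0.088, 0.094), (0.096, 0.102)]))
print(is_cooked(1.5, 1, [120, 400], [(0.08, 0.10), (0.09, 0.11)]))
```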

Page 38: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#11 : Your test fails

Page 39: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#11 : Your test fails

• Learn from the failure! If you can’t learn from the failure, you’ve designed a crap test.
• Next time you design, imagine all your stuff failing. What would you do? If you don’t know or you’re not sure, get it changed so that a negative becomes insightful.
• So : failure itself at a creative or variable level should tell you something.
• On a failed test, always analyse the segmentation and analytics
• One or more segments will be over and under
• Check for varied performance
• Now add the failure info to your Knowledge Base:
• Look at it carefully – what does the failure tell you? Which element do you think drove the failure?
• If you know what failed (e.g. making the price bigger) then you have very useful information
• You turned the handle the wrong way
• Now brainstorm a new test

Page 40: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#12 : The test is ‘about the same’

• Analyse the segmentation
• Check the analytics and instrumentation
• One or more segments may be over and under
• They may be cancelling out – the average is a lie
• The segment level performance will help you (beware of small sample sizes)
• If you genuinely have a test which failed to move any segments, it’s a crap test – be bolder
• This usually happens when it isn’t bold or brave enough in shifting away from the original design, particularly on lower traffic sites
• Get testing again!
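Here is a toy example of how the average can lie. The segment names and numbers are invented so that the effect is obvious; real segments will be messier and often too small to read.

```python
def relative_lift(control, variant):
    """control and variant are (conversions, visitors) pairs."""
    control_rate = control[0] / control[1]
    variant_rate = variant[0] / variant[1]
    return variant_rate / control_rate - 1


# Invented data: the variant wins on desktop and loses on mobile.
segments = {
    "desktop": {"control": (500, 5000), "variant": (600, 5000)},
    "mobile": {"control": (500, 5000), "variant": (400, 5000)},
}


def totals(arm):
    """Sum conversions and visitors for one arm across all segments."""
    conversions = sum(data[arm][0] for data in segments.values())
    visitors = sum(data[arm][1] for data in segments.values())
    return conversions, visitors


overall = relative_lift(totals("control"), totals("variant"))
by_segment = {name: relative_lift(data["control"], data["variant"])
              for name, data in segments.items()}

print(f"overall: {overall:+.0%}")
for name, lift in by_segment.items():
    print(f"{name}: {lift:+.0%}")
```

The test tool reports "about the same" while the analytics shows two segments moving hard in opposite directions, which is a far more useful finding.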

Page 41: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#13 : The test keeps moving around

• There are three reasons it is moving around
– Your sample size (outcomes) is still too small
– The external traffic mix, customers or reaction has suddenly changed or
– Your inbound marketing driven traffic mix is completely volatile (very rare)
• Check the sample size
• Check all your marketing activity
• Check the instrumentation
• If no reason, check segmentation

Page 42: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#14 : The test has flipped on me

• Something like this can happen:

• Check your sample size. If it’s still small, then expect this until the test settles.

• If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously!

• To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system.

• The segmented data will help you to identify the source of the shift in response to your test. I rarely get a flipped one and it’s always something changing on me, without being told. The heartless bastards.

Page 43: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#15 : Should I run an A/A test first?

• No – and this is why:
– It’s a waste of time
– It’s easier to test and monitor instead
– You are eating into test time
– Also applies to A/A/B/B testing
– A/B/A running at 25%/50%/25% is the best
• Read my post here : http://bit.ly/WcI9EZ
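For what it's worth, a 25%/50%/25% A/B/A split is easy to picture as code. Your testing tool will do the allocation for you; this sketch only assumes a common approach (hashing a visitor ID so the same person always lands in the same bucket), and the bucket names, `test_name` default and visitor IDs are invented.

```python
import hashlib


def assign_bucket(visitor_id, test_name="example-test"):
    """Deterministically place a visitor in A1 (25%), B (50%) or A2 (25%)."""
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    if point < 0.25:
        return "A1"
    if point < 0.75:
        return "B"
    return "A2"


counts = {"A1": 0, "B": 0, "A2": 0}
for number in range(20_000):
    counts[assign_bucket(f"visitor-{number}")] += 1
print(counts)
```

The two A buckets see the same creative, so if A1 and A2 drift apart you have a setup or sample size problem, and you found out without burning test time on a separate A/A run.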

Page 44: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#16 : Nobody feels the test

• You promised a 25% rise in checkouts - you only see 2%
• Traffic, Advertising, Marketing may have changed
• Check they’re using the same precise metrics
• Run a calibration exercise
• I often leave a 5 or 10% stub running in a test
• This tracks old creative once new one goes live
• If conversion is also down for that one, BINGO!
• Remember – the AB test is an estimate – it doesn’t precisely record future performance
• This is why infrequent testing is bad
• Always be trying a new test instead of basking in the glory of one you ran 6 months ago. You’re only as good as your next test.

Page 45: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#17 : You forgot about Mobile & Tablet

• If you’re AB testing a responsive site, pay attention
• Content will break differently on many screens
• Know thy users and their devices
• Use Bango or Google Analytics to define a test list
• Make sure you test mobile devices & viewports
• What looks good on your desk may not be for the user
• Harder to design cross device tests
• You’ll need to segment mobile, tablet & desktop response in the analytics or AB testing package
• Your personal phone is not a device mix
• Ask me about making your device list
• Buy core devices, rent the rest from deviceanywhere.com

Page 46: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#18 : Oh shit – no traffic!

• Forget MVT or A/B/N tests – run your numbers
• Test things with high impact – don’t be a wuss!
• Use UX, Session Replay to aid insight
• Run a task gap survey (4Q style)
• Run a dropped basket survey (LF style)
• Run a general survey + check social + other sites
• Run sitewide tests that appear on all pages or large clusters of pages:
– UVPs (“We are a cool brand”), USPs (“Free returns!”), UCPs (“10% off today”).
– Headers, Footers, Nudge Bars, USP bars, footer changes, Navigation, Product pages, Delivery info etc.

Page 47: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

#18 : Oh shit – no traffic

• If small volumes, contact customers – reach out.
• If data volumes aren’t there, there are still customers!
• Drive design from levers you can apply – game the system
• Pick clean and simple clusters of change (hypothesis driven)
• Use a goal at an earlier ring stage or funnel step
• Beware of using clickthroughs when attrition is high on the other side
• Try before and after testing on identical time periods (measure in analytics model)
• Be careful about small sample sizes (<100 outcomes)
• Are you working automated emails?
• Fix JFDI, performance and UX issues too!

Page 48: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Top F***ups for 2014

1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. You don’t test for long enough
6. You peek before it’s ready
7. No QA for your split test
8. Opportunities are not prioritised
9. Testing cycles are too slow
10. You don’t know when tests are ready
11. Your test fails
12. The test is ‘about the same’
13. Test keeps moving around
14. Test flips behaviour
15. You run an A/A test and waste time
16. Nobody ‘feels’ the test
17. You forgot you were responsive
18. You forgot you had no traffic

Page 49: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Is there a way to fix this then?

Conversion Heroes!

Page 50: #Measurecamp : 18 Simple Ways to F*** up Your AB Testing

Email : [email protected]

Twitter : @OptimiseOrDie

LinkedIn : linkd.in/pvrg14

Slides uploaded to SLIDESHARE.NET\SULLIVAC