1
- Hi, I’m Corey, from Etsy (@coreyloose)
- Etsy is a marketplace where people around the world connect to buy and sell unique goods (not all that different from the art fair going on right now)
- We like to run a lot of A/B tests
2
- This talk is 201, but here’s the quick 101
3
- Have a theory about something that will make your product better
- Show it to a random subset of visitors, but keep each visitor’s experience consistent (“bucketing”)
- Try both for a bit and see which one does better
- Not only does this test whether your idea is good, it also tests your implementation and all sorts of complex interactions
- Would this one cause an increased error rate in variation selection?
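A minimal sketch of random-but-consistent bucketing, for illustration only. It hashes a visitor id instead of reading a cookie, which is a swapped-in technique and not how the tooling in this talk works; `assign_variant`, `user_id`, `experiment`, and `treatment_pct` are hypothetical names.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_pct: float = 50.0) -> str:
    """Deterministically map a visitor to 'control' or 'treatment'.

    The same (experiment, user_id) pair always lands in the same bucket,
    so a returning visitor keeps seeing the same variation.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32), scaled to [0, 100).
    point = int(digest[:8], 16) / 2**32 * 100
    return "treatment" if point < treatment_pct else "control"
```

Including the experiment name in the hash key keeps a visitor's bucket in one experiment independent of their bucket in another.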
4
- As I just explained it, A/B testing sounds simple + awesome
- And it is, but as always the devil is in the details
- I’m going to tell a bunch of stories about stuff we did wrong, not to be negative, but because it’s more interesting than spraying champagne around
- Let’s start with a really common no-no
5
- Trying one thing for a week, then trying another
6
- Alluring because it doesn’t require you to have rich metric gathering or bucketing
- You’re going to need some tooling
- We built Feature and Catapult
7
- (Only code in the presentation)
- Plenty of other options out there, but we’re happy with this
- Open source
- Easy enough that PMs can change experiment weights
- Uses a cookie to ensure the user experience stays consistent
- You’ll need your own logging to do analysis
8
- Internal tool that analyzes A/B tests using data processed from feature event logs
- For this experiment: more pages but fewer adds to cart
- No statistical significance for conversion rate
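The talk does not say how the internal tool decides significance. As an illustration, here is a standard two-proportion z-test on conversion counts; the function and parameter names are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A p-value above the chosen threshold (0.05 is a common convention) means the difference in conversion rate could plausibly be noise.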
9
- A bit sobering, but you gotta have a lot of traffic, or make a big change, to do this
10
- Written by an Etsy alum
- To detect a small change you need a lot of time
11
- The good news is that if you can make a bigger effect, it gets much easier to detect (1% => 5%)
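To see why a bigger effect is so much easier to detect, here is a sketch of the usual normal-approximation sample size formula for comparing two proportions. Treating "1% => 5%" as relative lifts is an assumption, and `visitors_per_variant` and its parameters are hypothetical names.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in each variant to detect the lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

Required traffic scales with the inverse square of the effect size, so a 5x larger lift needs roughly 25x fewer visitors.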
12
- Have a hypothesis going in; no fishing (“let’s just pump some people full of this new chemical”)
- Let’s get into some more interesting failures
13
- Going to tell a few stories about a first type of failure: mechanical
14
- All users get bucketed, but only Australian users are eligible for the experiment
15
- This is what really happens, since the rest of the world isn’t eligible
- Going to underrepresent any effects
16
- Need to exclude the rest
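A small arithmetic sketch of the dilution. If only a fraction of bucketed visitors can ever see the change, the lift measured over everyone shrinks toward zero. All names and numbers here are hypothetical.

```python
def observed_lift(eligible_share: float, rate_eligible: float,
                  rate_other: float, true_lift: float) -> float:
    """Relative lift measured across ALL bucketed visitors.

    Only eligible visitors in the treatment group actually convert at the
    lifted rate; everyone else behaves the same in both groups.
    """
    other = (1 - eligible_share) * rate_other
    control = eligible_share * rate_eligible + other
    treatment = eligible_share * rate_eligible * (1 + true_lift) + other
    return treatment / control - 1


# Hypothetical: 5% of visitors are eligible, true lift among them is 10%.
diluted = observed_lift(0.05, 0.02, 0.02, 0.10)
```

With equal base rates the measured lift is the true lift times the eligible share, so analyzing only eligible visitors recovers the full effect.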
17
- If your experiment causes the page to be a lot bigger, weirdness can happen
- Page loads slower
18
- This ensures the user actually saw the page + we have access to more information
19
- Slow network speed on mobile
- The combo led to experiments being under-reported
- Noticed because the experiment group would appear to have far fewer people in it
- Lesson: watch page weight
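One way to catch a lopsided experiment group automatically is a chi-square goodness-of-fit check on the group sizes. This is a general sketch, not the check used in the talk, and `sample_ratio_check` and its parameters are hypothetical names.

```python
import math


def sample_ratio_check(n_control: int, n_treatment: int,
                       expected_treatment_share: float = 0.5):
    """Return (chi2, p-value) comparing observed group sizes to the plan."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_treatment - expected_treatment) ** 2 / expected_treatment
            + (n_control - expected_control) ** 2 / expected_control)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A very small p-value suggests visitors are going missing from one group, which points to a logging or page-load problem and not a real effect.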
20
- We don’t support IE7
- We once ran an experiment that looked like this in IE7
- There was still enough traffic to tank the experiment
- Lesson: slice by user groups in the analysis
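A rough sketch of slicing results by user group, assuming each logged event carries a segment (such as browser), a variant, and whether the visit converted. The event shape and the name `conversion_by_segment` are assumptions.

```python
from collections import defaultdict


def conversion_by_segment(events):
    """Map (segment, variant) -> conversion rate.

    `events` is an iterable of (segment, variant, converted) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # [visits, conversions]
    for segment, variant, converted in events:
        cell = totals[(segment, variant)]
        cell[0] += 1
        cell[1] += int(converted)
    return {key: conversions / visits
            for key, (visits, conversions) in totals.items()}
```

A segment whose treatment rate collapses while the others hold steady, as with IE7 here, stands out in the sliced view even when it is hidden in the overall number.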
21
- (HAL 9000)
- Ran an experiment on our activity feed, at a small %
- All the metrics tanked
- Turned out a bot we use to monitor page times was bucketed in
- Lesson: make your A/B tooling ignore your bots
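A minimal sketch of keeping bots out of bucketing, assuming they can be recognized by user agent. The marker list and function names are hypothetical; an internal monitoring bot is probably easier to exclude by a known identifier.

```python
from typing import Optional

# Hypothetical substrings; replace with whatever identifies your own bots.
BOT_MARKERS = ("bot", "crawler", "spider", "monitor")


def is_probable_bot(user_agent: Optional[str]) -> bool:
    """Treat empty user agents and known markers as automated traffic."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def eligible_for_bucketing(user_agent: Optional[str]) -> bool:
    """Only bucket requests that look like real visitors."""
    return not is_probable_bot(user_agent)
```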
22
- Previous stories were mechanical, but the real power of A/B testing is seeing how your idea interacts with the world
23
- Implemented as a monolithic release
- A/B test kept as a hurdle at the end
24
- Go check out Dan McKinley’s talk
25
- It failed terribly; purchases down over 20%
- Since we built it all at once, we had nothing to pin it on
- What if we had tested something simple first: are more items better? 40 vs. 80 items on a page
- Lesson: test ideas in isolation
26
- Here’s a story about an A/B test telling us something our product intuition didn’t
- Seems like an obvious, simple win
- Logins are way down
- Turns out average users use way worse passwords than employees
- Ended up being a no-go for other reasons
- Lesson: unintended consequences
27
- You can’t measure everything that matters
- Can iron out the mechanical issues
- Can run tightly scoped tests that allow you to make confident decisions
- What if you asked ½ of the people you met for the rest of the day for $1?
- You’d end up with more money
28
- That’s what you’re doing with this
- If you A/B test it, you’ll get more signups + probably better time-on-page
- Maybe a few more bounces
- But goodwill & brand impression are hard to measure
29
30