experimentation at scale

Studying Behavior at Internet Scale Andy Edmonds. Aug 14 2015

Upload: andy-edmonds

Post on 19-Aug-2015


Studying Behavior at Internet Scale
Andy Edmonds. Aug 14 2015

Outline
● About Me
● Start with an example
● Awareness:
  o Consumer, Business, Academic
● Learnings
  o Experiments & Science, Psychology
● And a fresh, detailed example

About Me
● 20 years developing internet experiences at eBay, Microsoft, and smaller players
● Studied Cognitive Science/Psychology; left PhD program in ’95, went back for a Masters

Example Learning: Clicking “Page 2” vs Next indicates user intent.

Making Images Larger at eBay
● Images on search result page (“SERP”) increased from 160px to 220px
● Consistent results across tests in US, UK, and DE for millions of users
● Traditional metrics of search clickthrough, # of products viewed, etc. had typically negative outcomes
● Actual outcome was +10s of millions of incremental revenue

E-Commerce Experimentation
● Metrics
  o Conversion Rate
  o Total Revenue (Overall Evaluation Criteria, Kohavi)
    Conversion Rate * Average Order Value
● Challenges
  o Outliers - rare high dollar transactions valuable, but not well distributed.
  o Short term vs long term value
  o Durability of findings & effect sizes
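A minimal sketch of the metric and the outlier challenge above, assuming the product Conversion Rate * Average Order Value is read per visitor. The order values, visitor count, and the 500.0 cap are hypothetical, and capping (winsorizing) is only one common way to tame rare high-dollar orders, not necessarily what was done in the talk.

```python
from statistics import mean

def revenue_per_visitor(order_values, visitors):
    """Return (conversion_rate, average_order_value, revenue_per_visitor)."""
    orders = len(order_values)
    conversion_rate = orders / visitors
    average_order_value = mean(order_values) if orders else 0.0
    return conversion_rate, average_order_value, conversion_rate * average_order_value

def winsorize(values, cap):
    """Clip each value at `cap` so one rare, very large order cannot dominate the mean."""
    return [min(v, cap) for v in values]

# Hypothetical data: 1,000 visitors, five orders, one of them a rare high-dollar outlier.
orders = [20.0, 35.0, 25.0, 40.0, 5000.0]
cr, aov, rpv = revenue_per_visitor(orders, visitors=1000)

# Same visitors, but every order is capped at an assumed threshold of 500.0.
capped_cr, capped_aov, capped_rpv = revenue_per_visitor(
    winsorize(orders, cap=500.0), visitors=1000
)

print(f"raw:    CR={cr:.3%}  AOV={aov:.2f}  revenue/visitor={rpv:.2f}")
print(f"capped: CR={capped_cr:.3%}  AOV={capped_aov:.2f}  revenue/visitor={capped_rpv:.2f}")
```

In this toy data the single large order accounts for most of the uncapped revenue per visitor, which is why an outlier landing in one condition by chance can swing a revenue comparison.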

Understanding Behavior
Regressing time to first click treats the new result presentation as a sort of repeated measures design.

+200 msec evaluation per result.

(Chart; axis in seconds.)
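One way to read the regression above, sketched under assumptions: fit time to first click against the position of the clicked result separately for each condition, then compare the slopes as a per-result evaluation time. The data points are hypothetical and were chosen so that the slope difference comes out to 0.2 seconds; the talk does not spell out the actual model, so ordinary least squares is an assumption here.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical observations: (position of first clicked result, seconds until that click).
control = [(1, 2.0), (2, 2.5), (3, 3.0), (4, 3.5)]
treatment = [(1, 2.2), (2, 2.9), (3, 3.6), (4, 4.3)]

control_slope, _ = ols_slope_intercept(*zip(*control))
treatment_slope, _ = ols_slope_intercept(*zip(*treatment))

# The slope is the extra time spent per additional result scanned before clicking.
extra_ms_per_result = (treatment_slope - control_slope) * 1000
print(f"extra evaluation time per result: {extra_ms_per_result:.0f} msec")
```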

Model Development
Task Model “Engage” Task Repetition Effect

Hypothesis

Awareness Industry

Experimentation is Big Business (and big hype)

Beyond precedents in direct mail, clinical trials...

575,000 YouTube videos?

Driven by
● E-Commerce
  ○ Details matter and it’s easy to get wrong
● Startups
  ○ Bootstrapped knowledge, ability to pivot

Driving Useful Cultural Change
Experimentation can be democratizing in corporate hierarchies.

An April Fools’ Day prank site:

Awareness Research / Academic

Research Publications
● Methodology publications dominate
● Kohavi (MSFT) started publishing 2007
● CHI Workshop 2014
● Google 2010 / Facebook 2014

Information Retrieval Concentration

Much of ongoing A/B work in published research is driven by search
● Search is hard to evaluate
● Algorithms are highly amenable to A/B
  o Transparent to user
  o Cheap to permute
● Conferences: ACM WWW, SIGIR, KDD, CIKM

Awareness Public

Consumer: Facebook
About a year ago, a publication powered by a Facebook A/B test was picked up by consumer media.

Consumer: OKCupid

Cashing in on media interest in Facebook experiment for book promotion....

Learnings: Experiments & Scientific Method

Close but no Cigar
A/B in business is not science:
● Trading velocity for accuracy is ok in some cases
● Creating a culture of testing is challenging
  o Requires a common basic acumen at interpretation
  o User Experience & Design professionals often under-skilled

Iterative Learning

● Low cost of experiments promotes iteration
● Lack of control of online experiments promotes discovery
● Triangulation across lab-based studies, survey methods, and analytic baselines key

More: Designing and Deploying Online Field Experiments. Eytan Bakshy, Dean Eckles, Michael Bernstein. WWW 2014.

Interactions are Rare?
Common practice is to run massively parallel experiments
● Lightly segmented across user experiences (e.g. search, registration, checkout)
● Interactions are also informative!
  o I prefer small factorial (2x2, 2x3, etc)
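To make the point about informative interactions concrete, here is a small sketch of how a 2x2 factorial separates main effects from the interaction. The factor names and cell means are hypothetical (say, conversion rates for two independent changes A and B); a real analysis would also need standard errors for each contrast.

```python
def factorial_2x2_effects(cell_means):
    """cell_means maps (a, b), with a and b in {0, 1}, to the mean metric in that cell.

    Returns (main effect of A, main effect of B, interaction).
    """
    m00, m01 = cell_means[(0, 0)], cell_means[(0, 1)]
    m10, m11 = cell_means[(1, 0)], cell_means[(1, 1)]
    main_a = ((m10 + m11) - (m00 + m01)) / 2  # average lift from A across both levels of B
    main_b = ((m01 + m11) - (m00 + m10)) / 2  # average lift from B across both levels of A
    interaction = (m11 - m10) - (m01 - m00)   # how much B's lift changes when A is on
    return main_a, main_b, interaction

# Hypothetical cell means keyed by (A on?, B on?).
means = {(0, 0): 0.100, (1, 0): 0.110, (0, 1): 0.105, (1, 1): 0.130}
main_a, main_b, interaction = factorial_2x2_effects(means)
print(f"A: {main_a:+.4f}  B: {main_b:+.4f}  A x B: {interaction:+.4f}")
```

A non-zero interaction term means the two changes do not simply add up, which is exactly what two separate, non-crossed A/B tests cannot reveal.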

Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. Proceedings 16th Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC (2010), pp. 17-26 http://research.google.com/pubs/pub36500.html

Learnings: Human Psychology

Design is Hard, Intuition Flawed

Industry success rate of A/B tests, while not cleanly reported, is less than ⅓.

Causes: Technical issues, learning experiments, incorrect intuitions on functionality and design.

Change is Challenging
Practically, user change resistance is one of the biggest problems for successful internet companies evaluating new experiences.

Learnability and avoiding proactive interference are key areas for research.

Micro-Economic Theory
Key Concepts
● Cost of Action
  o Perceived Cost
  o Predicted Cost
  o Actual Cost
● Utility
  o Prediction of Utility
  o Actual Utility
● Orienting Reference: Azzopardi, L. (2014). Modeling Interaction with Economic Models of Search. Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval.

A Final Example

Searchers go deep at RB

Aside: Single-user visualization is a very useful technique when combined with large-scale analytics.

Faster Search at Redbubble
● 2nd and subsequent searches from 4+ seconds to < 1
  o By using “partial page updates” vs full page reloads (e.g. AJAX)

Results, two-sample t-test

Treated = users who did a search.

About 300k users per condition, 200k users treated.
1 of several ongoing tests.
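For readers who want to see the mechanics of a two-sample t-test, here is a standard-library sketch. It uses Welch's unequal-variance form, which is an assumption; the slide does not say which variant was run. The samples are tiny and hypothetical, and the normal approximation for the p-value is only reasonable at sample sizes like the hundreds of thousands of users mentioned above.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for mean(b) - mean(a)."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean(a)
    vb = variance(sample_b) / nb  # squared standard error of mean(b)
    t = (mean(sample_b) - mean(sample_a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def normal_approx_p(t):
    """Two-sided p-value using the normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical per-user metric values for each condition.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treatment = [2.0, 3.0, 4.0, 5.0, 6.0]

t_stat, df = welch_t(control, treatment)
p_value = normal_approx_p(t_stat)
print(f"t={t_stat:.3f}  df={df:.1f}  approx p={p_value:.3f}")
```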

R-Markdown Analysis!
Reproducible research, with handy embedded images in HTML.

Micro-economic Explanation?

Users click more on the last position (or row). Why? Why oh why?

The Ski Jump
Hypothesis: People are making a locally rational decision, or satisficing, between the last set of results and the next button.

Appendix

Useful Links

Videos
● ACM Chi Tutorial: https://www.youtube.com/watch?v=jQDnBIeoN3E
● Planout (Facebook’s EXP Platform): https://www.youtube.com/watch?v=Ayd4sqPH2DE
● EXP Platform at Microsoft, Kohavi et al. http://www.exp-platform.com/Pages/default.aspx

Articles
● Wired Magazine 2012, The A/B Test: Inside the Technology That’s Changing the Rules of Business
● Obama Multivariate Button & Video test, https://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/

Research
● Facebook’s “Experimental evidence of massive-scale emotional contagion through social networks”, http://www.pnas.org/content/111/24/8788.full.pd
● Micro-economic Behavioral Explanations, Citations of:
  o Azzopardi, L. (2014). Modeling Interaction with Economic Models of Search. Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014.