n=10^9: automated experimentation at scale

N=10^9: Automated Experimentation at Scale

Wojciech Galuba, Decision Tools Lead, Facebook, @wgaluba

N=10^9: Automated Experimentation at Scale

Wojtek Galuba (wgaluba@fb) Decision Tools Team Lead Data Science Infrastructure Facebook

History of Data Science Infra at FB
• Founded April 2012
• A group of data scientists and software engineers
• Experienced first hand the need for better infrastructure
• Need continues to grow
• Team doubled over the past year
• Expect continued rapid growth this year

Why do we experiment?

Experimentation

Product changes

Experiment to study this

Metrics

Experiment to:

Catch problems before they arise

Experiment to:

Choose between multiple options

Experiment to:

Challenge intuitions about product

Experiment to:

Not only evaluate ideas but generate new ones

Challenges

Many experiments

• Experiments running in parallel
• Modifying many different aspects of the product
• Overlaps are possible and may conflict

Many metric dimensions
• Different contexts of user actions
• Thousands of device types
• Geography
• Demographics
• Time
• Enormous space of possible questions

Many teams
• Many ways to run an experiment
• Diverse audience for results
• Huge set of results from every experiment
• Many ways to interpret results

Experimentation at Facebook

An experiment

QuickExperiment

color: blue, size: medium
color: blue, size: big
color: green, size: medium

QuickExperiment
• Centralized experiment management
• Purely config-level: no code pushes to iterate
• Automatic exposure logging

PlanOut

PlanOut
• Open sourced: http://facebook.github.io/planout/
• Flexible experimental design
• Full, programmatic control over param values
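To make "programmatic control over param values" concrete, here is a minimal standard-library sketch of deterministic, hash-based parameter assignment. It is not PlanOut's API: the `PARAMS` table, the `assign` function, and the experiment name are hypothetical, and the color/size values are taken from the example slide above.

```python
import hashlib

# Hypothetical parameter space, mirroring the color/size example.
PARAMS = {
    "color": ["blue", "green"],
    "size": ["medium", "big"],
}


def assign(experiment: str, user_id: int) -> dict:
    """Deterministically map a user to one value per parameter."""
    assignment = {}
    for name, choices in PARAMS.items():
        # Salt the hash with the experiment and parameter names so that
        # each parameter is randomized independently of the others.
        key = f"{experiment}.{name}.{user_id}".encode()
        digest = int(hashlib.sha1(key).hexdigest(), 16)
        assignment[name] = choices[digest % len(choices)]
    return assignment


if __name__ == "__main__":
    # The same user always receives the same values for a given experiment.
    print(assign("button_test", 42))
```

Because the assignment is a pure function of the experiment name and user ID, no per-user state has to be stored, and the values can be recomputed at analysis time.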

Experiment evaluation

Exposures

Metrics

[Figure: % change from control to test, on an axis from -3 to 3, with confidence levels of 95%, 99%, and 99.9%]

Assess decision risk

[Figure legend: confidence levels of 95%, 99%, and 99.9%]
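The chart described above reports the % change from control to test at several confidence levels. A minimal sketch of one way to compute such intervals follows, assuming a normal approximation and treating the control mean as fixed when converting to a percentage; the slides do not say which method the actual tools use, and `percent_change_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, variance


def percent_change_ci(control, test, confidence=0.95):
    """Return (point, low, high) for the % change of test over control.

    Assumes a positive control mean and samples large enough for the
    normal approximation to be reasonable.
    """
    m_c, m_t = mean(control), mean(test)
    # Standard error of the difference in means (unpooled variances).
    se = math.sqrt(variance(control) / len(control) + variance(test) / len(test))
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = m_t - m_c
    scale = 100.0 / m_c  # simplification: control mean treated as a constant
    return diff * scale, (diff - z * se) * scale, (diff + z * se) * scale


if __name__ == "__main__":
    control = [10, 12, 11, 9, 10, 12]
    test = [11, 13, 12, 10, 11, 13]
    for level in (0.95, 0.99, 0.999):
        print(level, percent_change_ci(control, test, level))
```

Higher confidence levels give wider intervals, which is what lets a reader assess decision risk: a change that excludes zero at 95% but not at 99.9% is a riskier call.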

Lessons learned

Computing answers to exponential number of possible questions

Pre-compute
• low specificity
• low dimensionality
• long-term

Compute on-the-fly
• high specificity
• high dimensionality
• short-term

A balancing act

Tackling many dimensions

Two sets of tools:
• For exploration
• For extraction

Automated exploration

Enforce a lifecycle; in particular: clear experiment end dates

Why lifecycle policy?
• Unifies methodology across teams
• Prevents tech debt buildup
• Minimizes bad impact on product

Ease of rapid iteration; Safe and scientifically valid iteration

Fast, but not too fast
• Novelty effect vs. bump from the most engaged users
• Understand if waiting helps

Ensure mutual exclusion; Across platforms, features and infra

Why mutual exclusion?
• Fewer experiment conflicts
• Lower metrics variance
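One common way to get mutual exclusion is to hash users into a fixed number of segments and give each experiment its own disjoint set of segments. The sketch below illustrates that idea only; `NUM_SEGMENTS`, `segment_of`, and the `Universe` class are hypothetical names, and the slides do not describe how Facebook's infrastructure implements it.

```python
import hashlib

NUM_SEGMENTS = 1000  # hypothetical granularity of the shared user space


def segment_of(user_id: int, universe: str = "default") -> int:
    """Hash a user into one of NUM_SEGMENTS stable segments."""
    digest = hashlib.sha1(f"{universe}.{user_id}".encode()).hexdigest()
    return int(digest, 16) % NUM_SEGMENTS


class Universe:
    """Hands out disjoint segments so no user is in two experiments."""

    def __init__(self, name: str = "default"):
        self.name = name
        self.owner = {}  # segment index -> experiment name

    def allocate(self, experiment: str, n_segments: int) -> None:
        free = [s for s in range(NUM_SEGMENTS) if s not in self.owner]
        if len(free) < n_segments:
            raise ValueError("not enough free segments")
        for s in free[:n_segments]:
            self.owner[s] = experiment

    def experiment_for(self, user_id: int):
        """Return the single experiment this user belongs to, or None."""
        return self.owner.get(segment_of(user_id, self.name))


if __name__ == "__main__":
    u = Universe()
    u.allocate("feed_ranking", 300)
    u.allocate("button_color", 300)
    print(u.experiment_for(42))
```

Each segment has at most one owner, so a user can never be exposed to two experiments from the same universe, which removes conflicts between them by construction.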

Exposure log everything
• Measure effects on the exposed only
• Condition analyses on the time since last exposure
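The two points above can be sketched as a join between an exposure log and a metric event log. This is an illustrative sketch under assumed record shapes: the `(user_id, arm, timestamp)` and `(user_id, value, timestamp)` tuples, the `metrics_by_arm` name, and the `max_age` parameter are all hypothetical.

```python
import bisect
from collections import defaultdict


def metrics_by_arm(exposures, events, max_age=None):
    """Group metric values by arm, counting exposed users only.

    exposures: iterable of (user_id, arm, timestamp)
    events:    iterable of (user_id, value, timestamp)
    max_age:   if set, keep only events within this many time units
               of the user's most recent prior exposure.
    """
    times = defaultdict(list)  # user_id -> sorted exposure timestamps
    arm_of = {}                # user_id -> arm (assumed stable per user)
    for user_id, arm, ts in sorted(exposures, key=lambda e: e[2]):
        times[user_id].append(ts)
        arm_of[user_id] = arm

    result = defaultdict(list)
    for user_id, value, ts in events:
        if user_id not in times:
            continue  # never exposed: excluded from the analysis
        i = bisect.bisect_right(times[user_id], ts)
        if i == 0:
            continue  # event happened before the first exposure
        age = ts - times[user_id][i - 1]
        if max_age is None or age <= max_age:
            result[arm_of[user_id]].append(value)
    return dict(result)


if __name__ == "__main__":
    exposures = [(1, "test", 100), (2, "control", 100), (1, "test", 200)]
    events = [(1, 5.0, 150), (1, 7.0, 500), (2, 4.0, 110), (3, 9.0, 120)]
    print(metrics_by_arm(exposures, events, max_age=60))
```

Dropping unexposed users keeps their unaffected behavior from diluting the measured effect, and `max_age` is one simple way to condition on the time since the last exposure.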

The culture

Experimentation gives focus; But watch out for tunnel vision!

The culture

Cultivate sound practices; Safe and low-impact experimentation

The culture

Educate on data interpretation; Uniform decision-making across teams

Understanding uncertainty

“Robust misinterpretation of confidence intervals” Rink Hoekstra et al. Psychonomic Bulletin & Review

• Only 3% of scientists got all 6 answers right...

• How do we educate the users of the tools?

The three stages of experimentation infrastructure

Stage 1: Artisanal

Photo credit: Abhisek Sarda

Stage 2: Power tools

Stage 3: Industrialized

Photo credit: Steve Jurvetson

Conclusions

Empower, but don’t overwhelm

Conclusions

Filter and automate, but maintain broad focus

Conclusions

Clean data and powerful tools are great, but building the right experimentation culture is equally important
