1
The Quest for the Optimal Experiment, RecSys 10-06-14
TRANSCRIPT
2
‘Science & Algorithms’ at Netflix
[Figure labels: Causation, Correlation]
Experimentation: Science, methodology, and statistical analysis of experiments
Algorithm R&D: Mathematical algorithms that get embedded into automated processes, such as our recommendation system
Predictive models: Standalone mathematical models to support decision making (e.g. title demand prediction)
4
Netflix Experimentation: Common
“Product” is a set of controlled, randomized experiments, many running at once
Experiment in all areas
Plenty of rigor and attention around statistics, metrics, analysis
5
Netflix Experimentation: Distinctive
Core to culture (not just process)
Curated approach: decisions not automated; scrutiny of each test (and by many people)
Paying customers who are always logged in
Monthly subscription: tests last several months; sampling (test allocation) of new members can take weeks or even months
Many devices
Streaming Hours is our main engagement metric
[Figure: Cancel Rate (y-axis, 0% to 20%) vs. Customers’ Stream Hours in the past 28 days (x-axis, 0 to 50)]
8
Streaming measurement: Streaming score
Probability of retaining at each future billing cycle, based on streaming S hours at N days of tenure
[Figure: Retention (y-axis) vs. Total hours consumed during N days of membership (x-axis)]
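As a rough illustration of the streaming score idea, the sketch below estimates retention probability empirically by bucketing members on total hours streamed during their first N days. The record layout, the bucket width, the function name, and the toy data are all assumptions made for illustration; the slides do not describe how the score is actually modeled.

```python
from collections import defaultdict

def retention_by_hours(members, bucket_width=5):
    """Estimate P(retained | hours bucket) from (hours, retained) pairs.

    members: iterable of (total_hours_in_first_n_days, retained_bool).
    Returns {bucket_lower_bound: retention_rate}, ordered by bucket.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [retained, total]
    for hours, retained in members:
        bucket = int(hours // bucket_width) * bucket_width
        counts[bucket][0] += 1 if retained else 0
        counts[bucket][1] += 1
    return {b: r / n for b, (r, n) in sorted(counts.items())}

# Hypothetical toy data: (hours in first N days, retained at next billing cycle)
toy = [(1, False), (3, True), (4, False), (7, True),
       (8, True), (9, False), (12, True), (14, True)]
scores = retention_by_hours(toy)
for bucket, rate in scores.items():
    print(f"{bucket}-{bucket + 5} hours: retention {rate:.2f}")
```

A real score would likely smooth across buckets and condition on tenure N; simple bucketing is used here only to make the mapping from hours to retention concrete.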
Much experimentation on the recommender system
Row selection
Video ranking
Video-video similarity
User-user similarity
Search recommendations
Popularity vs personalization
Diversity
Novelty/Freshness
Evidence
14
Who should Netflix sample?
Geography: Global; US; International; Region-specific
Tenure: 1 month (free trial); 2-6 months; 7+ months
Classes of experience with Netflix: Signups who are not rejoining members; Rejoining members; Existing members (any tenure); Existing members who are beyond their free trial; Newly activating a device
15
Two considerations
1. For whom/what do you want to optimize?
2. Who will experience the winning test experience that gets launched?
19
Current favored samples in algorithm testing
Global signups who are not rejoining within a year
Secondarily:
US existing members who are beyond their free trial
International (non-US) existing members who are beyond their free trial
20
Addressing Sampling Bias
Stratified sampling on attributes that are: correlated with core metric; independent of the test treatment
Regression tests for any systematic randomization process
Bias monitoring for each test’s sample
Large sample sizes
Re-testing
Good judgment to recognize that the “story” makes sense
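The first and third items above (stratified sampling and per-test bias monitoring) can be sketched as follows. This is a minimal illustration, not the allocation system described in the talk: the member records, the choice of `region` as the stratification attribute, and both function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(members, stratum_of, cells=("control", "treatment"), seed=0):
    """Randomly assign members to test cells, balancing within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for m in members:
        by_stratum[stratum_of(m)].append(m)
    assignment = {}
    for group in by_stratum.values():
        rng.shuffle(group)
        # Deal shuffled members round-robin so cell sizes within a
        # stratum differ by at most one.
        for i, m in enumerate(group):
            assignment[m["id"]] = cells[i % len(cells)]
    return assignment

def stratum_shares(members, assignment, stratum_of):
    """Bias monitoring: share of each stratum within each cell."""
    counts = defaultdict(lambda: defaultdict(int))
    for m in members:
        counts[assignment[m["id"]]][stratum_of(m)] += 1
    return {cell: {s: n / sum(strata.values()) for s, n in strata.items()}
            for cell, strata in counts.items()}

# Hypothetical members: 40 in the US, 20 International.
members = [{"id": i, "region": "US" if i % 3 else "International"}
           for i in range(60)]
region = lambda m: m["region"]
assignment = stratified_assign(members, region)
shares = stratum_shares(members, assignment, region)
print(shares)
```

In a real test the stratification attribute would be chosen because it correlates with the core metric and cannot be affected by the treatment. The same share comparison can be run on attributes that were not stratified, to flag a sample that drifted by chance.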
21
In the words of Nate Silver
On predicting the 2008 recession in a world of noisy data and dependent variables:
Not only was Hatzius’s forecast correct, but it was also right for the right reasons, explaining the causes of the collapse and anticipating the effects. Hatzius refers to this chain of cause and effect as a “story”…
In contrast, if you just look at the economy as a series of variables and equations without any underlying structure, you are almost certain to mistake noise for a signal…
The Signal and the Noise: Why so Many Predictions Fail – but Some Don’t by Nate Silver
23
Short-term metrics we consider
Daily cancel requests
Daily streaming hours
Daily visits
Session length
Failed sessions (no play)
“Take rates” (CTR where the click is to play): page-level, row-level, title-level
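To make the take-rate definition concrete, here is a small sketch that computes plays divided by impressions at the page, row, or title level. The impression-log layout, the field names, and the function name are assumptions for illustration only.

```python
from collections import defaultdict

def take_rates(impressions, level):
    """Plays divided by impressions, grouped by 'page', 'row', or 'title'."""
    shown = defaultdict(int)
    played = defaultdict(int)
    for imp in impressions:
        key = imp[level]
        shown[key] += 1
        played[key] += 1 if imp["played"] else 0
    return {k: played[k] / shown[k] for k in shown}

# Hypothetical impression log: one record per title shown to a member.
log = [
    {"page": "home", "row": "Top Picks", "title": "A", "played": True},
    {"page": "home", "row": "Top Picks", "title": "B", "played": False},
    {"page": "home", "row": "Popular", "title": "A", "played": False},
    {"page": "home", "row": "Popular", "title": "C", "played": True},
]
page_rates = take_rates(log, "page")
row_rates = take_rates(log, "row")
title_rates = take_rates(log, "title")
print(page_rates, row_rates, title_rates)
```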
24
Statistically significant differences in churn rarely stabilize until after Day 45
[Figure x-axis label: Test Duration]
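One way to see why an early read on churn can mislead is to recompute a significance test on cumulative cancels at several points in the test. The sketch below uses a standard pooled two-proportion z-test; the cell sizes and cancel counts are invented for illustration and are not data from the talk.

```python
import math

def two_proportion_z(cancels_a, n_a, cancels_b, n_b):
    """Pooled z statistic for the difference in cancel rates between two cells."""
    p_a, p_b = cancels_a / n_a, cancels_b / n_b
    pooled = (cancels_a + cancels_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se if se > 0 else 0.0

def significance_by_day(cum_a, cum_b, n_a, n_b, z_crit=1.96):
    """For each checkpoint, is the cumulative churn difference significant?"""
    return [abs(two_proportion_z(a, n_a, b, n_b)) > z_crit
            for a, b in zip(cum_a, cum_b)]

# Hypothetical cumulative cancels at days 15, 30, 45, 60; 10,000 members per cell.
days = [15, 30, 45, 60]
control = [100, 420, 800, 1000]
treatment = [98, 360, 790, 900]
flags = significance_by_day(control, treatment, 10_000, 10_000)
for day, flag in zip(days, flags):
    print(f"day {day}: significant={flag}")
```

With these made-up counts the result flips between significant and not significant across checkpoints, which is the kind of instability the slide warns about. Repeatedly testing the same experiment also inflates the false-positive rate, so a read taken at a pre-planned duration is safer than acting on the first significant day.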
26
How well do your short-term metrics correlate with your OEC (overall evaluation criterion), and how much improvement do you see in that correlation if you increase the time interval?
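The question above can be explored by correlating, across past tests, the lift measured in a short-term metric over different windows with the lift eventually observed in the OEC. The sketch below is one possible way to do that under stated assumptions: the per-test lifts are invented, the window lengths (7 and 28 days) are arbitrary, and Pearson correlation is only one reasonable choice of association measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_by_window(short_term_by_window, oec):
    """Correlate each window's short-term lifts with the final OEC lifts."""
    return {w: pearson(vals, oec) for w, vals in short_term_by_window.items()}

# Hypothetical lifts from five past tests (treatment minus control).
oec_lift = [0.5, -0.2, 0.1, 0.8, -0.4]
short_term_lift = {
    7: [0.1, 0.3, -0.2, 0.2, 0.1],    # metric measured over first 7 days
    28: [0.4, -0.1, 0.0, 0.7, -0.3],  # same metric over first 28 days
}
corr = correlation_by_window(short_term_lift, oec_lift)
for window, r in corr.items():
    print(f"{window}-day window: r = {r:.2f}")
```

In this toy data the 28-day window tracks the OEC far more closely than the 7-day window. With real tests, the point at which the correlation stops improving suggests how long a short-term metric needs to run before it is a useful proxy.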
30
Key Takeaways
Exercise rigor in selecting the population to sample; representative of:
The population you want to optimize for
The population that will receive the experience if launched
Remain open-minded about changing the target population as business shifts occur
Address bias, ongoing
Know and apply the time duration necessary for your OEC to stabilize
Additional short-term metrics need to have sufficient duration to correlate well with your OEC