TRANSCRIPT
Experimentation Panel 3-20-13
(A Few) Key Lessons Learned Building LinkedIn's Online Experimentation Platform
Experimentation at LinkedIn
• Essential part of the release process
• 1000s of concurrent experiments
• Complex range of target populations based on content, behavior and social graph data
• Cater to a wide demographic
• Large set of KPIs
The next frontier
• KPIs – beyond CTR
• Multiple objective optimization
• KPI reconciliation
• User visit imbalance
• Virality-preserving A/B testing
• Context-dependent novelty effect
• Explicit feedback vs. implicit feedback
Picking the right KPI can be tricky
• Example: engagement measured by # comments on posts on a blog website
• KPI1 = average # comments per user – B wins by 30%
• KPI2 = ratio of active (at least one posting) to inactive users – A wins by 30%
• How is this possible? B concentrates more comments among fewer active users, so comments per user rise even as the active base shrinks (toy numbers below)
Do you want a smaller, highly engaged community, or a larger, less engaged community?
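A toy computation (hypothetical numbers, not from the talk) showing how both claims can hold at once:

# 100 members per arm; A spreads engagement thinly, B concentrates it.
users = 100
a_active, a_comments_each = 50, 2   # arm A: 50 active members, 2 comments each
b_active, b_comments_each = 43, 3   # arm B: 43 active members, 3 comments each

kpi1_a = a_active * a_comments_each / users   # 1.00 comments per member
kpi1_b = b_active * b_comments_each / users   # 1.29 -> B wins ~29%
kpi2_a = a_active / (users - a_active)        # 1.00 active:inactive ratio
kpi2_b = b_active / (users - b_active)        # 0.75 -> A wins ~33%
print(f"KPI1: A={kpi1_a:.2f}, B={kpi1_b:.2f}")
print(f"KPI2: A={kpi2_a:.2f}, B={kpi2_b:.2f}")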
Winback campaign
• Definition
– Returning to the web site at least once?
– Returning to the web site with a certain level of engagement, possibly comparable, more, or a bit less than before the account went dormant?
• Example: reminder email at 30 days after registration (see the figure and sketch below)
[Figure: Loyalty distribution – occurrence vs. days since last visit (0–339) for members who registered 335 days ago; a spike marks those who came back once at the 30-day reminder and then went dormant]
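A minimal sketch (invented visit logs; the 80% engagement threshold is an assumption) contrasting the two winback definitions above:

# Hypothetical logs: member -> list of (day since registration, actions that day).
visits = {
    "u1": [(1, 5), (2, 4), (31, 1)],           # returned once, barely engaged
    "u2": [(1, 6), (3, 7), (32, 5), (40, 6)],  # returned at prior engagement level
    "u3": [(1, 3)],                            # never returned
}
REMINDER_DAY = 30

def won_back_v1(log):
    # Definition 1: returned at least once after the reminder.
    return any(day > REMINDER_DAY for day, _ in log)

def won_back_v2(log, ratio=0.8):
    # Definition 2: post-reminder engagement comparable (here: >= 80%)
    # to engagement before the account went dormant.
    before = [a for d, a in log if d <= REMINDER_DAY]
    after = [a for d, a in log if d > REMINDER_DAY]
    if not after:
        return False
    return sum(after) / len(after) >= ratio * (sum(before) / len(before))

for member, log in visits.items():
    print(member, won_back_v1(log), won_back_v2(log))

Member u1 counts as won back under definition 1 but not under definition 2, so the campaign's measured success depends on which definition you pick.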
Multiple competing objectives
• Suggest relevant groups … that one is more likely to participate in
• Suggest skilled candidates … who will likely respond to hiring managers' inquiries
• TalentMatch (top 24 matches for a posted job, sold as a product)
• Semantic + engagement objectives
TalentMatch use case
• KPI: repeat TalentMatch buyers in a 6-month to 1-year window
• Short-term proxy with predictive power:
– Optimize for InMail response rate while controlling for booking rate and InMail sent rate (one way to operationalize this is sketched below)
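A hedged reading of "optimize X while controlling for Y and Z" as a guardrail rule (not necessarily LinkedIn's actual procedure; all metric values and the tolerance are invented):

# Hypothetical per-arm metrics from a TalentMatch ranking test.
control = {"inmail_response": 0.20, "booking": 0.050, "inmail_sent": 1.00}
variants = {
    "v1": {"inmail_response": 0.23, "booking": 0.049, "inmail_sent": 0.99},
    "v2": {"inmail_response": 0.26, "booking": 0.041, "inmail_sent": 0.97},
}
GUARDRAILS = ("booking", "inmail_sent")
TOLERANCE = 0.05  # allow at most a 5% relative drop on any guardrail metric

def acceptable(metrics):
    # A variant passes only if every guardrail stays within tolerance of control.
    return all(metrics[g] >= (1 - TOLERANCE) * control[g] for g in GUARDRAILS)

passing = {name: m for name, m in variants.items() if acceptable(m)}
winner = max(passing, key=lambda n: passing[n]["inmail_response"])
print(winner)  # v1 -- v2 lifts response rate more but breaches the booking guardrail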
KPI reconciliation
• How do you compare apples and oranges?
– E.g., a people vs. job recommendations swap
– X% lift in job applications vs. Y% drop in invitations
– Value of an invitation vs. value of a job application?
• Long-term cascading effect on a set of site-wide KPIs
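If the two KPIs can be priced, reconciliation reduces to a weighted sum; a minimal sketch (the per-unit values and deltas below are invented, and choosing the values is precisely the hard part):

# Assumed relative values: what is one invitation worth vs. one job application?
value_per_unit = {"job_application": 1.0, "invitation": 0.6}
# Hypothetical test outcome: change in units per 1,000 members.
deltas = {"job_application": +0.8, "invitation": -1.1}

net = sum(value_per_unit[k] * deltas[k] for k in deltas)
print(f"net value change per 1,000 members: {net:+.2f}")  # +0.14 -> the swap wins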
User visit imbalance
• Observed sample ≠ intended random sample
• Consider an A/B test on the homepage lasting L days. Your likely observed sample will have:
– Repeated (>> L) observations for super power users
– ≈ L observations for daily users
– ≈ L/7 observations for weekly users
– NO observations for users visiting less often than every L days
• Tools for coping with the imbalance: κ statistics, random effects models
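A minimal simulation (segment sizes, visit rates, and per-visit metrics all invented) of how event-level averaging is dominated by the heaviest visitors:

L = 14  # test length in days
# Hypothetical segments: (name, members, expected visits in L days, metric/visit).
segments = [("power",   200, 3 * L,  0.9),
            ("daily",  2000, L,      0.5),
            ("weekly", 8000, L // 7, 0.3),
            ("rare",  20000, 0,      None)]  # visit less often than every L days

event_sum = event_n = user_sum = user_n = 0
for _, members, n_visits, metric in segments:
    if n_visits == 0:
        continue  # rare visitors are never observed at all
    event_sum += members * n_visits * metric
    event_n += members * n_visits
    user_sum += members * metric  # collapse each member to one observation
    user_n += members

print("per-event mean:", round(event_sum / event_n, 3))  # ~0.50, skewed to power users
print("per-user mean: ", round(user_sum / user_n, 3))    # ~0.35, each member counted once
# Neither estimate sees the 20,000 rare members; random effects models are
# one principled way to model the per-member visit structure.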
Virality-preserving A/B testing
• Random sampling destroys the social graph
• Critical for social referrals
– ‘Warm’ recommendations
– ‘Wisdom of your friends’ social proof
• Core + fringe sampling to minimize graph disruption (a sketch follows)
– WWW ’11 FB, ’12 Yahoo Group recommendations
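A minimal sketch of one way to build a core + fringe sample, assuming a plain adjacency-list graph (the construction details are an illustration, not the method from the cited work):

from collections import deque

# Hypothetical undirected social graph as an adjacency list.
graph = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
    "d": ["b", "e"], "e": ["d", "f"], "f": ["e"],
}

def core_plus_fringe(graph, seed, core_size):
    # Snowball-sample a connected core by BFS from a seed member.
    core, queue = set(), deque([seed])
    while queue and len(core) < core_size:
        u = queue.popleft()
        if u in core:
            continue
        core.add(u)
        queue.extend(graph[u])
    # Fringe: the core's direct neighbors, bucketed with the core so that
    # treated members still see treated connections (referrals stay 'warm').
    fringe = {v for u in core for v in graph[u]} - core
    return core, fringe

core, fringe = core_plus_fringe(graph, "a", 3)
print(sorted(core), sorted(fringe))  # analyze metrics on the core only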
Context-dependent novelty effect
• Job recommendation algorithm A/B test
– First 2 weeks: 2× the long-term stationary lift
• TalentMatch – no short-term novelty effect
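A tiny worked illustration (invented daily lifts) of why reading only the early window overstates the effect:

# Hypothetical daily lift series for a 6-week job recommendation test.
daily_lift = [0.10] * 14 + [0.05] * 28  # novelty doubles the lift early on

early = sum(daily_lift[:14]) / 14
steady = sum(daily_lift[14:]) / len(daily_lift[14:])
print(f"first-2-weeks lift: {early:.0%}, stationary lift: {steady:.0%}")
# Stopping the test after two weeks would report 10% where the durable effect
# is 5%; for TalentMatch the two windows would look the same.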
Explicit feedback A/B testing
• Enables you to understand the usefulness of a product/feature/algorithm with unequal depth
• Text-based A/B test → sentiment analysis (a toy sketch follows)
• Reveals unexpected complexities
– E.g., ‘Local’ means different things to different members
• Prevents misinterpretation of implicit user feedback
• Helps prioritize future improvements
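A toy keyword-based sketch of scoring free-text feedback per arm (a real pipeline would use a trained sentiment model; the feedback strings and word lists are invented):

# Hypothetical free-text feedback collected from each test arm.
feedback = {
    "A": ["love the new local news", "local feels off for my city", "useful"],
    "B": ["not relevant to me", "irrelevant results", "love it"],
}
POSITIVE = {"love", "useful", "great"}
NEGATIVE = {"off", "irrelevant", "not"}

def positive_rate(texts):
    # Fraction of feedback items with more positive than negative cues.
    def score(text):
        words = set(text.lower().split())
        return len(words & POSITIVE) - len(words & NEGATIVE)
    return sum(score(t) > 0 for t in texts) / len(texts)

for arm, texts in feedback.items():
    print(arm, positive_rate(texts))  # A: 0.67, B: 0.33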
References
• C. Posse, 2012. A (Few) Key Lessons Learned Building Recommender Systems for Large-Scale Social Networks. Invited Talk, Industry Practice Expo, 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Beijing, China.
• M. Rodriguez, C. Posse and E. Zhang, 2012. Multiple Objective Optimization in Recommendation Systems. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 11-18.
• M. Amin, B. Yan, S. Sriram, A. Bhasin and C. Posse, 2012. Social Referral: Using Network Connections to Deliver Recommendations. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 273-276.
• X. Amatriain, P. Castells, A. de Vries and C. Posse, 2012. Workshop on Recommendation Utility Evaluation: Beyond RMSE. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 351-352.