TRANSCRIPT
Experimentation Panel 3-20-13
(A Few) Key Lessons Learned Building LinkedIn's Online Experimentation Platform
Experimentation at LinkedIn
• Essential part of the release process
• 1000s of concurrent experiments
• Complex range of target populations based on content, behavior and social graph data
• Cater to a wide demographic
• Large set of KPIs
The next frontier
• KPIs – beyond CTR
• Multiple objective optimization
• KPI reconciliation
• User visit imbalance
• Virality-preserving A/B testing
• Context-dependent novelty effect
• Explicit feedback vs. implicit feedback
Picking the right KPI can be tricky
• Example: engagement measured by # comments on posts on a blog website
• KPI1 = average # comments per user – B wins by 30%
• KPI2 = ratio of active (at least one posting) to inactive users – A wins by 30%
• How is this possible? B concentrates more comments among fewer active users, so comments per user rise even as the active base shrinks (toy numbers below)
Do you want a smaller, highly engaged community, or a larger, less engaged community?
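A toy computation (hypothetical numbers, not from the talk) showing how both claims can hold at once:

# 100 members per arm; A spreads engagement thinly, B concentrates it.
users = 100
a_active, a_comments_each = 50, 2   # arm A: 50 active members, 2 comments each
b_active, b_comments_each = 43, 3   # arm B: 43 active members, 3 comments each

kpi1_a = a_active * a_comments_each / users   # 1.00 comments per member
kpi1_b = b_active * b_comments_each / users   # 1.29 -> B wins ~29%
kpi2_a = a_active / (users - a_active)        # 1.00 active:inactive ratio
kpi2_b = b_active / (users - b_active)        # 0.75 -> A wins ~33%
print(f"KPI1: A={kpi1_a:.2f}, B={kpi1_b:.2f}")
print(f"KPI2: A={kpi2_a:.2f}, B={kpi2_b:.2f}")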
Winback campaign
• Definition
– Returning to the web site at least once?
– Returning to the web site with a certain level of engagement, possibly comparable, more, or a bit less than before the account went dormant?
• Example: reminder email at 30 days after registration (see the figure and sketch below)
[Figure: Loyalty distribution – occurrence vs. days since last visit (0–339) for members who registered 335 days ago; a spike marks those who came back once at the 30-day reminder and then went dormant]
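A minimal sketch (invented visit logs; the 80% engagement threshold is an assumption) contrasting the two winback definitions above:

# Hypothetical logs: member -> list of (day since registration, actions that day).
visits = {
    "u1": [(1, 5), (2, 4), (31, 1)],           # returned once, barely engaged
    "u2": [(1, 6), (3, 7), (32, 5), (40, 6)],  # returned at prior engagement level
    "u3": [(1, 3)],                            # never returned
}
REMINDER_DAY = 30

def won_back_v1(log):
    # Definition 1: returned at least once after the reminder.
    return any(day > REMINDER_DAY for day, _ in log)

def won_back_v2(log, ratio=0.8):
    # Definition 2: post-reminder engagement comparable (here: >= 80%)
    # to engagement before the account went dormant.
    before = [a for d, a in log if d <= REMINDER_DAY]
    after = [a for d, a in log if d > REMINDER_DAY]
    if not after:
        return False
    return sum(after) / len(after) >= ratio * (sum(before) / len(before))

for member, log in visits.items():
    print(member, won_back_v1(log), won_back_v2(log))

Member u1 counts as won back under definition 1 but not under definition 2, so the campaign's measured success depends on which definition you pick.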
Multiple competing objectives
• Suggest relevant groups … that one is more likely to participate in
• Suggest skilled candidates … who will likely respond to hiring managers' inquiries
• TalentMatch (top 24 matches for a posted job, sold as a product)
• Semantic + engagement objectives
TalentMatch use case
• KPI: repeat TalentMatch buyers in a 6-month to 1-year window
• Short-term proxy with predictive power:
– Optimize for InMail response rate while controlling for booking rate and InMail sent rate (one way to operationalize this is sketched below)
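A hedged reading of "optimize X while controlling for Y and Z" as a guardrail rule (not necessarily LinkedIn's actual procedure; all metric values and the tolerance are invented):

# Hypothetical per-arm metrics from a TalentMatch ranking test.
control = {"inmail_response": 0.20, "booking": 0.050, "inmail_sent": 1.00}
variants = {
    "v1": {"inmail_response": 0.23, "booking": 0.049, "inmail_sent": 0.99},
    "v2": {"inmail_response": 0.26, "booking": 0.041, "inmail_sent": 0.97},
}
GUARDRAILS = ("booking", "inmail_sent")
TOLERANCE = 0.05  # allow at most a 5% relative drop on any guardrail metric

def acceptable(metrics):
    # A variant passes only if every guardrail stays within tolerance of control.
    return all(metrics[g] >= (1 - TOLERANCE) * control[g] for g in GUARDRAILS)

passing = {name: m for name, m in variants.items() if acceptable(m)}
winner = max(passing, key=lambda n: passing[n]["inmail_response"])
print(winner)  # v1 -- v2 lifts response rate more but breaches the booking guardrail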
KPI reconciliation
• How do you compare apples and oranges?
– E.g., a people vs. job recommendations swap
– X% lift in job applications vs. Y% drop in invitations
– Value of an invitation vs. value of a job application?
• Long-term cascading effect on a set of site-wide KPIs
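If the two KPIs can be priced, reconciliation reduces to a weighted sum; a minimal sketch (the per-unit values and deltas below are invented, and choosing the values is precisely the hard part):

# Assumed relative values: what is one invitation worth vs. one job application?
value_per_unit = {"job_application": 1.0, "invitation": 0.6}
# Hypothetical test outcome: change in units per 1,000 members.
deltas = {"job_application": +0.8, "invitation": -1.1}

net = sum(value_per_unit[k] * deltas[k] for k in deltas)
print(f"net value change per 1,000 members: {net:+.2f}")  # +0.14 -> the swap wins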
User visit imbalance
• Observed sample ≠ intended random sample
• Consider an A/B test on the homepage lasting L days. Your likely observed sample will have:
– Repeated (>> L) observations for super power users
– ≈ L observations for daily users
– ≈ L/7 observations for weekly users
– NO observations for users visiting less often than every L days
• Tools for coping with the imbalance: κ statistics, random effects models
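A minimal simulation (segment sizes, visit rates, and per-visit metrics all invented) of how event-level averaging is dominated by the heaviest visitors:

L = 14  # test length in days
# Hypothetical segments: (name, members, expected visits in L days, metric/visit).
segments = [("power",   200, 3 * L,  0.9),
            ("daily",  2000, L,      0.5),
            ("weekly", 8000, L // 7, 0.3),
            ("rare",  20000, 0,      None)]  # visit less often than every L days

event_sum = event_n = user_sum = user_n = 0
for _, members, n_visits, metric in segments:
    if n_visits == 0:
        continue  # rare visitors are never observed at all
    event_sum += members * n_visits * metric
    event_n += members * n_visits
    user_sum += members * metric  # collapse each member to one observation
    user_n += members

print("per-event mean:", round(event_sum / event_n, 3))  # ~0.50, skewed to power users
print("per-user mean: ", round(user_sum / user_n, 3))    # ~0.35, each member counted once
# Neither estimate sees the 20,000 rare members; random effects models are
# one principled way to model the per-member visit structure.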
Virality-preserving A/B testing
• Random sampling destroys the social graph
• Critical for social referrals
– ‘Warm’ recommendations
– ‘Wisdom of your friends’ social proof
• Core + fringe sampling to minimize graph disruption (a sketch follows)
– WWW ’11 FB, ’12 Yahoo Group recommendations
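A minimal sketch of one way to build a core + fringe sample, assuming a plain adjacency-list graph (the construction details are an illustration, not the method from the cited work):

from collections import deque

# Hypothetical undirected social graph as an adjacency list.
graph = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
    "d": ["b", "e"], "e": ["d", "f"], "f": ["e"],
}

def core_plus_fringe(graph, seed, core_size):
    # Snowball-sample a connected core by BFS from a seed member.
    core, queue = set(), deque([seed])
    while queue and len(core) < core_size:
        u = queue.popleft()
        if u in core:
            continue
        core.add(u)
        queue.extend(graph[u])
    # Fringe: the core's direct neighbors, bucketed with the core so that
    # treated members still see treated connections (referrals stay 'warm').
    fringe = {v for u in core for v in graph[u]} - core
    return core, fringe

core, fringe = core_plus_fringe(graph, "a", 3)
print(sorted(core), sorted(fringe))  # analyze metrics on the core only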
Context-dependent novelty effect
• Job recommendation algorithm A/B test
– First 2 weeks: 2× the long-term stationary lift
• TalentMatch – no short-term novelty effect
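A tiny worked illustration (invented daily lifts) of why reading only the early window overstates the effect:

# Hypothetical daily lift series for a 6-week job recommendation test.
daily_lift = [0.10] * 14 + [0.05] * 28  # novelty doubles the lift early on

early = sum(daily_lift[:14]) / 14
steady = sum(daily_lift[14:]) / len(daily_lift[14:])
print(f"first-2-weeks lift: {early:.0%}, stationary lift: {steady:.0%}")
# Stopping the test after two weeks would report 10% where the durable effect
# is 5%; for TalentMatch the two windows would look the same.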
Explicit feedback A/B testing
• Enables you to understand the usefulness of a product/feature/algorithm with unequal depth
• Text-based A/B test → sentiment analysis (a toy sketch follows)
• Reveals unexpected complexities
– E.g., ‘Local’ means different things to different members
• Prevents misinterpretation of implicit user feedback
• Helps prioritize future improvements
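A toy keyword-based sketch of scoring free-text feedback per arm (a real pipeline would use a trained sentiment model; the feedback strings and word lists are invented):

# Hypothetical free-text feedback collected from each test arm.
feedback = {
    "A": ["love the new local news", "local feels off for my city", "useful"],
    "B": ["not relevant to me", "irrelevant results", "love it"],
}
POSITIVE = {"love", "useful", "great"}
NEGATIVE = {"off", "irrelevant", "not"}

def positive_rate(texts):
    # Fraction of feedback items with more positive than negative cues.
    def score(text):
        words = set(text.lower().split())
        return len(words & POSITIVE) - len(words & NEGATIVE)
    return sum(score(t) > 0 for t in texts) / len(texts)

for arm, texts in feedback.items():
    print(arm, positive_rate(texts))  # A: 0.67, B: 0.33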
References
• C. Posse, 2012. A (Few) Key Lessons Learned Building Recommender Systems for Large-Scale Social Networks. Invited Talk, Industry Practice Expo, 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Beijing, China.
• M. Rodriguez, C. Posse and E. Zhang, 2012. Multiple Objective Optimization in Recommendation Systems. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 11-18.
• M. Amin, B. Yan, S. Sriram, A. Bhasin and C. Posse, 2012. Social Referral: Using Network Connections to Deliver Recommendations. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 273-276.
• X. Amatriain, P. Castells, A. de Vries and C. Posse, 2012. Workshop on Recommendation Utility Evaluation: Beyond RMSE. Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 351-352.