Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Upload: bigmine

Post on 15-Jan-2015


DESCRIPTION

Since the Netflix $1 million Prize, announced in 2006, our company has been known for having personalization at the core of our product. Even at that point in time, the dataset that we released was considered “large”, and we stirred innovation in the (Big) Data Mining research field. Our current product offering is now focused around instant video streaming, and our data is now many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides the ratings, we now also use information such as what our members play, browse, or search. In this talk I will discuss the different approaches we follow to deal with these large streams of data in order to extract information for personalizing our service. I will describe some of the machine learning models used, as well as the architectures that allow us to combine complex offline batch processes with real-time data streams.

TRANSCRIPT

Page 1: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Big & Personal: the data and the models behind Netflix recommendations

Page 2: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Outline

1. The Netflix Prize & the Recommendation Problem

2. Anatomy of Netflix Personalization

3. Data & Models

4. More data or better models?

Page 3: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain
Page 4: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

What we were interested in:
■ High quality recommendations

Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!

● Top 2 algorithms still in production

Results

SVD

RBM

Page 5: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

What about the final prize ensembles?

■ Our offline studies showed they were too computationally intensive to scale
■ Expected improvement not worth the engineering effort
■ Plus… focus had already shifted to other issues that had more impact than rating prediction.

Page 6: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Change of focus

2006 2013

Page 7: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Anatomy of Netflix Personalization

Everything is a Recommendation

Page 8: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Everything is personalized

Note: Recommendations are per household, not individual user

Ranking

Page 9: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Top 10

Personalization awareness

Diversity

[Figure labels: Dad, All, Son, Daughter, Dad & Mom, Mom, All, Daughter, Mom, All?]

Page 10: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Support for Recommendations

Social Support

Page 11: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Social Recommendations

Page 12: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Genre rows

■ Personalized genre rows focus on user interest
■ Also provide context and “evidence”
■ Important for member satisfaction – moving personalized rows to top on devices increased retention
■ How are they generated?
  ■ Implicit: based on user’s recent plays, ratings, & other interactions
  ■ Explicit taste preferences
  ■ Hybrid: combine the above
■ Also take into account:
  ■ Freshness – has this been shown before?
  ■ Diversity – avoid repeating tags and genres, limit number of TV genres, etc.

Page 13: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Genres - personalization

Page 14: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Similars

■ Displayed in many different contexts
■ In response to user actions/context (search, queue add…)
■ More like… rows

Page 15: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Data & Models

Page 16: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Big Data @Netflix
■ Almost 40M subscribers
■ Ratings: 4M/day
■ Searches: 3M/day
■ Plays: 30M/day
■ 2B hours streamed in Q4 2011
■ 1B hours in June 2012
■ > 4B hours in Q1 2013

Member Behavior

Geo-information

Time

Impressions

Device Info

Metadata

Social

Page 17: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Smart Models
■ Logistic/linear regression
■ Elastic nets
■ SVD and other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains
■ Different clustering approaches
■ LDA
■ Association Rules
■ Gradient Boosted Decision Trees/Random Forests
■ …

Page 18: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

SVD

$X_{m \times n} = U_{m \times r} \, S_{r \times r} \, (V_{n \times r})^{T}$

■ X: m x n matrix (e.g., m users, n videos)
■ U: m x r matrix (m users, r factors)
■ S: r x r diagonal matrix (strength of each ‘factor’) (r: rank of the matrix)
■ V: n x r matrix (n videos, r factors), so that $V^{T}$ is r x n
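The decomposition on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up 4 x 3 ratings matrix with users as rows and videos as columns; the values and variable names are illustrative and not taken from the talk.

```python
import numpy as np

# Toy ratings matrix: m = 4 users (rows), n = 3 videos (columns). Values are made up.
X = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Thin SVD: U is m x k, s holds the k singular values, Vt is k x n, with k = min(m, n).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Multiplying the three factors back together recovers X.
full_reconstruction = U @ np.diag(s) @ Vt

# Keeping only the r strongest factors gives a low-rank approximation of X.
r = 2
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print("singular values:", s)
print("rank-2 approximation error:", np.linalg.norm(X - X_r))
```

Note that a plain SVD like this needs a fully observed matrix, which real ratings data does not provide.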

Page 19: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

SVD for Rating Prediction

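■ User factor vectors $p_u$ and item factor vectors $q_i$
■ Baseline (bias) $b_{ui} = \mu + b_u + b_i$ (user & item deviation from average)
■ Predict rating as $\hat{r}_{ui} = b_{ui} + p_u^{T} q_i$
■ SVD++ (Koren et al.), asymmetric variation with implicit feedback:

$$\hat{r}_{ui} = b_{ui} + q_i^{T} \left( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) \, x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$$

■ Where $q_i$, $x_j$, $y_j$ are three item factor vectors
■ Users are not parametrized, but rather represented by:
  ■ R(u): items rated by user u
  ■ N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)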
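To make the asymmetric variation concrete, here is a minimal sketch of a single prediction, assuming the formulation published by Koren in which the user is represented only through item vectors: residuals on rated items R(u) weight one set of item vectors, and implicit-feedback items N(u) contribute a second set. The function name, argument layout, and numbers are hypothetical.

```python
import numpy as np

def predict_asymmetric(mu, b_u, b_i, q_i, rated, implicit):
    """Predict one rating without a dedicated user factor vector.

    mu, b_u, b_i: global mean, user bias, item bias (baseline = mu + b_u + b_i)
    q_i:          factor vector of the target item
    rated:        list of (r_uj, b_uj, x_j) for items j in R(u)
    implicit:     list of y_j vectors for items j in N(u)
    """
    user_vec = np.zeros_like(q_i)
    if rated:
        # Residuals on rated items weight the x_j vectors, normalized by |R(u)|^(-1/2).
        residuals = sum((r_uj - b_uj) * x_j for r_uj, b_uj, x_j in rated)
        user_vec = user_vec + residuals / np.sqrt(len(rated))
    if implicit:
        # Implicit-feedback items contribute their y_j vectors, normalized by |N(u)|^(-1/2).
        user_vec = user_vec + sum(implicit) / np.sqrt(len(implicit))
    return mu + b_u + b_i + float(q_i @ user_vec)

# Hypothetical numbers: one rated item and one implicit-feedback item.
q_i = np.array([1.0, 0.0])
rated = [(5.0, 4.0, np.array([0.5, 0.5]))]
implicit = [np.array([0.2, 0.0])]
prediction = predict_asymmetric(3.5, 0.2, -0.1, q_i, rated, implicit)
print(prediction)
```

A user with no history falls back to the baseline, which is one practical appeal of not parametrizing users directly.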

Page 20: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Simon Funk’s SVD

■ One of the most interesting findings during the Netflix Prize came out of a blog post

■ Incremental, iterative, and approximate way to compute the SVD using gradient descent
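The gradient-descent idea can be sketched in a few lines. This is an illustrative simplification rather than Funk's original code: it updates all factors jointly on each observed rating (the blog post trained one factor at a time), and the learning rate, regularization, and toy ratings are assumptions.

```python
import math
import random

def funk_sgd(ratings, n_users, n_items, n_factors=2, lr=0.05, reg=0.02,
             epochs=500, seed=0):
    """Fit user factors P and item factors Q on observed (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean as baseline
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def rmse(ratings, predict):
    """Root mean squared error of predict(u, i) over the given ratings."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Made-up sparse ratings: only observed entries are listed, missing ones are skipped.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 2, 1),
           (2, 0, 1), (2, 1, 1), (2, 2, 5), (3, 1, 1), (3, 2, 4)]
mu, P, Q = funk_sgd(ratings, n_users=4, n_items=3)

baseline = rmse(ratings, lambda u, i: mu)
trained = rmse(ratings, lambda u, i: mu + sum(p * q for p, q in zip(P[u], Q[i])))
print("baseline RMSE:", baseline, "trained RMSE:", trained)
```

Because the loop only visits observed ratings, the missing entries never need to be filled in, which is what makes the approach practical on sparse data.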

Page 21: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Restricted Boltzmann Machines

■ Restrict the connectivity in ANN to make learning easier.
■ Only one layer of hidden units.
  ■ Although multiple layers are possible
■ No connections between hidden units.
■ Hidden units are independent given the visible states.
■ RBMs can be stacked to form Deep Belief Networks (DBN) – 4th generation of ANNs

[Figure: RBM diagram with a hidden layer and a visible layer; units labeled i and j]
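The independence property is what makes sampling cheap: all hidden units can be updated in one vectorized step given the visible units, and vice versa. The sketch below assumes binary units and NumPy; it is not the Netflix Prize model (the published RBM for ratings used softmax visible units), and the weight shapes and names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, c):
    # p(h_j = 1 | v) for every hidden unit j. Each entry depends only on v,
    # never on the other hidden units, because there are no hidden-hidden links.
    return sigmoid(c + v @ W)

def visible_probs(h, W, b):
    # p(v_i = 1 | h) for every visible unit i, by the same argument.
    return sigmoid(b + W @ h)

def gibbs_step(v, W, b, c, rng):
    """One step of block Gibbs sampling: sample h given v, then v given h."""
    p_h = hidden_probs(v, W, c)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = visible_probs(h, W, b)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-to-hidden weights
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases
v = np.array([1.0, 0.0, 1.0, 0.0])
h, v_new = gibbs_step(v, W, b, c, rng)
print(h, v_new)
```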

Page 22: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

RBM for the Netflix Prize

Page 23: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Key algorithm, sorts titles in most contexts

Page 24: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking
■ Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
■ Goal: Find the best possible ordering of a set of videos for a user within a specific context in real-time
■ Objective: maximize consumption
■ Aspirations: Played & “enjoyed” titles have best score
■ Akin to CTR forecast for ads/search results
■ Factors
  ■ Accuracy
  ■ Novelty
  ■ Diversity
  ■ Freshness
  ■ Scalability
  ■ …

Page 25: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Example: Two features, linear model

Page 26: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Example: Two features, linear model
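A two-feature linear ranker is simple enough to sketch directly: score each title as a weighted sum of its two features, then sort. The slide text does not name the features, so the ones used here (popularity and predicted rating), the weights, and the titles are all hypothetical.

```python
def score(features, weights, bias=0.0):
    """Linear score: bias + w1 * feature1 + w2 * feature2."""
    return bias + sum(w * x for w, x in zip(weights, features))

def rank(candidates, weights, bias=0.0):
    """Sort (title, features) pairs by descending linear score."""
    return sorted(candidates, key=lambda c: score(c[1], weights, bias), reverse=True)

# Hypothetical features per title: (popularity in [0, 1], predicted rating in [1, 5]).
candidates = [
    ("A", (0.9, 3.0)),
    ("B", (0.2, 4.8)),
    ("C", (0.6, 4.0)),
]

by_rating = [title for title, _ in rank(candidates, weights=(1.0, 1.0))]
by_popularity = [title for title, _ in rank(candidates, weights=(5.0, 0.5))]
print(by_rating, by_popularity)
```

Changing the weights changes the ordering, which is why the weights are worth learning from data rather than setting by hand.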

Page 27: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Page 28: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Page 29: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Novelty

Diversity

Freshness

Accuracy

Scalability

Page 30: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to rank

■ Machine learning problem: goal is to construct ranking model from training data

■ Training data can have partial order or binary judgments (relevant/not relevant).

■ Resulting order of the items typically induced from a numerical score

■ Learning to rank is a key element for personalization
■ You can treat the problem as a standard supervised classification problem

Page 31: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to Rank Approaches

1. Pointwise
■ Ranking function minimizes loss function defined on individual relevance judgment
■ Ranking score based on regression or classification
■ Ordinal regression, Logistic regression, SVM, GBDT, …

2. Pairwise
■ Loss function is defined on pair-wise preferences
■ Goal: minimize number of inversions in ranking
■ Ranking problem is then transformed into the binary classification problem
■ RankSVM, RankBoost, RankNet, FRank…
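The pairwise idea can be illustrated with two small functions: one counts inversions (pairs ordered against their relevance labels), and the other is a smooth logistic surrogate in the spirit of RankNet that treats each pair as a binary classification example. This is a sketch with made-up scores and labels, not the exact loss of any method listed above.

```python
import math
from itertools import combinations

def preference_pairs(relevance):
    """Yield (better, worse) index pairs for items with different relevance labels."""
    for i, j in combinations(range(len(relevance)), 2):
        if relevance[i] > relevance[j]:
            yield i, j
        elif relevance[j] > relevance[i]:
            yield j, i

def count_inversions(scores, relevance):
    """Number of pairs where the more relevant item does not get the higher score."""
    return sum(1 for hi, lo in preference_pairs(relevance) if scores[hi] <= scores[lo])

def pairwise_logistic_loss(scores, relevance):
    """Mean of log(1 + exp(-(s_better - s_worse))) over all preference pairs."""
    losses = [math.log1p(math.exp(-(scores[hi] - scores[lo])))
              for hi, lo in preference_pairs(relevance)]
    return sum(losses) / len(losses) if losses else 0.0

relevance = [2, 1, 0]
good_scores = [3.0, 2.0, 1.0]
bad_scores = [1.0, 2.0, 3.0]
print(count_inversions(good_scores, relevance), count_inversions(bad_scores, relevance))
print(pairwise_logistic_loss(good_scores, relevance),
      pairwise_logistic_loss(bad_scores, relevance))
```

The inversion count is what we care about, but it is piecewise constant in the scores; the logistic surrogate is differentiable, so a model can be trained on it with gradient methods.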

Page 32: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to rank - metrics

■ Quality of ranking measured using metrics such as
  ■ Normalized Discounted Cumulative Gain (NDCG)
  ■ Mean Reciprocal Rank (MRR)
  ■ Fraction of Concordant Pairs (FCP)
  ■ Others…

■ But, it is hard to optimize machine-learned models directly on these measures (they are not differentiable)

■ Recent research on models that directly optimize ranking measures
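Two of the metrics named on this slide are short enough to write out. The sketch below uses the common exponential-gain form of DCG with a log2 position discount; other gain and discount choices exist, and the example relevance lists are made up.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mean_reciprocal_rank(ranked_lists):
    """Average of 1 / rank of the first relevant item per list (0 if none is relevant)."""
    total = 0.0
    for ranked in ranked_lists:
        for pos, rel in enumerate(ranked, start=1):
            if rel > 0:
                total += 1.0 / pos
                break
    return total / len(ranked_lists)

perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]])
print(perfect, reversed_order, mrr)
```

Both metrics depend on the scores only through the sort order, so small changes to a score either leave the metric unchanged or make it jump. That is the practical obstacle to optimizing them directly with gradient methods.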

Page 33: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to Rank Approaches

3. Listwise

a. Indirect Loss Function
■ RankCosine: similarity between ranking list and ground truth as loss function
■ ListNet: KL-divergence as loss function by defining a probability distribution
■ Problem: optimization of listwise loss function may not optimize IR metrics

b. Directly optimizing IR measures (difficult since they are not differentiable)
■ Directly optimize IR measures through Genetic Programming or Simulated Annealing
■ Gradient descent on smoothed version of objective function (e.g. CLiMF at RecSys 2012 or TFMAP at SIGIR 2012)
■ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
■ AdaRank uses boosting to optimize NDCG
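As an illustration of the indirect listwise approach, the following sketch follows the ListNet idea of turning each score list into a "top-one" probability distribution with a softmax and comparing the two distributions with KL divergence. The function names and scores are illustrative. The ListNet paper states its loss as cross entropy, which differs from this KL divergence only by a term that does not depend on the predicted scores.

```python
import math

def top_one_probs(scores):
    """Softmax over scores: probability that each item is ranked first."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listnet_loss(true_scores, pred_scores):
    """KL divergence between the top-one distributions of true and predicted scores.

    Assumes moderate score ranges so that no probability underflows to zero.
    """
    p = top_one_probs(true_scores)
    q = top_one_probs(pred_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

true_scores = [3.0, 2.0, 1.0]
print(listnet_loss(true_scores, [3.0, 2.0, 1.0]))
print(listnet_loss(true_scores, [1.0, 2.0, 3.0]))
```

The loss is smooth in the predicted scores, but a lower value does not guarantee a better NDCG or MRR, which is the problem the slide points out.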

Page 34: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Other research questions we are interested in

● Row selection
  ○ How to select and rank lists of “related” items imposing inter-group diversity, avoiding duplicates...
● Diversity
  ○ Can we increase diversity while preserving relevance in a way that we optimize user response?
● Similarity
  ○ How to compute optimal and personalized similarity between items by using different data that can range from play histories to item metadata
● Context-aware recommendations
● Mood and session intent inference
● ...

Page 35: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Page 36: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Really?

Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (former Kosmix)

Page 37: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Sometimes, it’s not about more data

More data or better models?

Page 38: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

[Banko and Brill, 2001]

Norvig: “Google does not have better Algorithms, only more Data”

Many features/ low-bias models

More data or better models?

Page 39: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Sometimes, it’s not about more data

Page 40: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Page 41: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Data without a sound approach = noise

Page 42: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Conclusions

Page 43: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

The Personalization Problem
■ The Netflix Prize simplified the recommendation problem to predicting ratings
■ But…
  ■ User ratings are only one of the many data inputs we have
  ■ Rating predictions are only part of our solution
  ■ Other algorithms such as ranking or similarity are very important
■ We can reformulate the recommendation problem
  ■ Function to optimize: probability a user chooses something and enjoys it enough to come back to the service

Page 44: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data + Better models + More accurate metrics + Better approaches & architectures

Lots of room for improvement!

Page 45: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Thanks!

Xavier Amatriain (@xamat)

We’re hiring!