big & personal: the data and the models behind netflix recommendations by xavier amatriain

Big & Personal: the data and the models behind Netflix recommendations

Outline

1. The Netflix Prize & the Recommendation Problem

2. Anatomy of Netflix Personalization

3. Data & Models

4. More data or better Models?

What we were interested in:
■ High quality recommendations

Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
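The accuracy measure used for the Prize was root mean squared error (RMSE) between predicted and actual ratings. The sketch below shows how a relative improvement over a baseline could be computed. The rating values and the function names `rmse` and `relative_improvement` are illustrative assumptions, not Netflix data or code.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between two equally sized rating arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings on a 1-5 scale (made up for illustration).
actual = [4, 3, 5, 2, 1]
baseline_pred = [3, 3, 4, 3, 2]
better_pred = [3.5, 3, 4.5, 2.5, 1.5]

base = rmse(baseline_pred, actual)
new = rmse(better_pred, actual)
print(f"baseline RMSE={base:.3f}, new RMSE={new:.3f}, "
      f"improvement={relative_improvement(base, new):.1%}")
```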

● Top 2 algorithms still in production

Results

SVD

RBM

What about the final prize ensembles?

■ Our offline studies showed they were too computationally intensive to scale

■ Expected improvement not worth the engineering effort

■ Plus… focus had already shifted to other issues that had more impact than rating prediction.

Change of focus

2006 2013

Anatomy of Netflix Personalization

Everything is a Recommendation

Everything is personalized

Note: Recommendations are per household, not individual user

Ranking

Top 10

Personalization awareness

Diversity

[Figure: a row of titles, each labeled with the household member it targets: Dad, All, Son, Daughter, Dad & Mom, Mom, All, Daughter, Mom, All?]

Support for Recommendations

Social Support

Social Recommendations

Genre rows

■ Personalized genre rows focus on user interest
■ Also provide context and “evidence”
■ Important for member satisfaction – moving personalized rows to top on devices increased retention
■ How are they generated?
  ■ Implicit: based on user’s recent plays, ratings, & other interactions
  ■ Explicit taste preferences
  ■ Hybrid: combine the above
■ Also take into account:
  ■ Freshness - has this been shown before?
  ■ Diversity – avoid repeating tags and genres, limit number of TV genres, etc.

Genres - personalization

Similars

■ Displayed in many different contexts
■ In response to user actions/context (search, queue add…)
■ More like… rows

Data & Models

Big Data @Netflix
■ Almost 40M subscribers
■ Ratings: 4M/day
■ Searches: 3M/day
■ Plays: 30M/day
■ 2B hours streamed in Q4 2011
■ 1B hours in June 2012
■ > 4B hours in Q1 2013

Member Behavior

Geo-information

Time

Impressions

Device Info

Metadata

Social

Smart Models
■ Logistic/linear regression
■ Elastic nets
■ SVD and other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains
■ Different clustering approaches
■ LDA
■ Association Rules
■ Gradient Boosted Decision Trees/Random Forests
■ …

SVD

$$X_{[m \times n]} = U_{[m \times r]} \, S_{[r \times r]} \, \left(V_{[n \times r]}\right)^{T}$$

■ X: m x n matrix (e.g., m users, n videos)
■ U: m x r matrix (m users, r factors)
■ S: r x r diagonal matrix (strength of each ‘factor’) (r: rank of the matrix)
■ V: n x r matrix (n videos, r factors)
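As a rough illustration of the factorization, the sketch below builds a rank-r approximation of a small matrix with NumPy. The toy sizes and random values are assumptions, and real rating matrices are sparse with many missing entries, which plain SVD does not handle. Note that `np.linalg.svd` returns V already transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2                      # m users, n videos, r factors (toy sizes)
X = rng.random((m, n))                 # dense toy matrix, illustrative only

# Thin SVD: U_full is m x k, s holds k singular values, Vt_full is k x n,
# where k = min(m, n).
U_full, s, Vt_full = np.linalg.svd(X, full_matrices=False)

# Keep only the r strongest factors.
U = U_full[:, :r]                      # m x r
S = np.diag(s[:r])                     # r x r diagonal
V = Vt_full[:r, :].T                   # n x r

X_approx = U @ S @ V.T                 # rank-r approximation of X, m x n
print(U.shape, S.shape, V.shape, X_approx.shape)
```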

SVD for Rating Prediction

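■ User factor vectors $p_u \in \mathbb{R}^f$ and item factor vectors $q_i \in \mathbb{R}^f$
■ Baseline (bias) $b_{ui} = \mu + b_u + b_i$ (user & item deviation from average)
■ Predict rating as $\hat{r}_{ui} = b_{ui} + p_u^T q_i$
■ SVD++ (Koren et al.): asymmetric variation with implicit feedback

$$\hat{r}_{ui} = b_{ui} + q_i^T \left( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$$

■ Where $q_i$, $x_j$, $y_j$ are three item factor vectors
■ Users are not parametrized, but rather represented by:
  ■ R(u): items rated by user u
  ■ N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)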
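A minimal sketch of the two ideas on this slide, under stated assumptions. `predict_rating` adds the baseline to the inner product of a user and an item factor vector. `asymmetric_user_vector` builds a user representation from R(u) and N(u) instead of a learned user vector, following the asymmetric formulation as published by Koren. All names, the square-root normalization, and the numeric values are illustrative assumptions rather than Netflix's implementation.

```python
import numpy as np

def predict_rating(mu, b_user, b_item, user_vec, item_vec):
    """Baseline (global mean + user and item deviations) plus factor interaction."""
    return mu + b_user + b_item + float(np.dot(user_vec, item_vec))

def asymmetric_user_vector(rated, implicit, X_f, Y_f):
    """Represent a user through items instead of a learned user vector.

    rated:    list of (item index, rating minus baseline) pairs, i.e. R(u)
    implicit: list of item indices with implicit preference, i.e. N(u)
    X_f, Y_f: item factor matrices (one row per item)
    """
    vec = np.zeros(X_f.shape[1])
    if rated:
        vec += sum(res * X_f[j] for j, res in rated) / np.sqrt(len(rated))
    if implicit:
        vec += sum(Y_f[j] for j in implicit) / np.sqrt(len(implicit))
    return vec

# Hypothetical values for illustration.
mu, b_user, b_item = 3.6, -0.2, 0.4
p_u = np.array([0.5, -1.0, 0.2])
q_i = np.array([1.0, 0.5, -0.5])
print(predict_rating(mu, b_user, b_item, p_u, q_i))

# Same item vector, but the user is derived from rated and implicit items.
u_vec = asymmetric_user_vector([(0, 1.0), (1, -1.0)], [2], np.eye(3), np.eye(3))
print(predict_rating(mu, b_user, b_item, u_vec, q_i))
```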

Simon Funk’s SVD

■ One of the most interesting findings during the Netflix Prize came out of a blog post

■ Incremental, iterative, and approximate way to compute the SVD using gradient descent
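The gradient-descent idea can be sketched as follows: loop over the observed ratings only and nudge the user and item factors to reduce the prediction error. This is a simplified sketch, not the code from the blog post. It updates all factors at once, whereas the original approach is usually described as training one factor at a time, and the learning rate, regularization, and toy ratings are assumptions.

```python
import numpy as np

def funk_sgd(ratings, n_users, n_items, n_factors=2,
             lr=0.02, reg=0.02, epochs=300, seed=0):
    """ratings: list of (user, item, rating). Returns factor matrices P and Q."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]            # error on this observed rating
            p_old = P[u].copy()              # keep old value for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings."""
    return sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)

# Hypothetical observed ratings: (user, item, rating); missing pairs are skipped.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 4), (3, 2, 1)]

P0, Q0 = funk_sgd(ratings, n_users=4, n_items=3, epochs=0)   # untrained factors
P, Q = funk_sgd(ratings, n_users=4, n_items=3)
print(squared_error(ratings, P0, Q0), squared_error(ratings, P, Q))
```

Only observed entries contribute to the updates, which is what makes this approach practical for sparse rating data.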

Restricted Boltzmann Machines

■ Restrict the connectivity in ANN to make learning easier.
■ Only one layer of hidden units.
  ■ Although multiple layers are possible
■ No connections between hidden units.
■ Hidden units are independent given the visible states.
■ RBMs can be stacked to form Deep Belief Networks (DBN) – 4th generation of ANNs
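Because hidden units have no connections among themselves, the probability of each hidden unit being on, given the visible states, can be computed independently in a single matrix operation. The sketch below assumes binary units with a sigmoid activation. The weights `W`, hidden biases `c`, and sizes are illustrative assumptions, and the RBM used for the Netflix Prize differs in its details (for example, in how ratings are encoded in the visible layer).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j, computed independently per unit."""
    return sigmoid(v @ W + c)

def sample_hidden(v, W, c, rng):
    """Draw a binary hidden state, one independent coin flip per hidden unit."""
    p = hidden_probs(v, W, c)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
n_visible, n_hidden = 5, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # visible-to-hidden weights
c = np.zeros(n_hidden)                                  # hidden biases
v = np.array([1, 0, 1, 1, 0])                           # toy visible state

p_h = hidden_probs(v, W, c)
h = sample_hidden(v, W, c, rng)
print(p_h, h)
```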

[Figure: RBM diagram with a hidden layer and a visible layer; units indexed i and j]

RBM for the Netflix Prize

Ranking: key algorithm, sorts titles in most contexts

Ranking
■ Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
■ Goal: Find the best possible ordering of a set of videos for a user within a specific context in real-time
■ Objective: maximize consumption
■ Aspirations: Played & “enjoyed” titles have best score
■ Akin to CTR forecast for ads/search results
■ Factors
  ■ Accuracy
  ■ Novelty
  ■ Diversity
  ■ Freshness
  ■ Scalability
  ■ …

Example: Two features, linear model
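A two-feature linear ranker can be sketched as scoring, filtering, and sorting. The two features used here (`popularity` and `predicted_rating`), the weights, and the titles are assumptions chosen for illustration; in practice the weights would be learned from data.

```python
def rank_titles(candidates, w_pop, w_rating, bias=0.0, already_seen=frozenset()):
    """Score with a linear model, drop already-seen titles, sort by descending score.

    candidates: iterable of (title, popularity, predicted_rating) tuples.
    """
    scored = []
    for title, popularity, predicted_rating in candidates:
        if title in already_seen:                       # filtering
            continue
        score = w_pop * popularity + w_rating * predicted_rating + bias  # scoring
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sorting
    return [title for _, title in scored]

# Hypothetical candidates: (title, popularity in [0, 1], predicted rating in [1, 5]).
candidates = [("A", 0.9, 3.0), ("B", 0.2, 4.8), ("C", 0.6, 4.2), ("D", 0.95, 2.0)]
ranking = rank_titles(candidates, w_pop=1.0, w_rating=0.5, already_seen={"D"})
print(ranking)
```

Changing the two weights shifts the ordering between popular titles and titles with a high predicted rating for this particular user.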


Learning to rank

■ Machine learning problem: goal is to construct ranking model from training data

■ Training data can have partial order or binary judgments (relevant/not relevant).

■ Resulting order of the items typically induced from a numerical score

■ Learning to rank is a key element for personalization

■ You can treat the problem as a standard supervised classification problem

Learning to Rank Approaches

1. Pointwise
■ Ranking function minimizes loss function defined on individual relevance judgment
■ Ranking score based on regression or classification
■ Ordinal regression, Logistic regression, SVM, GBDT, …

2. Pairwise
■ Loss function is defined on pair-wise preferences
■ Goal: minimize number of inversions in ranking
■ Ranking problem is then transformed into the binary classification problem
■ RankSVM, RankBoost, RankNet, FRank…
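The pairwise transformation can be sketched as follows: every pair of items with different relevance becomes one binary example whose features are the difference of the two items' feature vectors. The sketch fits a plain logistic regression on those differences by gradient descent. It is a generic illustration, not RankSVM, RankBoost, RankNet, or FRank specifically, and the data, learning rate, and function names are assumptions.

```python
import numpy as np
from itertools import combinations

def make_pairs(X, y):
    """Build (feature difference, label) pairs; label 1 if the first item is preferred."""
    diffs, labels = [], []
    for a, b in combinations(range(len(y)), 2):
        if y[a] == y[b]:
            continue                          # ties carry no preference
        diffs.append(X[a] - X[b])
        labels.append(1.0 if y[a] > y[b] else 0.0)
    return np.array(diffs), np.array(labels)

def count_inversions(scores, y):
    """Pairs where the more relevant item does not get the higher score."""
    return sum(1 for a, b in combinations(range(len(y)), 2)
               if y[a] != y[b] and (scores[a] - scores[b]) * (y[a] - y[b]) <= 0)

def fit_pairwise_logistic(diffs, labels, lr=0.5, epochs=500):
    """Logistic regression without intercept on pairwise differences."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(diffs @ w)))
        w -= lr * diffs.T @ (p - labels) / len(labels)
    return w

# Hypothetical items: two features each, graded relevance in y.
X = np.array([[0.3, 0.4], [1.0, 0.2], [0.1, 0.8], [0.8, 0.9]])
y = np.array([1, 3, 0, 2])

diffs, labels = make_pairs(X, y)
w = fit_pairwise_logistic(diffs, labels)
print(count_inversions(X @ w, y))
```

The learned weight vector scores single items, so at serving time each item is scored once and the list is sorted; pairs are only needed during training.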

Learning to rank - metrics

■ Quality of ranking measured using metrics such as
  ■ Normalized Discounted Cumulative Gain (NDCG)
  ■ Mean Reciprocal Rank (MRR)
  ■ Fraction of Concordant Pairs (FCP)
  ■ Others…

■ But, it is hard to optimize machine-learned models directly on these measures (they are not differentiable)

■ Recent research on models that directly optimize ranking measures
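For reference, two of the metrics named above can be computed as in the sketch below. It uses the linear-gain form of DCG (relevance divided by log2 of position + 1); other variants, such as an exponential gain, are also in common use, so the exact definition here is an assumption.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mean_reciprocal_rank(ranked_lists):
    """ranked_lists: one list of 0/1 relevance flags per query, in ranked order."""
    total = 0.0
    for flags in ranked_lists:
        rank = next((i + 1 for i, f in enumerate(flags) if f), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_lists)

print(ndcg([3, 2, 1]), ndcg([1, 2, 3]))
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))
```

Both metrics depend only on the order of the items, so a small change in model scores that leaves the order unchanged leaves the metric unchanged, which is why they give no useful gradient.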

Learning to Rank Approaches

3. Listwise

a. Indirect Loss Function
■ RankCosine: similarity between ranking list and ground truth as loss function
■ ListNet: KL-divergence as loss function by defining a probability distribution
■ Problem: optimization of listwise loss function may not optimize IR metrics

b. Directly optimizing IR measures (difficult since they are not differentiable)
■ Directly optimize IR measures through Genetic Programming or Simulated Annealing
■ Gradient descent on smoothed version of objective function (e.g. CLiMF at RecSys 2012 or TFMAP at SIGIR 2012)
■ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
■ AdaRank uses boosting to optimize NDCG
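To make the ListNet idea concrete, the sketch below turns both the ground-truth relevances and the model scores into a "top-one" probability distribution with a softmax and measures the KL divergence between them. This is a simplified illustration of the loss only (no model or training loop), and the function names and values are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def listnet_top1_kl(scores, relevances):
    """KL divergence from the ground-truth top-one distribution to the predicted one."""
    p_true = softmax(relevances)
    p_pred = softmax(scores)
    return float(np.sum(p_true * (np.log(p_true) - np.log(p_pred))))

relevances = [3.0, 2.0, 1.0]                        # hypothetical ground truth
print(listnet_top1_kl([3.0, 2.0, 1.0], relevances))  # scores agree with ground truth
print(listnet_top1_kl([1.0, 2.0, 3.0], relevances))  # scores reverse the order
```

Unlike NDCG or MRR, this loss changes smoothly with the scores, so it can be minimized with gradient descent, at the cost of not matching the IR metric exactly.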

Other research questions we are interested in

● Row selection
  ○ How to select and rank lists of “related” items imposing inter-group diversity, avoiding duplicates...
● Diversity
  ○ Can we increase diversity while preserving relevance in a way that we optimize user response?
● Similarity
  ○ How to compute optimal and personalized similarity between items by using different data that can range from play histories to item metadata
● Context-aware recommendations
● Mood and session intent inference
● ...

More data or better models?


Really?

Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (former Kosmix)

Sometimes, it’s not about more data


[Banko and Brill, 2001]

Norvig: “Google does not have better Algorithms, only more Data”

Many features/ low-bias models



Sometimes, it’s not about more data

More data or better models?

Data without a sound approach = noise

Conclusions

The Personalization Problem
■ The Netflix Prize simplified the recommendation problem to predicting ratings
■ But…
  ■ User ratings are only one of the many data inputs we have
  ■ Rating predictions are only part of our solution
  ■ Other algorithms such as ranking or similarity are very important
■ We can reformulate the recommendation problem
  ■ Function to optimize: probability a user chooses something and enjoys it enough to come back to the service

More data + Better models + More accurate metrics + Better approaches & architectures

Lots of room for improvement!

Thanks!

Xavier Amatriain (@xamat)

We’re hiring!
