
Page 1: Matrix Factorization

Matrix Factorization

Bamshad Mobasher, DePaul University

Page 2: Matrix Factorization

The $1 Million Question


Page 3: Matrix Factorization

Ratings Data

[Figure: ratings matrix, 480,000 users × 17,700 movies, with example entries holding rating values 1-5]

Page 4: Matrix Factorization


Training Data
- 100 million ratings (matrix is 99% sparse)
- Rating = [user, movie-id, time-stamp, rating value]
- Generated by users between Oct 1998 and Dec 2005
- Users randomly chosen among the set with at least 20 ratings
  - Small perturbations to help with anonymity

Page 5: Matrix Factorization

Ratings Data

[Figure: the same 480,000 users × 17,700 movies ratings matrix, with held-out entries marked "?": the Test Data Set (most recent ratings)]

Page 6: Matrix Factorization


Scoring
- Minimize root mean square error (RMSE)
  - Does not necessarily correlate well with user satisfaction
  - But it is a widely used, well-understood quantitative measure
- RMSE baseline scores on test data:
  - 1.054 - just predict the mean user rating for each movie
  - 0.953 - Netflix's own system (Cinematch) as of 2006
  - 0.941 - nearest-neighbor method using correlation
  - 0.857 - required 10% reduction to win $1 million

$\text{MSE} = \frac{1}{|R|} \sum_{(u,i) \in R} \left( r_{ui} - \hat{r}_{ui} \right)^2, \qquad \text{RMSE} = \sqrt{\text{MSE}}$
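The scoring metric above takes only a few lines to compute. This is a hedged sketch, not the official Netflix scorer: the `predict` callable interface and the toy ratings are made up for illustration.

```python
import math

def rmse(ratings, predict):
    """RMSE over (user, item, rating) triples; `predict(u, i)` is any
    rating predictor supplied by the caller (illustrative interface)."""
    errors = [(r - predict(u, i)) ** 2 for (u, i, r) in ratings]
    return math.sqrt(sum(errors) / len(errors))

# Toy check: predict the overall mean rating for every (user, item) pair.
ratings = [("u1", "m1", 4), ("u1", "m2", 2), ("u2", "m1", 5)]
mean = sum(r for (_, _, r) in ratings) / len(ratings)
print(rmse(ratings, lambda u, i: mean))
```

Swapping in a better `predict` (e.g. a learned factor model) is how the baselines in the list above are compared.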

Page 7: Matrix Factorization


Matrix Factorization of Ratings Data

- Based on the idea of Latent Factor Analysis
  - Identify latent (unobserved) factors that "explain" observations in the data
  - In this case, observations are user ratings of movies
  - The factors may represent combinations of features or characteristics of movies and users that result in the ratings

[Figure: the m-users × n-movies ratings matrix R factored into an m × f user-factor matrix P and an f × n item-factor matrix Q]

$r_{ui} \approx q_i^T p_u$
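The low-rank model $r_{ui} \approx q_i^T p_u$ can be illustrated with NumPy. The sizes and random factor matrices below are toy values, not the Netflix dimensions:

```python
import numpy as np

# Toy illustration of the low-rank model: an m-users x n-movies rating
# matrix generated from f latent factors (dimensions are made up).
m, n, f = 4, 5, 2
rng = np.random.default_rng(0)
P = rng.standard_normal((m, f))  # rows are user factor vectors p_u
Q = rng.standard_normal((n, f))  # rows are item factor vectors q_i

R = P @ Q.T  # R[u, i] is the predicted rating for user u on movie i
u, i = 1, 3
print(np.allclose(R[u, i], Q[i] @ P[u]))  # a prediction is just an inner product
```

With f much smaller than m and n, the factors compress the huge sparse ratings matrix into two dense, manageable matrices.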

Page 8: Matrix Factorization


Matrix Factorization of Ratings Data

Figure from Koren, Bell, Volinsky, IEEE Computer, 2009

Page 9: Matrix Factorization


Matrix Factorization of Ratings Data

Credit: Alex Lin, Intelligent Mining

Page 10: Matrix Factorization

Predictions as Filling Missing Data

Credit: Alex Lin, Intelligent Mining

Page 11: Matrix Factorization

Learning Factor Matrices
- Need to learn the feature vectors from training data
  - User feature vector: (a, b, c)
  - Item feature vector: (x, y, z)
- Approach: minimize the errors on known ratings

Credit: Alex Lin, Intelligent Mining

Page 12: Matrix Factorization

Learning Factor Matrices

$\min_{q,p} \sum_{(u,i) \in R} \left( r_{ui} - q_i^T p_u \right)^2$

Add regularization:

$\min_{q,p} \sum_{(u,i) \in R} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)$


Page 13: Matrix Factorization

Stochastic Gradient Descent (SGD)

$\min_{q,p} \sum_{(u,i) \in R} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)$

(first term: goodness of fit; second term: regularization)

Online ("stochastic") gradient update equations:

$e_{ui} = r_{ui} - q_i^T p_u$
$q_i \leftarrow q_i + \gamma \left( e_{ui} p_u - \lambda q_i \right)$
$p_u \leftarrow p_u + \gamma \left( e_{ui} q_i - \lambda p_u \right)$
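The stochastic gradient update equations above can be turned into a minimal training loop. This is a sketch: the hyperparameters (gamma, lambda, epoch count) and toy data are illustrative choices, not the competition settings.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, f=2, gamma=0.05, lam=0.02, epochs=500):
    """Minimal SGD for the regularized factorization objective.
    ratings: (u, i, r_ui) triples for known ratings only."""
    rng = np.random.default_rng(42)
    P = 0.1 * rng.standard_normal((n_users, f))  # user factors p_u (rows)
    Q = 0.1 * rng.standard_normal((n_items, f))  # item factors q_i (rows)
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - Q[i] @ P[u]              # e_ui = r_ui - q_i^T p_u
            qi, pu = Q[i].copy(), P[u].copy()  # use old values in both updates
            Q[i] += gamma * (e * pu - lam * qi)
            P[u] += gamma * (e * qi - lam * pu)
    return P, Q

# Toy data: 3 users x 3 items, 5 observed ratings.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2)]
P, Q = sgd_mf(ratings, n_users=3, n_items=3)
print([round(float(Q[i] @ P[u]), 2) for u, i, _ in ratings])
```

After training, the reconstructed values move toward the observed ratings while regularization keeps the factor norms small.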


Page 14: Matrix Factorization

Components of a Rating Predictor: user-movie interaction + movie bias + user bias

User-movie interaction
- Characterizes the matching between users and movies
- Attracts most research in the field
- Benefits from algorithmic and mathematical innovations

Baseline predictor
- Separates users and movies
- Often overlooked
- Benefits from insights into users' behavior
- Among the main practical contributions of the competition

Credit: Yehuda Koren, Google, Inc.

Page 15: Matrix Factorization

Modeling Systematic Biases

$r_{ui} \approx \mu + b_u + b_i + q_i^T p_u$

where $\mu$ is the overall mean rating, $b_u$ is the bias (mean rating deviation) of user u, and $b_i$ is the bias of movie i.

Example:
- Mean rating $\mu = 3.7$
- You are a critical reviewer: your ratings are 1 lower than the mean, so $b_u = -1$
- Star Wars gets a mean rating 0.5 higher than the average movie: $b_i = +0.5$
- Predicted rating for you on Star Wars = 3.7 - 1 + 0.5 = 3.2
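As a sketch, the bias terms can be estimated from data as simple mean deviations. (The objective on the next slide actually learns them jointly with the factors; the user and movie names below are hypothetical toy data.)

```python
# Estimating mu, b_u, b_i as mean deviations, then forming the
# baseline prediction mu + b_u + b_i (no interaction term yet).
ratings = {("alice", "star_wars"): 5, ("alice", "titanic"): 3,
           ("bob", "star_wars"): 4, ("bob", "titanic"): 1}

mu = sum(ratings.values()) / len(ratings)  # overall mean rating

def user_bias(u):
    rs = [r for (uu, _), r in ratings.items() if uu == u]
    return sum(rs) / len(rs) - mu

def item_bias(i):
    rs = [r for (_, ii), r in ratings.items() if ii == i]
    return sum(rs) / len(rs) - mu

pred = mu + user_bias("bob") + item_bias("star_wars")
print(pred)
```

Bob rates below average and Star Wars rates above average, so the two biases partly cancel, exactly as in the slide's 3.7 - 1 + 0.5 example.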

Credit: Padhraic Smyth, University of California, Irvine

Page 16: Matrix Factorization

Objective Function

$\min_{q,p,b} \left\{ \sum_{(u,i) \in R} \left( r_{ui} - (\mu + b_u + b_i + q_i^T p_u) \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 + b_u^2 + b_i^2 \right) \right\}$

(first term: goodness of fit; second term: regularization)

$\lambda$ is typically selected via grid search on a validation set.

Credit: Padhraic Smyth, University of California, Irvine

Page 17: Matrix Factorization

[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009; 5% and 8% improvement levels marked on the plot]

Page 18: Matrix Factorization


Page 19: Matrix Factorization

Explanation for increase?


Page 20: Matrix Factorization

Adding Time Effects

$r_{ui} \approx \mu + b_u + b_i + \text{user-movie interactions}$
$r_{ui} \approx \mu + b_u(t) + b_i(t) + \text{user-movie interactions}$

Add time dependence to the biases. The time dependence is parametrized by linear trends, binning, and other methods.

For details see Y. Koren, "Collaborative Filtering with Temporal Dynamics," ACM SIGKDD Conference, 2009.
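One way to read the "binning" parametrization: a time-dependent bias such as $b_i(t)$ becomes a lookup of one learned offset per time bin. The bin edges and per-bin offsets below are invented purely for illustration:

```python
import bisect

# Hypothetical binned time-dependent item bias b_i(t): the timeline is
# split into 4 bins and each bin gets its own learned offset.
bin_edges = [100, 200, 300]           # timestamps separating the bins
b_i_bins = [-0.2, 0.0, 0.3, 0.5]      # one (made-up) offset per bin

def b_i(t):
    """Look up the bias offset for the bin containing timestamp t."""
    return b_i_bins[bisect.bisect_right(bin_edges, t)]

print(b_i(50), b_i(250))
```

In a real model these offsets would be trained with the rest of the parameters; linear trends replace the step function with a slope per user or item.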

Credit: Padhraic Smyth, University of California, Irvine

Page 21: Matrix Factorization

Adding Time Effects

$r_{ui} \approx \mu + b_u(t) + b_i(t) + q_i^T p_u(t)$

Add time dependence to the user "factor weights": this models the fact that a user's interests over "genres" (the q's) may change over time.


Page 22: Matrix Factorization

[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009; 5% and 8% improvement levels marked on the plot]

Page 23: Matrix Factorization

The Kitchen Sink Approach...
- Many options for modeling
  - Variants of the ideas we have seen so far
    - Different numbers of factors
    - Different ways to model time
    - Different ways to handle implicit information
    - ...
  - Other models (not described here)
    - Nearest-neighbor models
    - Restricted Boltzmann machines
- Model averaging was useful...
  - Linear model combining
  - Neural network combining
  - Gradient boosted decision tree combining
  - Note: combining weights learned on validation set ("stacking")

Credit: Padhraic Smyth, University of California, Irvine

Page 24: Matrix Factorization


Page 25: Matrix Factorization

Other Aspects of Model Building
- Automated parameter tuning
  - Using a validation set and grid search, various parameters such as learning rates, regularization parameters, etc., can be optimized
- Memory requirements
  - Can fit within roughly 1 GB of RAM
- Training time
  - On the order of days, but achievable on commodity hardware rather than a supercomputer
  - Some parallelization used
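The automated-tuning bullet can be sketched as a plain grid search. `train` and `validation_rmse` below are hypothetical stand-ins for a real trainer and evaluator (e.g. an SGD factorization fit plus an RMSE check on held-out ratings):

```python
import itertools

def grid_search(train, validation_rmse, gammas, lams):
    """Try every (gamma, lambda) pair; keep the one with the lowest
    validation score. Interfaces are illustrative, not a real API."""
    best = None
    for gamma, lam in itertools.product(gammas, lams):
        model = train(gamma=gamma, lam=lam)
        score = validation_rmse(model)
        if best is None or score < best[0]:
            best = (score, gamma, lam)
    return best

# Synthetic stand-ins so the sketch runs: pretend the validation error
# is minimized at gamma=0.01, lam=0.1 (purely made-up numbers).
best = grid_search(
    train=lambda gamma, lam: (gamma, lam),
    validation_rmse=lambda m: abs(m[0] - 0.01) + abs(m[1] - 0.1),
    gammas=[0.005, 0.01, 0.02], lams=[0.05, 0.1, 0.2])
print(best)
```

Because each (gamma, lambda) pair means a full training run of days, in practice the grid is kept coarse and runs are parallelized.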

Credit: Padhraic Smyth, University of California, Irvine

Page 26: Matrix Factorization

Progress Prize 2008

Sept 2nd: Only 3 teams qualify for 1% improvement over the previous year.

Oct 2nd: Leading team has 9.4% overall improvement.

Progress prize ($50,000) awarded to the BellKor team of 3 AT&T researchers (same as before) plus 2 Austrian graduate students, Andreas Toscher and Martin Jahrer.

Key winning strategy: clever "blending" of predictions from models used by both teams.

Speculation that 10% would be attained by mid-2009.


Page 27: Matrix Factorization

The Leading Team for the Final Prize

- BellKorPragmaticChaos
  - BellKor: Yehuda Koren (now Yahoo!), Bob Bell, Chris Volinsky (AT&T)
  - BigChaos: Michael Jahrer, Andreas Toscher (2 grad students from Austria)
  - Pragmatic Theory: Martin Chabert, Martin Piotte (2 engineers from Montreal, Quebec)


Page 28: Matrix Factorization


Page 29: Matrix Factorization

June 26th 2009: after 1000 days & nights…


Page 30: Matrix Factorization

Million Dollars Awarded Sept 21st 2009
